Materials

Workshop Materials

✨Download today’s Slides!✨

✨To the Collaborative Guidelines✨

Case Studies

#1 - HTR for Codex Palatinus Graecus 23

👥 Speakers
Mathilde Verstraete, Maxime Guénette (with the help of Alix Chagué and Marcello Vitali-Rosati)

📌 Overview
This case study focused on the development of an HTR model for the Codex Palatinus Graecus 23, the primary witness of the Greek Anthology.

📖 Sources
A single manuscript, divided into two parts: Palatinus graec. 23, pp. 1-614, and Parisinus Suppl. graec. 384, pp. 615-709

📋 Characteristics:

  • Script: Xth century Byzantine round minuscule
  • Hands: at least 4
  • Readability and abbreviations: clear and few abbreviations
  • Layout: One column & a few scholia

🎯 Project Context
This project emerged from the Greek Anthology Project held at the Canada Research Chair on Digital Textualities.

🔬 Results
HTR models trained using Kraken and eScriptorium: Meleagre-NFD-finetuned (91.05%), Meleagre-NFD (90.85%), Meleagre-NFC (91.00%)
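The NFC and NFD suffixes in the model names presumably refer to the Unicode normalization form of the transcriptions used for training; this reading is an assumption, since the page does not spell it out. A minimal sketch of what the two forms mean for polytonic Greek, using only Python's standard `unicodedata` module (the helper name and the sample word are illustrative, not taken from the project):

```python
import unicodedata


def normalize_transcription(text: str, form: str = "NFC") -> str:
    """Return `text` in the requested Unicode normalization form."""
    return unicodedata.normalize(form, text)


# Illustrative sample: the word μῆνιν, written with escapes so the
# starting form is unambiguous (U+1FC6 is eta with perispomeni).
sample = "\u03bc\u1fc6\u03bd\u03b9\u03bd"

nfc = normalize_transcription(sample, "NFC")  # precomposed characters
nfd = normalize_transcription(sample, "NFD")  # base letters + combining marks

# The two strings render identically but differ in code-point count,
# which changes the alphabet a recognition model has to learn.
print(len(nfc), len(nfd))
print([f"U+{ord(ch):04X}" for ch in nfd])
```

Under NFD the accent becomes a separate combining character, so the output alphabet is smaller but sequences are longer; under NFC each accented letter is a single symbol. Which form suits a given corpus is an empirical question.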

📑 Repository & Guidelines

#2 - HTR for Byzantine manuscripts: recognising Ioannes Chrysostomus, Maximus Planudes, and Cyril of Alexandria

(→Slides←)

👥 Speaker
Elpida Perdiki, PhD Candidate, Democritus University of Thrace

📌 Overview
This is a presentation of three case studies, all concerning the training of HTR models on data from Byzantine manuscripts. The manuscripts in question transmit texts of a) Ioannes Chrysostomus, b) Maximus Planudes, and c) Cyril of Alexandria. The results are three separate models, not a single unified one, since these are three distinct projects.

📖 Sources & 📋 Characteristics
Based on the previously mentioned list, the manuscripts used are the following:

a) Ioannes Chrysostomus:

  • (Q) Athos, Dionysiou 70, ff. 380v-387r & ff. 404v-411r;
    • 10th c., minuscule, clear readability, minimal ligatures and abbreviations, two columns, minimal marginalia
  • (H) Athos, Vatopedi 328, ff. 004r-010r & ff. 027r-033v;
    • 14th c., minuscule, clear readability, few ligatures and abbreviations, two columns, minimal marginalia
  • (A) Athens, Nat. Libr. 263, ff. 155r-159r & ff. 171v-176v;
    • 10th c., minuscule, clear readability, few ligatures, abbreviations of nomina sacra, two columns, minimal marginalia
  • (I) Alexandria, Patr. Libr. 34, ff. 97v-101v & ff. 112v-116v;
    • 10th c., minuscule, clear readability, no ligatures, abbreviations only on nomina sacra, two columns, no marginalia;
  • (D) Vienna, ÖNB theol. gr. 14, ff. 120v-127r;
    • 10th/11th c., minuscule, clear readability, minimal ligatures and abbreviations, two columns, minimal marginalia
  • (E) Paris, Bibl. Nat., Gr. 745, ff. 12r-18r;
    • 12th c., minuscule, clear readability, almost no ligatures but nomina sacra and minimal abbreviated ending syllables, two columns, minimal marginalia
  • (K) Munich, Gr. 377, ff. 134r-137v & ff. 146r-149r;
    • 10th/11th c., minuscule, clear readability, minimal ligatures and abbreviations, two columns, minimal marginalia
  • (L) Munich, Gr. 353, ff. 200v-205v & ff. 225v-234v;
    • 10th c., minuscule, clear readability, minimal ligatures and abbreviations, two columns, almost no marginalia.

b) Maximus Planudes:

  • (V) Vat. Urb. gr. 125, ff. 215r-223v;
    • 13th c., minuscule – Maximus Planudes’ autograph, clear readability, few ligatures and abbreviations, one column, minimal marginalia
  • (A) National Library of Scotland Adv.MS.18.7.15, ff. 55r-60r;
    • 13th c., minuscule – Maximus Planudes’ autograph, somewhat clear readability, heavy on ligatures and abbreviations, one column, no marginalia.

c) Cyril of Alexandria:

🎯 Project Context

a) Ioannes Chrysostomus: The model was curated at the Department of Greek Philology, Democritus University of Thrace by the author, for the purposes of her PhD dissertation.

b) Maximus Planudes: The model was curated at the Department of Greek Philology, Democritus University of Thrace by the following team: Maria Konstantinidou, Assistant Professor (team supervisor); Elpida Perdiki, PhD Candidate (data curation); Athanasia Kiorapostolou, Postgraduate Student (transcriber); Irene Mpogdanou, Postgraduate Student (transcriber); Athanasios Papadopoulos, Postgraduate Student (transcriber); Maria Tsikouraki, Postgraduate Student (transcriber). The project was conducted for the purposes of the postgraduate course “Palaeography I”.

c) Cyril of Alexandria: A sample of the Greek manuscripts transmitting the lexicon of Cyril of Alexandria. The model is curated by Maria Konstantinidou, Assistant Professor (Principal Investigator and team supervisor); Elpida Perdiki, PhD Candidate (data curation); Ioannis Kouroudis, PhD Candidate (transcriber); Nikolaos Tsoukatos, Postgraduate Researcher (transcriber). The project is under the scope of the DMC – Lexi research, implemented within the framework of H.F.R.I. call “Basic Research Financing (Horizontal support of all Sciences)” under the National Recovery and Resilience Plan “Greece 2.0” funded by the European Union – NextGenerationEU (H.F.R.I. Number: KE 014890).

🔬 Results

a) Ioannes Chrysostomus: The model “Chrysostomicus I” (ID: 44872) reaches a 3.90% CER. It was trained on the combined data of all the previously mentioned manuscripts. A sample of the data is already available in the Zenodo repository and will be gradually updated with the full dataset.

b) Maximus Planudes: In total, four models were trained: two with data from ms. V only, and two with data from both mss. V and A. The results were 16%, 8.50%, 13.1%, and 8.9% respectively. The results will be discussed in more detail during the presentation.

c) Cyril of Alexandria: The model is currently under development. Results to be announced.
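The figure reported for Chrysostomicus I is a Character Error Rate (CER). CER is conventionally defined as the edit distance between the model output and the reference transcription, divided by the length of the reference. The following is a minimal sketch of that definition, not the evaluation code of any particular platform; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate of `hypothesis` against `reference`."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character out of four gives a CER of 25%.
print(f"{cer('abcd', 'abcf'):.2%}")
```

Because the distance is counted per code point, the same prediction can score differently depending on how the text is normalized, so both strings should be brought to the same Unicode form before comparison.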

#3 - HTR for Codex Genavensis Graecus 44

(→Slides←)

👥 Speakers
Ariane Jambé (postdoctoral researcher, Université de Lausanne)

📌 Overview
The proposed case study, which focuses on the development of an HTR model on eScriptorium for the Genavensis graecus 44, aims to raise two issues that have received relatively little attention in existing HTR projects dealing with Greek codices:

  1. the challenge of analysing the layout of exegetical manuscripts;
  2. the need to anticipate transcription problems likely to arise from the complexity of such layouts.

Accordingly, this case study seeks to shed light on some of the difficulties inherent in manuscripts that have been heavily used and annotated over time—a condition that characterises the Genavensis graecus 44, which was in use for nearly four centuries, first in Constantinople and later by the Genevan humanist Henri II Estienne. The study will explore how such challenges relate (or fail to relate) to the broader methodological reflection necessary for the creation of standards and guidelines within our field.

📖 Sources
The case study will draw on the Genavensis graecus 44, a thirteenth-century Byzantine manuscript of the Iliad. Its main point of interest lies in its rich paratext, which includes a prose paraphrase in Greek inserted between each verse (Α 1 to Μ 454, p. 1-526), as well as numerous scholia and interlinear glosses. This manuscript comprises 802 pages.

📋 Characteristics:

  • Script: XIIIth century “scholarly” Byzantine round minuscule.
  • Hands: One main hand (Gen I); a second hand responsible for numerous corrections (Gen II); and several additional hands whose identification is still debated (for example, Gen *II, possibly intervening between Gen I and Gen II, and Gen III, clearly of a later date).
  • Readability and abbreviations: A distinction should be made here between legibility and readability. While legibility—understood as the ease with which one can decipher a string of characters—is generally quite high in the Genavensis, this assessment could be tempered in the case of the scholia and glosses, as certain writing modules are rather small and (standard) abbreviations are regularly employed. Readability, on the other hand—defined as the ability to trace the relationship between the poem and its paratext—is challenging.
  • Layout: In the first part of the manuscript (p. 1-526), which contains the paraphrase, the main scribe’s editorial project appears to be as follows: in the principal text block, each Homeric verse is immediately followed by its prose paraphrase, while the margins (particularly the lateral ones) are occupied by scholia seemingly aligned with the corresponding verse. In the second part of the manuscript (p. 527-802), the main text block is divided into two unequal columns, the first reserved for the poem and the second for the scholia. However, and this is a central point of discussion in an HTR project context, the very category of “layout” may prove inadequate for documents that have been continuously annotated and reused. Disruptions to the original layout are frequent and do not always follow a systematic logic, since each successive owner added glosses and scholia wherever space was available.

🎯 Project Context
The development of an HTR model for the Genavensis graecus 44 forms part of a postdoctoral research project (August 2024 to July 2029), whose aim is to produce a digital edition capable of connecting the poem with its paratextual apparatus (paraphrase, scholia, and glosses).

🔬 Results
As the project is still in its early stages and the model is under development, no results have yet been published.

#4 - OCR for Patrologia Graeca: Recognizing Noisy 19th-Century Greek Editions

👥 Speakers
Chahan Vidal-Gorène (Calfa)

📌 Overview
This talk presents a case study on the development of a specialized OCR model for the Patrologia Graeca (PG), a large 19th-century printed collection of Christian Greek texts. It highlights the challenges of working with typographically complex, poorly digitized, and linguistically rich documents, and demonstrates the effectiveness of an active learning strategy based on iterative fine-tuning.

📖 Sources
The corpus consists of 161 volumes of the Patrologia Graeca, published by J.-P. Migne (1857–1866), encompassing works by Greek Church Fathers and Byzantine authors from the 1st to the 15th century.

📋 Characteristics:

  • Script: XIXth century Greek minuscule print (variable quality)
  • Hands: N/A (printed text, but significant variation across volumes)
  • Readability and abbreviations: High use of diacritics, significant noise and print degradation; footnote markers excluded from transcription
  • Layout: Dense dual-column layout (Greek and Latin), with marginalia, running titles, interlinear content, and footnotes

🎯 Project Context
The project is led by Calfa and the GREgORI initiative, under the academic supervision of Prof. Jean-Marie Auwers. It is part of the Calfa GREgORI Patrologia Graeca (CGPG) project, which aims to produce a machine-readable, enriched digital edition of the PG.

🔬 Results
Models were trained using Calfa Vision’s iterative fine-tuning approach, building on an initial HTR model developed for Codex Genavensis Graecus 44. Following an automatic classification phase to identify pages with low recognition performance, we show that transcribing and correcting just 10 pages reduced the Character Error Rate (CER) to 4.19%. With 50 pages, the CER dropped to 1.1% on a target document. The layout model reached a 95% mean IoU for Greek zone detection. A major challenge remains the accurate treatment of transversal text spanning multiple columns.
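The layout score above is a mean Intersection over Union (IoU): for each detected zone, the area shared with the reference zone divided by the area covered by either. The sketch below illustrates the metric on axis-aligned rectangles only; real layout evaluation typically works on polygons or pixel masks, and the box format and function names here are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Average IoU over (predicted, reference) zone pairs."""
    if not pairs:
        raise ValueError("at least one pair of zones is required")
    return sum(iou(pred, ref) for pred, ref in pairs) / len(pairs)


# A perfect match and a complete miss average to 0.5.
zone = (0.0, 0.0, 2.0, 2.0)
print(mean_iou([(zone, zone), (zone, (5.0, 5.0, 6.0, 6.0))]))
```

How predicted zones are paired with reference zones is left out of this sketch; the pairing strategy affects the final score and varies between evaluation setups.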

📑 Repository & Guidelines:

#5 - A variant to HTR: Detection and recognition of Greek characters on papyri

(→Slides←)

👥 Speakers
Isabelle Marthot-Santaniello, University of Basel

📌 Overview
This talk will briefly explain why a character-based approach, rather than the line-based level of HTR, is more promising in the case of paleographic research on Greek papyri. It will present the work on the detection and recognition of Greek letters on papyrus done in the scope of the project EGRAPSA. It will also show how the project invites “humans in the loop” to curate the automatically generated output.

📖 Sources
The corpus studied in the project is constantly expanding. A first dataset served as material for the ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri (Seuret et al. 2023). It is composed of 185 images from 136 different manuscripts of Homer’s Iliad, covering the millennium between the 3rd c. BCE and the 7th c. CE.

📋 Characteristics:

  • Script: Various kinds used to pen Homer’s Iliad over a millennium (from book hands to semi-cursive), variable quality of execution and preservation.
  • Hands: Extremely varied, anonymous
  • Readability and abbreviations: Mostly detached characters, a few are semi-detached (connected but not really ligatured, meaning in contact with but not distorted by the neighboring letters)
  • Layout: mostly columns from scrolls or pages from codices, very rare interlinear interventions, some diacritics in a few manuscripts.

🎯 Project Context
The project entitled “EGRAPSA: Retracing the evolutions of handwritings in Graeco-Roman Egypt thanks to digital palaeography” is a Starting Grant funded by the Swiss National Science Foundation in Basel between June 2023 and May 2028. A team of Papyrologists and Computer Scientists joins forces to improve computer-assisted paleography, especially on the topic of script typology, dating and writer identification.

🔬 Results
Elaborating upon the ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri, the Mean Average Precision (mAP) for Detection is 52.43 and for Recognition is 45.18. However, recent evaluations based on human curation indicate that in fact only an average of 20% of the automatically generated output needs to be corrected by experts.

📑 Repository & Guidelines:

📚 References on HTR and Ancient Greek

Something missing? Let us know

Bibliography
7. Retraining with PyLaia. (n.d.). Retrieved April 23, 2025, from https://help.transkribus.org/retraining-with-pylaia
Agostini, G. (n.d.). The Ligorio 0.3 model, Free Public AI Model for Handwritten Text Recognition with Transkribus (Version 42105) [Dataset]. Retrieved April 23, 2025, from https://www.transkribus.org/model/ligorio-0-3
Andrews, T. L., & Macé, C. (Eds.). (2014). Analysis of Ancient and Medieval Texts and Manuscripts : Digital Approaches. Brepols.
Baumann, R. (2022a). Kraken-gaza-iliad [HTML]. https://github.com/ryanfb/kraken-gaza-iliad (Original work published 2019)
Baumann, R. (2022b). Kraken-voulgaris-aeneid [HTML]. https://github.com/ryanfb/kraken-voulgaris-aeneid (Original work published 2019)
Baumann, R. (2023). Kraken-gaza-batrachomyomachia [CSS]. https://github.com/ryanfb/kraken-gaza-batrachomyomachia (Original work published 2020)
Calvelli, L., Boschetti, F., & Tommasi, T. (2023). EpiSearch. Identifying Ancient Inscriptions in Epigraphic Manuscripts. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 10417. https://doi.org/10.46298/jdmdh.10417
Camps, J.-B., Vidal-Gorène, C., & Vernet, M. (2021). Handling Heavily Abbreviated Manuscripts : HTR engines vs text normalisation approaches (No. arXiv:2107.03450). arXiv. https://doi.org/10.48550/arXiv.2107.03450
Chagué, A. (n.d.). eScriptorium Tutorial (en). LECTAUREP. Retrieved March 25, 2024, from https://lectaurep.hypotheses.org/documentation/escriptorium-tutorial-en
Chagué, A., & Clérice, T. (2023a). Données ouvertes, données propres, et autres vies : Testaments de Poilus et CREMMA. https://inria.hal.science/hal-04347066
Chagué, A., & Clérice, T. (2023b, July 10). « I’m here to fight for ground truth » : HTR-United, a solution towards a common for HTR training data. Digital Humanities 2023: Collaboration as Opportunity. https://inria.hal.science/hal-04094233
Chagué, A., Clérice, T., & Romary, L. (2021). HTR-United : Mutualisons la vérité de terrain ! DHNord2021 - Publier, partager, réutiliser les données de la recherche : les data papers et leurs enjeux. https://hal.science/hal-03398740
Chagué, A., Clérice, T., & Romary, L. (2022). HTR-United : Un écosystème pour une approche mutualisée de la transcription automatique des écritures manuscrites. https://inria.hal.science/hal-04124743
Chagué, A., & Scheithauer, H. (2024, January 31). Do (colored) backgrounds matter? An experiment on artificially augmented ground truth for handwritten text recognition applied to historical manuscripts. https://inria.hal.science/hal-04450004
Chauhan, R. (2023, September 26). Train Your Own OCR/HTR Models with Kraken, part 1. The Digital Orientalist. https://digitalorientalist.com/2023/09/26/train-your-own-ocr-htr-models-with-kraken-part-1/
Clérice, T. (2023). You Actually Look Twice At it (YALTAi) : Using an object detection approach instead of region segmentation within the Kraken engine. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 9806. https://doi.org/10.46298/jdmdh.9806
Clérice, T., Vlachou-Efstathiou, M., & Chagué, A. (2023). CREMMA Medii Aevi : Literary Manuscript Text Recognition in Latin. Journal of Open Humanities Data, 9(1). https://doi.org/10.5334/johd.97
Clivaz, C., & Allen, G. V. (2021). Introduction : Ancient Manuscripts and Virtual Research Environments. Classics@ Journal, 18 (Ancient Manuscripts and Virtual Research Environments, special issue). https://classics-at.chs.harvard.edu/classics18-introduction/
Gabay, S., Camps, J.-B., Pinche, A., & Carboni, N. (2021). SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages (Version 0.9) [Dataset]. https://segmonto.github.io/
Gabay, S., Pinche, A., Christensen, K., & Camps, J.-B. (2023). SegmOnto : A Controlled Vocabulary to Describe and Process Digital Facsimiles. https://hal.science/hal-04343404
Gatos, B., Louloudis, G., Stamatopoulos, N., Retsinas, G., Sfikas, G., Giotis, A., Simistira Liwicki, F., Papavassiliou, V., & Katsouros, V. (2020). OldDocPro : Old Greek Document Recognition. In A. Fischer, M. Liwicki, & R. Ingold (Eds.), Handwritten Historical Document Analysis, Recognition, and Retrieval—State of the Art and Future Trends (Vol. 89, pp. 157‑174). World Scientific. https://doi.org/10.1142/11353
Gautier, D., Huguet, A., Massot, M.-L., Tricoche, A., Carlin, M., Moreux, J.-P., & Aurélia, R. (2022). Compte-rendu de la journée d’étude “Point HTR 2022” Transkribus / eScriptorium : Transcrire, annoter et éditer numériquement des documents d’archives [Research Report]. CAPHES - UMS 3610 CNRS/ENS ; AOROC. https://hal.science/hal-03692413
Guénette, M., Verstraete, M., Chagué, A., & Vitali-Rosati, M. (2024a). HTR Model Palatinus graecus 23 (Meleagre-NFC) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10932711
Guénette, M., Verstraete, M., Chagué, A., & Vitali-Rosati, M. (2024b). HTR Model Palatinus graecus 23 (Meleagre-NFD) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10932742
Guénette, M., Verstraete, M., Chagué, A., & Vitali-Rosati, M. (2024c). HTR Model Palatinus graecus 23 (Meleagre-NFD-finetuned) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10932751
Hassner, T., Sablatnig, R., Stutzmann, D., & Tarte, S. (2014). Digital Palaeography : New Machines and Old Texts. Dagstuhl Reports, 4(7), 112‑134. https://doi.org/10.4230/DagRep.4.7.112
Hodel, T., Schoch, D., Schneider, C., & Purcell, J. (2021). General Models for Handwritten Text Recognition : Feasibility and State-of-the Art. German Kurrent as an Example. Journal of Open Humanities Data, 7(13), Article 0. https://doi.org/10.5334/johd.46
Kaddas, P., Gatos, B., Palaiologos, K., Christopoulou, K., & Kritsis, K. (2023). Text Line Detection and Recognition of Greek Polytonic Documents. In M. Coustaty & A. Fornés (Eds.), Document Analysis and Recognition – ICDAR 2023 Workshops (pp. 213‑225). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-41501-2_15
Kaddas, P., Palaiologos, K., Gatos, B., Katsouros, V., & Christopoulou, K. (2023). A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents. In G. A. Fink, R. Jain, K. Kise, & R. Zanibbi (Eds.), Document Analysis and Recognition—ICDAR 2023 (pp. 366‑376). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-41685-9_23
Kahle, P., Colutto, S., Hackl, G., & Mühlberger, G. (2017). Transkribus—A Service Platform for Transcription, Recognition and Retrieval of Historical Documents. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 4, 19‑24. https://doi.org/10.1109/ICDAR.2017.307
Katsouros, V., Papavassiliou, V., Simistira, F., & Gatos, B. (2016). Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs. 2016 12th IAPR Workshop on Document Analysis Systems (DAS), 346‑351. https://doi.org/10.1109/DAS.2016.60
Kiessling, B. (2019). Kraken—An Universal Text Recognizer for the Humanities. ADHO 2019. https://dh-abstracts.library.cmu.edu/works/9912
Kiessling, B. (2021). Advances in Optical Character Recognition for Historical Arabic Documents [Doctoral dissertation]. Paris Sciences et Lettres.
Kiessling, B., Tissot, R., Stokes, P., & Stökl Ben Ezra, D. (2019). eScriptorium : An Open Source Platform for Historical Document Analysis. 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), 19‑24. https://doi.org/10.1109/ICDARW.2019.10032
Kindt, B., & Vidal-Gorène, C. (2022). From Manuscript to Tagged Corpora. An Automated Process for Ancient Armenian or Other Under-Resourced Languages of the Christian East. Armeniaca, 1.
Kindt, B., Vidal-Gorène, C., & Delle Donne, S. (2022). Analyse automatique du grec ancien par réseau de neurones. Évaluation sur le corpus De Thessalonica Capta. Bulletin de l’Académie Belge pour l’Étude des Langues Anciennes et Orientales, 1011, 537‑562. https://doi.org/10.14428/babelao.vol1011.2022.65073
Koidaki, F. (n.d.). 19th century Greek 8.0 (Version 148525) [Dataset]. Retrieved April 23, 2025, from https://www.transkribus.org/model/19th-century-greek-8.0
Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., & Wei, F. (2022). TrOCR : Transformer-based Optical Character Recognition with Pre-trained Models (No. arXiv:2109.10282). arXiv. https://doi.org/10.48550/arXiv.2109.10282
Lombardi, F., & Marinai, S. (2020). Deep Learning for Historical Document Analysis and Recognition—A Survey. Journal of Imaging, 6(10), Article 10. https://doi.org/10.3390/jimaging6100110
Maniaci, M., Atanasiu, V., Ceccherini, I., Falmagne, T., Guerreau, A., Gurrado, M., Scotto Di Freca, A., Stokes, P. A., Stutzmann, D., Vogeler, G., & Webber, T. (2011). Applications actuelles de l’informatique à la paléographie. Quelles méthodes pour quelles finalités ? Gazette du Livre Médiéval, 56‑57, 119-130.
Markou, K., Tsochatzidis, L., Zagoris, K., Papazoglou, A., Karagiannis, X., Symeonidis, S., & Pratikakis, I. (2021). A Convolutional Recurrent Neural Network for the Handwritten Text Recognition of Historical Greek Manuscripts. In A. Del Bimbo, R. Cucchiara, S. Sclaroff, G. M. Farinella, T. Mei, M. Bertini, H. J. Escalante, & R. Vezzani (Eds.), Pattern Recognition. ICPR International Workshops and Challenges (pp. 249‑262). Springer International Publishing. https://doi.org/10.1007/978-3-030-68787-8_18
Marthot-Santaniello, I., & Serbaeva, O. (2024). Digital Palaeography of Iliad Papyri, D-scribes Project and the Research Environment for Ancient Documents (READ) Platform. In N. Reggiani (Ed.), Digital Papyrology III: The Digital Critical Edition of Greek Papyri : Issues, Projects, and Perspectives (pp. 327‑346). De Gruyter. https://www.degruyterbrill.com/document/doi/10.1515/9783111070162-019/html
Marthot-Santaniello, I., Vu, M. T., Serbaeva, O., & Beurton-Aimar, M. (2023). Stylistic Similarities in Greek Papyri Based on Letter Shapes : A Deep Learning Approach. In M. Coustaty & A. Fornés (Eds.), Document Analysis and Recognition – ICDAR 2023 Workshops (pp. 307‑323). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-41498-5_22
Michael, J., Weidemann, M., & Labahn, R. (n.d.). HTR Engine Based on NNs P3. Optimizing speed and performance—HTR+ (READ H2020 Project 674943 No. D7.9).
Muehlberger, G., Seaward, L., Terras, M., Oliveira, S. A., Bosch, V., Bryan, M., Colutto, S., Déjean, H., Diem, M., Fiel, S., Gatos, B., Greinoecker, A., Grüning, T., Hackl, G., Haukkovaara, V., Heyer, G., Hirvonen, L., Hodel, T., Jokinen, M., … Zagoris, K. (2019). Transforming scholarship in the archives through handwritten text recognition : Transkribus as a case study. Journal of Documentation, 75(5), 954‑976. https://doi.org/10.1108/JD-07-2018-0114
Nockels, J., Gooding, P., Ames, S., & Terras, M. (2022). Understanding the application of handwritten text recognition technology in heritage contexts : A systematic review of Transkribus in published research. Archival Science, 22(3), 367‑392. https://doi.org/10.1007/s10502-022-09397-0
Nünlist, R. (2020). Nicanor’s System of Punctuation. Greek, Roman, and Byzantine Studies, 60(1), Article 1.
Pang, B., Nijkamp, E., & Wu, Y. N. (2020). Deep Learning with TensorFlow : A Review. Journal of Educational and Behavioral Statistics, 45(2), 227‑248. https://doi.org/10.3102/1076998619872761
Patel, C., Patel, A., & Patel, D. (2012). Optical Character Recognition by Open Source OCR Tool Tesseract : A Case Study. International Journal of Computer Applications, 55(10), 50‑56. https://doi.org/10.5120/8794-2784
Pavlopoulos, J., Kougia, V., Arias, E. G., Platanou, P., Shabalin, S., Liagkou, K., Papadatos, E., Essler, H., Camps, J.-B., & Fischer, F. (2024). Challenging Error Correction in Recognised Byzantine Greek. https://doi.org/10.21203/rs.3.rs-2921088/v3
Pavlopoulos, J., Kougia, V., Platanou, P., & Essler, H. (2023). Detecting Erroneously Recognized Handwritten Byzantine Text. Findings of the Association for Computational Linguistics: EMNLP 2023, 7818‑7828. https://doi.org/10.18653/v1/2023.findings-emnlp.524
Pavlopoulos, J., Kougia, V., Platanou, P., Shabalin, S., Liagkou, K., Papadatos, E., Essler, H., Camps, J.-B., & Fischer, F. (2023, May 15). Error Correcting HTR’ed Byzantine Text. https://doi.org/10.21203/rs.3.rs-2921088/v1
Perdiki, E. (n.d.). Chrysostomicus I (Version 44872) [Dataset]. Retrieved April 23, 2025, from https://www.transkribus.org/model/chrysostomicus-i
Perdiki, E. (2022). Transkribus : Reviewing HTR training on (Greek) manuscripts – RIDE. RIDE: A Review Journal for Digital Editions and Resources, 15 (3rd issue on Tools for Digital Scholarly Editions). https://doi.org/10.18716/ride.a.15.6
Perdiki, E. (2023a). List of manuscripts containing John Chrysostom’s Homilies and the relevant manual transcriptions (Version 1.2) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.8102662
Perdiki, E. (2023b). Preparing Big Manuscript Data for Hierarchical Clustering with Minimal HTR Training. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 10419. https://doi.org/10.46298/jdmdh.10419
Perdiki, E., & Konstantinidou, M. (2021). Handling Big Manuscript Data – Classics@ Journal. Classics@ Journal, 18(1). https://classics-at.chs.harvard.edu/classics18-perdiki-and-konstantinidou/
Pinche, A. (2022a). Des images au texte : Comment apprendre à des ordinateurs à lire des manuscrits médiévaux ?
Pinche, A. (2022b). Guide de transcription pour les manuscrits du Xe au XVe siècle.
Pinche, A. (2022c). HTR model Cremma Medieval. https://doi.org/10.5281/zenodo.6669508
Pinche, A. (2023). Generic HTR Models for Medieval Manuscripts. The CREMMALab Project. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 10252. https://doi.org/10.46298/jdmdh.10252
Pinche, A., Clérice, T., Chagué, A., Camps, J.-B., Vlachou-Efstathiou, M., Levenson, M. G., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., Haverals, W., Kestemont, M., Vandyck, C., & O’Connor, P. (2024). CATMuS-Medieval : Consistent Approaches to Transcribing ManuScripts. DH2024. https://inria.hal.science/hal-04346939
Pinche, A., & Stokes, P. (2024). Historical Documents and Automatic Text Recognition : Introduction. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 13247. https://doi.org/10.46298/jdmdh.13247
Platanou, P., Pavlopoulos, J., & Papaioannou, G. (2022). Handwritten Paleographic Greek Text Recognition : A Century-Based Approach. Proceedings of the Thirteenth Language Resources and Evaluation Conference, 6585‑6589.
Polomac, V. (2022). Serbian Early Printed Books from Venice : Creating Models for Automatic Text Recognition using Transkribus. Scripta & E-Scripta, 22, 11‑29.
READ-COOP. (2023, August 29). Introducing the new Transkribus web app. https://blog.transkribus.org/en/introducing-the-new-transkribus-web-app
Robertson, B. (2019). Optical Character Recognition for Classical Philology. In M. Berti (Ed.), Digital Classical Philology : Ancient Greek and Latin in the Digital Revolution (pp. 117‑136). De Gruyter. https://doi.org/10.1515/9783110599572-008
Robertson, B., & Boschetti, F. (2017). Large-Scale Optical Character Recognition of Ancient Greek. Mouseion, 14(3), 341‑359. https://doi.org/10.3138/mous.14.3-3
Romanello, M., Najem-Meyer, S., & Robertson, B. (2021). Optical Character Recognition of 19th Century Classical Commentaries : The Current State of Affairs. The 6th International Workshop on Historical Document Imaging and Processing, 1‑6. https://doi.org/10.1145/3476887.3476911
Romein, C. A., Hodel, T., Gordijn, F., Zundert, J. J. V., Chagué, A., Lange, M. V., Jensen, H. S., Stauder, A., Purcell, J., Terras, M. M., Heuvel, P. V. D., Keijzer, C., Rabus, A., Sitaram, C., Bhatia, A., Depuydt, K., Afolabi-Adeolu, M. A., Anikina, A., Bastianello, E., … Zweistra, R. (2024). Exploring Data Provenance in Handwritten Text Recognition Infrastructure : Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done. Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de textes, 10403. https://doi.org/10.46298/jdmdh.10403
Sánchez, J. A., Romero, V., Toselli, A. H., Villegas, M., & Vidal, E. (2019). A set of benchmarks for Handwritten Text Recognition on historical documents. Pattern Recognition, 94, 122‑134. https://doi.org/10.1016/j.patcog.2019.05.025
Schoen, J., & Saretto, G. E. (2022). Optical Character Recognition (OCR) and Medieval Manuscripts : Reconsidering Transcriptions in the Digital Age. Digital Philology: A Journal of Medieval Cultures, 11(1), 174‑206.
Schön, C. (2023). HTR in the making : En studie av hur Handwritten Text Recognition görs vid tre svenska arkivverksamheter. http://lup.lub.lu.se/student-papers/record/9117098
Schulthess, S. (2018). A Trilingual Manuscript of the New Testament in Digital Research. https://humarec.org/webbook/book/index.html
Seuret, M., Marthot-Santaniello, I., White, S. A., Serbaeva Saraogi, O., Agolli, S., Carrière, G., Rodriguez-Salas, D., & Christlein, V. (2023). ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri. In G. A. Fink, R. Jain, K. Kise, & R. Zanibbi (Eds.), Document Analysis and Recognition—ICDAR 2023 (Vol. 14188, pp. 498‑507). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-41679-8_29
Singh, P., & Manure, A. (2020). Introduction to TensorFlow 2.0. In Learn TensorFlow 2.0 : Implement machine learning and deep learning models with python (pp. 1‑24). Apress. https://doi.org/10.1007/978-1-4842-5558-2_1
Snydman, S., Sanderson, R., & Cramer, T. (2015). The International Image Interoperability Framework (IIIF) : A community & technology approach for web-based images. Archiving Conference, 2015. https://doi.org/10.2352/issn.2168-3204.2015.12.1.art00005
Sommerschield, T., Assael, Y., Pavlopoulos, J., Stefanak, V., Senior, A., Dyer, C., Bodel, J., Prag, J., Androutsopoulos, I., & Freitas, N. de. (2023). Machine Learning for Ancient Languages : A Survey. Computational Linguistics, 1‑45. https://doi.org/10.1162/coli_a_00481
Stokes, P. A., Kiessling, B., Stökl Ben Ezra, D., Tissot, R., & Gargem, E. H. (2021). The eScriptorium VRE for Manuscript Cultures. Classics@ Journal: Ancient Manuscripts and Virtual Research Environments, 18. https://classics-at.chs.harvard.edu/classics18-stokes-kiessling-stokl-ben-ezra-tissot-gargem/
Ströbel, P. B., & Clematide, S. (2019). Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images. Utrecht: Digital Humanities 2019. https://doi.org/10.5167/uzh-177164
Ströbel, P. B., Clematide, S., & Volk, M. (2020). How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 3551‑3559). European Language Resources Association. https://aclanthology.org/2020.lrec-1.436
Ströbel, P. B., Clematide, S., Volk, M., & Hodel, T. (2022). Transformer-based HTR for Historical Documents (No. arXiv:2203.11008). arXiv. https://doi.org/10.48550/arXiv.2203.11008
Ströbel, P. B., Volk, M., Clematide, S., Schwitter, R., Hodel, T., & Schoch, D. (2022). Evaluation of HTR models without Ground Truth Material. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 4395‑4404). European Language Resources Association. https://aclanthology.org/2022.lrec-1.467/
Stutzmann, D. (2018a). Intelligence artificielle et humanités numériques : Déchiffrement des écritures médiévales, analyse des écritures et étude des sources en SHS. La Lettre de l’InSHS, n° 54, 3‑5.
Stutzmann, D. (2018b). La lecture des manuscrits médiévaux grâce à l’intelligence artificielle : Une grande première de l’IRHT. Les Amis de l’IRHT. Bulletin de l’association. https://shs.hal.science/halshs-01927958
Tarride, S., Schneider, Y., Generali-Lince, M., Boillet, M., Abadie, B., & Kermorvant, C. (2024). Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library (No. arXiv:2404.18722). arXiv. https://doi.org/10.48550/arXiv.2404.18722
Tauber, J. (n.d.). Python, Unicode and Ancient Greek. Retrieved September 11, 2023, from https://jktauber.com/articles/python-unicode-ancient-greek/
Terras, M., Anzinger, B., Stauder, A., Stauder, F., Gooding, P., Mühlberger, G., Nockels, J., & Romein, A. C. (2025). The artificial intelligence cooperative : READ-COOP, Transkribus, and the benefits of shared community infrastructure for automated text recognition. Open Research Europe, 5(16). https://doi.org/10.12688/openreseurope.18747.1
Thompson, W. P. (2021). Using Handwritten Text Recognition (HTR) Tools to Transcribe Historical Multilingual Lexica. Scripta & E-Scripta, 21, 217‑231.
Transkribus (Director). (2022, October 10). One to Rule Them All or How to Transcribe Greek Manuscripts with a HTR Model | Elpida Perdiki #TUC22 [Video]. https://www.youtube.com/watch?v=8DNhg3YXTgI
Tsochatzidis, L., Symeonidis, S., Papazoglou, A., & Pratikakis, I. (2021). HTR for Greek Historical Handwritten Documents. Journal of Imaging, 7(12), 260. https://doi.org/10.3390/jimaging7120260
Vidal-Gorène, C. (2023). La reconnaissance automatique d’écriture à l’épreuve des langues peu dotées. Programming Historian en français. https://doi.org/10.46430/phfr0023
Vidal-Gorène, C., Dupin, B., Decours-Perez, A., & Riccioli, T. (2021). A Modular and Automated Annotation Platform for Handwritings: Evaluation on Under-Resourced Languages. In J. Lladós, D. Lopresti, & S. Uchida (Eds.), Document Analysis and Recognition – ICDAR 2021 (pp. 507‑522). Springer International Publishing. https://doi.org/10.1007/978-3-030-86334-0_33
Vidal-Gorène, C., Lucas, N., Salah, C., Decours-Perez, A., & Dupin, B. (2021). RASAM – A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi. In E. H. Barney Smith & U. Pal (Eds.), Document Analysis and Recognition – ICDAR 2021 Workshops (pp. 265‑281). Springer International Publishing. https://doi.org/10.1007/978-3-030-86198-8_19
White, N. (2012). Training Tesseract for Ancient Greek OCR. Εὔτυπον, 28‑29, 1‑11.
Wilken, J. (n.d.-a). Greek Ancient Majuscule (spaced) (Version 45013) [Dataset]. Retrieved April 23, 2025, from https://www.transkribus.org/model/greek-ancient-majuscule-spaced
Wilken, J. (n.d.-b). Greek Medieval and Modern Minuscule (Version 45032) [Dataset]. Retrieved April 23, 2025, from https://www.transkribus.org/model/greek-ancient-majuscule-spaced
Wion, A., & Vidal-Gorène, C. (2023). Préparer un projet incluant l’extraction des textes en graphies non-latines par transcription et HTR, Quelques conseils pratiques. https://shs.hal.science/halshs-04161903
μDOC.tS: A platform for the transcription of historical handwritten documents. (n.d.). Retrieved April 23, 2025, from https://mdoc-ts.ee.duth.gr/