Ὅσοι ἄνθρωποι, τοσαῦται γνῶμαι

Harmonizing Guidelines for Handwritten Text Recognition of Ancient Greek

DH2025, Lisbon, July 14-18, 2025

A workshop to foster a community of practice and establish guidelines for HTR dedicated to Ancient Greek.

In recent years, the field of HTR has developed significantly with the rise of widely accessible software such as eScriptorium and Transkribus. Alongside this technological progress, there has been a growing emphasis on developing datasets and guidelines that standardize procedures and make them easier to implement.

For Latin script manuscripts, a critical mass of data appears to have been reached. Non-Latin scripts, however, often remain on the margins of this progress. Researchers currently tend to work on their own, following diverse practices and standards tailored to the needs of their specific projects. This situation mirrors the early days of Latin script HTR, before the development of comprehensive guidelines that gradually gained the potential to transform the field. Similarly, the establishment of a collaborative and interdisciplinary community could facilitate significant progress in non-Latin script HTR.

HTR for Ancient Greek faces several challenges. One is a corpus that is limited and dispersed compared with that of Latin scripts, which hinders the development of effective models. Another is the lack of consistent encoding practices for Greek paleographic features. While Unicode provides the basic characters of the Greek alphabet, it does not account for all the paleographic elements (such as ligatures and abbreviations) found in Ancient Greek manuscripts. This gap has led to varying encoding practices, which complicates data standardization and limits interoperability.

To improve data reuse and collaboration, a more generalized approach is needed, with harmonized practices across projects to enable data aggregation, sharing, and pre-annotation.
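One small, concrete facet of the encoding problem described above is Unicode normalization: the same accented Greek letter can be stored either as a single precomposed code point or as a base letter followed by combining marks. The sketch below is only an illustration, not a proposed guideline. It assumes that a project settles on one normalization form (NFC here, purely as an example) before datasets are merged, and the helper name `harmonize` is hypothetical.

```python
import unicodedata


def harmonize(text: str, form: str = "NFC") -> str:
    """Return `text` in a single Unicode normalization form (e.g. NFC or NFD)."""
    return unicodedata.normalize(form, text)


# The same word typed two ways: precomposed vs. base letter + combining marks.
precomposed = "\u1f04νθρωποι"             # alpha with psili and oxia as one code point (U+1F04)
decomposed = "\u03b1\u0313\u0301νθρωποι"  # alpha + combining psili + combining acute

# Raw comparison works on code points, so visually identical strings can differ.
print(precomposed == decomposed)

# Once both are brought to the same form, they compare equal.
print(harmonize(precomposed) == harmonize(decomposed))

# The two forms also differ in length, which affects character-level metrics.
print(len(harmonize(precomposed, "NFC")), len(harmonize(precomposed, "NFD")))
```

Normalization only reconciles sequences that Unicode already treats as canonically equivalent. Ligatures, abbreviations, and other paleographic features that lack a standard encoding would still require conventions agreed between projects.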

Goals of this Workshop

This workshop is not intended to be a technical training session on HTR, but rather a space for reflection, collaboration, and community building. It aims to bring together researchers and practitioners from a range of disciplines, including philologists, paleographers, HTR users (across various languages), and dataset creators, to address the technical and methodological challenges that have hindered the development of effective HTR for Ancient Greek.

The primary goals are:

  1. To foster the development of an interdisciplinary community of practice that combines complementary expertise to address the challenges posed by the automatic recognition of Greek documents.
  2. To develop common guidelines for the transcription and encoding of Ancient Greek texts, especially for the training of automatic text recognition models.

In short

Date & Time

July 15, 2025, 09:00-12:30

Location

NOVA FCSH [tbc]

Workshop organizers

See also the Acknowledgements

Want to join?

We welcome anyone interested in (digital) paleography or in developing guidelines for Handwritten Text Recognition (HTR) applied to Ancient Greek. This workshop will be held in hybrid format: participants may attend either in person or online (see the conference website for details). Please note that in-person attendance is limited to 30 participants.

To participate, you must register for the DH2025 conference and select the workshop when registering. Please note that the Early Bird registration period ends on May 4, 2025 and that registration closes on June 2, 2025.

If you have any questions, or if you’d like to suggest a case study (10–15 minutes – see Project #1 as an example), don’t hesitate to get in touch (write to Mathilde Verstraete or Maxime Guénette with a short description of your project). We’ll do our best to accommodate proposals within the available time, though we may not be able to include all of them.

Schedule

09:00-09:30
Introduction. Preliminary Thoughts on Guidelines
by Mathilde Verstraete, Maxime Guénette, Malamatenia Vlachou-Efstathiou & Marianne Reboul
Summary
Workshop goals and structure;
Challenges & Opportunities;
Importance of Guidelines;
Presentation of the participants.

15:00-16:00
Creating Guidelines.
by All
Summary
Identification of key issues from case studies;
Comparison of approaches and strategies;
Drafting of common principles or recommendations;
Suggestions for future collaboration and documentation;
Conclusion.

Meet the Team

France, Toulouse. Bibliothèque de Toulouse, Ms 140 f. 009

Mathilde Verstraete

University of Montreal
(PhD Student in Digital Humanities)

Mathilde Verstraete is a PhD student in digital humanities at the University of Montreal. After obtaining a master’s degree in classical languages and literature at the Catholic University of Louvain (Belgium), she joined the Canada Research Chair on Digital Textualities to coordinate the collaborative digital edition of the Greek Anthology. Under the supervision of Marcello Vitali-Rosati and Elsa Bouchard, her research focuses on digital critical editions and the tools that produce them. She has worked on producing HTR models for the cod. pal. gr. 23 (see Meleagre-NFC, Meleagre-NFD, Meleagre-NFD-finetuned).
Homme (France, Rouen. Bibliothèque municipale, Ms. 489 f. 072)

Maxime Guénette

University of Montreal
(PhD Student in History)

Maxime Guénette is a PhD student in History at the Université de Montréal. His research focuses on religions in the Roman Empire through Linked Open Data and GIS. Since 2020, he has been a collaborator on the collaborative digital edition of the Greek Anthology. During his internship at the Canada Research Chair on Digital Textualities in summer 2023, he also worked on applying HTR to the cod. pal. gr. 23 using the eScriptorium platform (see Meleagre-NFC, Meleagre-NFD, Meleagre-NFD-finetuned).
France, Arras. Médiathèque municipale, Ms. 696 f. 131v

Malamatenia (Matenia) Vlachou

IRHT-CNRS & IMAGINE (ENPC) labs
(PhD Student in Digital Palaeography)

Malamatenia is a PhD candidate in Computer Vision for Latin Palaeography at IRHT/CNRS and ENPC, Paris, under the supervision of Dominique Stutzmann and Mathieu Aubry. Her research explores interpretable Deep Learning for handwriting characterization and analysis. She is part of the CATMuS project, which develops consistent transcription norms for Latin script documents for automatic transcription (HTR) models. She holds an MA in Classics and Digital Humanities and is particularly interested in Late Antique Greek and Latin grammatical theory and bilingual glossed grammatical manuscripts.
France, Marseille. Bibliothèque municipale, Ms. 111 f. 137v

Marianne Reboul

ENS Lyon
(Associate professor in Digital Humanities)

Marianne Reboul is an Associate Professor at the École normale supérieure de Lyon, where she leads research groups on digital humanities applied to ancient languages, specializing in digital classics. Her work focuses on the application of artificial intelligence techniques to ancient languages, with particular expertise in cross-lingual alignment and handwritten text recognition (HTR) for Ancient Greek.
Homme (France, Rouen. Bibliothèque municipale, Ms. 489 f. 072)

Marcello Vitali-Rosati

University of Montreal & University of Rouen
(Full Professor in French Literature and Digital Humanities)

Marcello Vitali-Rosati is a professor in the Department of French Literature at the Université de Montréal and holds the Chair of excellence in digital publishing at the University of Rouen. He is developing a philosophical reflection on what the world is becoming in the age of digital technologies. Through the study and practice of code, he analyses how algorithms, formats, software and platforms redefine notions of the human, identity, knowledge and literature. He leads several digital humanities projects, particularly in the field of scholarly publishing: platforms for publishing enriched journals and monographs, the Stylo text editor, and a platform for collaborative editing of the Greek Anthology.