Harmonizing Guidelines for Handwritten Text Recognition of Ancient Greek
DH2025, Lisbon, July 14-18th, 2025
In recent years, the field of HTR has developed significantly with the rise of widely accessible software such as eScriptorium and Transkribus. Alongside this technological progress, there has been a growing emphasis on developing datasets and guidelines that standardize procedures and thereby make them easier to implement.
For Latin script manuscripts, a critical mass of data appears to have been reached. However, non-Latin scripts often remain on the margins of this progress. Currently, researchers often work on their own, following diverse practices and standards tailored to the needs of their specific projects. This situation mirrors the early days of Latin script HTR, before the development of comprehensive guidelines that gradually transformed the field. Similarly, the establishment of a collaborative and interdisciplinary community could facilitate significant progress in non-Latin script HTR.
HTR for Ancient Greek faces several challenges. One is a corpus that is limited and dispersed compared to what is available for Latin script, which hinders the development of effective models. Another is the lack of consistent encoding practices for Greek paleographic features. While Unicode provides the basic characters of the Greek alphabet, it does not account for all the paleographic elements (such as ligatures and abbreviations) found in Ancient Greek manuscripts. This gap has led to varying encoding practices, complicating data standardization and limiting interoperability.
To improve data reuse and collaboration, a more generalized approach is needed, with harmonized practices across projects to enable data aggregation, sharing, and pre-annotation.
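As a minimal illustration of why shared conventions matter, the Python sketch below compares the character inventories of two transcriptions after Unicode normalization. The two sample strings and the helper names (`char_inventory`, `inventory_diff`, `describe`) are hypothetical and not taken from any existing project or guideline. The sketch assumes one project expands the abbreviation for καὶ while the other encodes it with the Greek kai symbol (U+03D7), and that the two projects use different code points for the accented omicron.

```python
import unicodedata
from collections import Counter


def char_inventory(text: str, form: str = "NFC") -> Counter:
    """Count the non-whitespace code points of a transcription after normalization."""
    normalized = unicodedata.normalize(form, text)
    return Counter(ch for ch in normalized if not ch.isspace())


def inventory_diff(a: str, b: str, form: str = "NFC") -> tuple[set[str], set[str]]:
    """Return the characters found only in `a` and the characters found only in `b`."""
    inv_a, inv_b = char_inventory(a, form), char_inventory(b, form)
    return set(inv_a) - set(inv_b), set(inv_b) - set(inv_a)


def describe(chars: set[str]) -> list[str]:
    """Format characters as 'U+XXXX NAME' so visually identical glyphs can be told apart."""
    return sorted(f"U+{ord(c):04X} {unicodedata.name(c, 'UNKNOWN')}" for c in chars)


# Hypothetical transcriptions of the same two words by two projects.
# Project A expands the abbreviation and uses omicron with oxia (U+1F79).
project_a = "\u03ba\u03b1\u1f76 \u03bb\u1f79\u03b3\u03bf\u03c2"
# Project B keeps the abbreviation as the kai symbol (U+03D7)
# and uses omicron with tonos (U+03CC).
project_b = "\u03d7 \u03bb\u03cc\u03b3\u03bf\u03c2"

only_a, only_b = inventory_diff(project_a, project_b)
print("Only in project A:", describe(only_a))
print("Only in project B:", describe(only_b))
```

In this sketch, NFC normalization reconciles the two encodings of the accented omicron, because U+1F79 is canonically equivalent to U+03CC. The abbreviation, however, remains a divergence that no Unicode normalization form resolves: whether to expand it or keep a dedicated symbol is an editorial decision, which is the kind of choice harmonized guidelines would need to settle.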
This workshop is not intended to be a technical training session on HTR, but rather a space for reflection, collaboration, and community building. It aims to bring together researchers and practitioners from a range of disciplines, including philologists, paleographers, HTR users (across various languages), and dataset creators, to address the technical and methodological challenges that have hindered the development of effective HTR for Ancient Greek.
The primary goals are:
15th of July 2025, 09:00-12:30
NOVA FCSH [tbc]
See also the Acknowledgements
We welcome anyone interested in (digital) paleography or in developing guidelines for Handwritten Text Recognition (HTR) applied to Ancient Greek. The workshop will be held in a hybrid format: participants may attend either in person or online (see the conference website for details). Please note that in-person attendance is limited to 30 participants.
To participate, you must register for the DH2025 conference and select the workshop when registering. Please note that the Early Bird registration period ends on May 4, 2025 and that registration closes on June 2, 2025.
If you have any questions, or if you’d like to suggest a case study (10–15 minutes; see Project #1 for an example), don’t hesitate to get in touch: write to Mathilde Verstraete or Maxime Guénette with a short description of your project. We’ll do our best to accommodate proposals within the available time, though we may not be able to include all of them.
University of Montreal
(PhD Student in Digital Humanities)
University of Montreal
(PhD Student in History)
IRHT-CNRS & IMAGINE (ENPC) labs
(PhD Student in Digital Palaeography)
ENS Lyon
(Associate professor in Digital Humanities)
University of Montreal & University of Rouen
(Full Professor in French Literature and Digital Humanities)