• Lecture

How to Transcribe a Million Manuscripts with eScriptorium

Peter Stokes, École Pratique des Hautes Études – Université PSL (Paris)

 

This event has already occurred

Friday, December 1, 2023, 12:00 - 1:30 pm EST
Virtual
Open to the Public

Hosted by: Kislak Center

Detail of manuscript leaf showing transcription markup.

Recent advances in machine learning, combined with the availability of millions of images of manuscript pages, mean that we can now produce automatic transcriptions of medieval and other manuscripts with over 99% accuracy in the right circumstances. This is extremely promising and opens up many new possibilities but, as with any new approach, it also raises challenges and questions. Perhaps the first question is how best to make use of this opportunity: in other words, how to read a million manuscripts. At the same time, machine learning and other "big data" approaches raise questions about representation, since by definition they work only for scripts and languages that are already available in large quantities, whereas rare or historical languages with fewer resources are ignored all the more. This talk will address these questions in the context of kraken and eScriptorium, a pair of tools for the automatic transcription of handwritten and printed documents, especially those in rare and historical scripts. Their development is led by the Digital Humanities team in the lab "Archéologie et Philologie d'Orient et d'Occident" at the École Pratique des Hautes Études – Université PSL, in Paris.
