Much of the world’s cultural heritage is handwritten and legible by humans but, until recently, not by machines. We are working to build a sustainable, scalable, and diverse community of practice in Handwritten Text Recognition (HTR) by bringing together interested people from across Penn and the world. HTR uses AI to transcribe handwritten texts. It can make corpora of manuscript materials searchable, help us discover unique texts, and turn handwritten texts into data for linguistic, historical, and quantitative analyses.

 

Vision

Machine learning-driven HTR and image recognition: The RDDS and SIMS teams will advance conversations and projects that use Handwritten Text Recognition (HTR), with a particular focus on less-represented languages and scripts and on leveraging Penn's collections both on campus and in the broader manuscript studies community. We will publish our models if we see an opportunity for others to reuse our work, and we will offer opportunities for scholars to learn and use these tools with their own corpora, thus empowering cutting-edge research in line with the Penn Libraries Strategic Priorities.

Projects

Using HTR models, two SIMS graduate fellows will be trained to produce transcriptions of texts in the eScriptorium platform from manuscripts housed in the Kislak Center for Special Collections, Rare Books and Manuscripts. These transcriptions will appear in the Global Medieval & Early Modern Digital Library. The Manuscript Collections As Data Research Group will support this work through the development of technical infrastructure and workflows, with an aim to develop a reproducible approach for students to use HTR. 

We are working and consulting with the Linguistic Data Consortium on multilingual HTR workflows, shared student training materials, and scalable model pipelines that empower global cultural heritage research communities to transcribe and analyze handwritten texts across diverse languages and scripts.

A man pointing to a projection with two windows. The window at the back shows an image of a Devanagari script manuscript. The window in front is black, with green machine-readable text of the Devanagari text.
Dr. Andrew Ollett at the SASDHW workshop showcasing Sanskrit HTR using Google Cloud Vision

In October 2024, the Penn Libraries and South Asia Studies presented a space for technology orientation, where participants shared a nuanced and informed understanding of the possibilities and limitations of critical digital humanities tools, particularly Computational Text Analysis (CTA) of content found in manuscripts, inscriptions, maps, and other historical documents.

Workshop page | Watch video recording Day 1 | Watch video recording Day 2 | Blog Post

 
