
South Asia Studies Digital Humanities Workshop

The Penn Libraries and South Asia Studies present a space for technology orientation, where participants will strive for a nuanced and informed understanding of the possibilities and limitations of critical digital humanities tools, particularly Computational Text Analysis (CTA) of content found in manuscripts, inscriptions, maps, and other historical documents.

October 10-11, 2024
Orrery Pavilion, Van Pelt-Dietrich Library Center, 6th floor
Open to the Public
[Image: four manuscripts in various languages]

The discussions in these sessions aim to bring together South Asia scholars, digital humanities specialists, data librarians, subject specialists, research software & programming engineers, and manuscript studies curators to engage in conversation about the field of collections as data at large.

Andrew Ollett, Associate Professor in the Department of South Asian Languages and Civilizations at the University of Chicago, will deliver the keynote address "Texts as Data: Tools and Perspectives for South Asianists."

The workshop will be held from 9:30am - 5:00pm on October 10 and from 9:30am - 4:00pm on October 11. The talks are open to the public, may be presented in a hybrid format, and will be recorded for sharing at a later date. See below for the complete program schedule.

Apply to Participate in Hands-On Sessions

South Asia Studies scholars are invited to apply to participate in hands-on workshop sessions held during this event. These sessions will introduce scholars of South Asia to digital tools for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for the practical purpose of searching through texts for specific keywords. OCR and HTR make it possible to transform archival sources into searchable texts. Participants in the hands-on sessions must be prepared to attend both days of the workshop as well as a prior software installation session.

Organizers

  • Kashi Gomez, Sanskrit Lecturer, Department of South Asia Studies, University of Pennsylvania
  • Jajwalya Karajgikar, Applied Data Science Librarian, Research Data and Digital Scholarship, University of Pennsylvania Libraries

Session Facilitators

  • Eug Xu, Data Science & Society Research Assistant, Research Data and Digital Scholarship, University of Pennsylvania Libraries
  • Dot Porter, Schoenberg Institute for Manuscript Studies Curator of Digital Humanities, University of Pennsylvania Libraries
  • Andy Janco, Research Software Engineer, Research Data and Digital Scholarship, University of Pennsylvania Libraries
  • Doug Emery, Special Collections Digital Content Programmer, Cultural Heritage Computing, University of Pennsylvania Libraries
  • Jessie Dummer, Digitization Project Coordinator, Cultural Heritage Computing, University of Pennsylvania Libraries

Sponsors

This workshop is sponsored by the Department of South Asia Studies, Research Data and Digital Scholarship's AI Literacy Interest Group, the Schoenberg Institute for Manuscript Studies, the Penn South Asia Center, the Price Lab for Digital Humanities, and the Wolf Humanities Center.

Learning Objectives

  • Prepare a multilingual text for OCR and HTR
  • Use Google Cloud Vision and Python to perform text recognition and extract the data as a searchable text file
  • Use Python to transliterate South Asian languages into Roman script
  • Perform a simple text search with grep or Google Pinpoint, and visualize text data with Voyant Tools
  • Conceptualize projects and research questions that utilize computational text mining and analysis
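
To give a sense of what a simple keyword search over recognized text can look like, here is a minimal Python sketch. It is an illustration only, not taken from the workshop materials: the function name `search_lines` and the sample text are hypothetical, and the sketch assumes the OCR or HTR output is already available as a plain Unicode string. It normalizes text to Unicode NFC before comparing, since the same character in Roman transliteration with diacritics can be encoded in more than one way.

```python
import unicodedata


def search_lines(text, keyword):
    """Return (line_number, line) pairs for lines that contain the keyword.

    Hypothetical helper for illustration; behaves roughly like a
    case-insensitive `grep -n` on a plain-text string.
    """
    # Normalize and casefold the keyword once, so that differently encoded
    # but visually identical strings compare equal.
    needle = unicodedata.normalize("NFC", keyword).casefold()
    matches = []
    for number, line in enumerate(text.splitlines(), start=1):
        haystack = unicodedata.normalize("NFC", line).casefold()
        if needle in haystack:
            matches.append((number, line))
    return matches


# Invented sample text standing in for OCR output.
sample = "dharma and artha\nकाव्य and nāṭya\nDharma again"

for number, line in search_lines(sample, "dharma"):
    print(f"{number}: {line}")
```

The same function accepts keywords in Devanagari or in Roman transliteration, for example `search_lines(sample, "काव्य")`.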

Schedule

Thursday, October 10, 2024

Friday, October 11, 2024

Image Credits (clockwise from top left): Kislak Special Collections 1. Rag Darshan, 1799; 2. Midrash Lamentations, 1850; 3. Sinhalese protective charm yantras, 1700; 4. Pravacanasāroddhārasūtra, 1652.