
Some Background Information

Over the past few weeks, I've been immersed in developing and refining a groundbreaking project: creating a linked data model to capture Penn's rich and diverse Jewish music collections. I call this project Shira (Hebrew for song or poem). This initiative will connect musical compositions, recordings, sheet music, and more, allowing users to explore and access music-related assets across our collections and beyond.

Wikibase, OpenRefine, and the need to reconcile

Seeing how my colleagues employed wikibase.cloud for their Digital Scriptorium project, I realized that this type of data model could work really well for this project (learn more about Wikibase here). After I created our own wikibase.cloud instance, I soon realized that I needed to find a more efficient way to import data than entering it manually. Beyond just the desire to efficiently enter data, I also saw that I would need a tool to help me reconcile my data. I didn't want duplicate pages of the same singer or song because there were discrepancies in the spelling or in the language (Haim Effendi is the same person as חיים אפנדי). 

To solve my problem, I used OpenRefine to reconcile, edit, and add new items to my wikibase.cloud instance. While there is extensive documentation on how to connect Wikidata (the largest Wikibase instance in the world) to OpenRefine, I did not find any recently updated documentation on how to do this for one's own Wikibase instance. Below, I offer a step-by-step tutorial on how to do so, based on the following resources:

- https://openrefine-wikibase.readthedocs.io/en/latest/scoring.html

- https://gitlab.com/nfdi4culture/ta1-data-enrichment/openrefine-wikibase

- https://ceur-ws.org/Vol-2773/paper-17.pdf

- https://github.com/KBNLresearch/OpenRefine-Wikibase/tree/main

Tutorial

Requirements: Python 3, Docker, OpenRefine, Wikibase Extension for OpenRefine, Wikibase instance where you have privileges

Creating the Reconciliation Service:

  1. Run the following in your terminal: 

    git clone https://github.com/judaicadh/wikibaseopenrefine
    # "open" shows the folder in the file browser on macOS; on Linux, xdg-open does the same job
    open wikibaseopenrefine
    cd wikibaseopenrefine
  2. You should see the contents of the folder wikibaseopenrefine in your file browser. 
  3. Open the config.py file in your favorite code editor.
  4. Replace "shira" with the name of your wikibase.cloud instance.
  5. Replace the entity identifiers with those specific to your wikibase (e.g. type_property_path = 'P39').
  6. Save your modified config.py file and close the code editor.
  7. Repeat steps 3–6 for the manifest.json file. 
  8. In your terminal, run docker-compose up
  9. You should see something like the following, indicating that the Docker container deployed successfully.

    (Screenshot: terminal results for Docker)
  10. You should also be able to open http://localhost:8000/ in your browser.
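
If the container starts but something still seems off, a short script can check the two things most likely to go wrong: a leftover reference to the original instance name in the edited files, and a service that is not answering. This is only a sketch and not part of the repository. The helper names (find_leftovers, missing_manifest_keys, fetch_manifest) are hypothetical, and I am assuming that the endpoint http://localhost:8000/en/api answers a plain GET request with a service manifest containing the keys name, identifierSpace, and schemaSpace, as reconciliation services generally do.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Endpoint used at the end of this tutorial; assumed to return a service manifest on GET.
SERVICE_URL = "http://localhost:8000/en/api"
# Keys a reconciliation service manifest is expected to carry (assumption, see above).
REQUIRED_KEYS = ("name", "identifierSpace", "schemaSpace")


def find_leftovers(text, old_name="shira"):
    """Return the 1-based line numbers in `text` that still mention `old_name`."""
    needle = old_name.lower()
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


def missing_manifest_keys(manifest):
    """Return the expected manifest keys that are absent from `manifest`."""
    return [key for key in REQUIRED_KEYS if key not in manifest]


def fetch_manifest(url=SERVICE_URL, timeout=5.0):
    """Download and decode the JSON document served at `url`."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # File names come from the steps above; run this from inside the cloned folder.
    for filename in ("config.py", "manifest.json"):
        path = Path(filename)
        if not path.exists():
            print(f"{filename}: not found in the current directory")
            continue
        hits = find_leftovers(path.read_text(encoding="utf-8"))
        print(f"{filename}: " + (f"still mentions 'shira' on lines {hits}" if hits else "no leftovers found"))

    try:
        manifest = fetch_manifest()
    except (OSError, ValueError) as error:  # connection problems or a non-JSON reply
        print(f"Could not read a manifest from {SERVICE_URL}: {error}")
    else:
        missing = missing_manifest_keys(manifest)
        print(f"Manifest is missing {missing}" if missing else "Manifest has the expected keys")
```

A plain text search for "shira" can produce false alarms (a comment, for instance), so treat the reported lines as places to look rather than as errors.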

Reconciliation Service and Wikibase:

  1. Navigate to https://foo.wikibase.cloud/wiki/Special:Tags, replacing foo with the name of your instance.
  2. Create a tag for each version of OpenRefine that will upload to your wikibase. Below is an example for openrefine-3.7.

    (Screenshot: OpenRefine tag in Wikibase)
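
To double-check that a tag was actually created, you can ask the wiki's MediaWiki API for its list of tags. The sketch below uses the standard action=query&list=tags module. I am assuming that your instance exposes its API at https://foo.wikibase.cloud/w/api.php (with foo replaced by your instance name), and the function names are my own, not something provided by Wikibase or OpenRefine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tags_query_url(instance):
    """Build the MediaWiki API URL that lists the change tags of a wikibase.cloud wiki."""
    params = urlencode(
        {"action": "query", "list": "tags", "tglimit": "max", "format": "json"}
    )
    # Assumption: wikibase.cloud serves the MediaWiki API under /w/api.php.
    return f"https://{instance}.wikibase.cloud/w/api.php?{params}"


def tag_names(api_response):
    """Extract the tag names from a decoded list=tags API response."""
    return {tag["name"] for tag in api_response.get("query", {}).get("tags", [])}


def has_tag(instance, tag, timeout=10.0):
    """Return True if `tag` appears among the tags defined on the given instance."""
    with urlopen(tags_query_url(instance), timeout=timeout) as response:
        return tag in tag_names(json.load(response))
```

Calling has_tag with your instance name and the exact tag name you entered on the Special:Tags page should return True once the tag exists.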

Connecting OpenRefine, Wikibase, and your Reconciliation Service

  1. In OpenRefine, open the spreadsheet you wish to reconcile with your wikibase instance.
  2. Select Wikibase from the extensions menu and click "Manage Wikibase Instances".
  3. Click "Add Wikibase Instance"
  4. Open the manifest.json file found in the repository (which you should have edited above).
  5. Copy the file's contents and paste them into the larger text box.
  6. Click "Add Wikibase"
  7. Your wikibase instance should appear in your Wikibase extension window. (The wikibase you hope to reconcile against should be selected; if it isn't, select it.)

    (Screenshot: Shira Wikibase window)
  8. Click "Ok"
  9. Return to the Wikibase Extension Menu. 
  10. This time, click "Manage Wikibase Account".
  11. Enter the credentials for your wikibase (or a bot, if you want to be fancy) and log in.
  12. Now, you can reconcile against your wikibase like any other reconciliation service using: http://localhost:8000/en/api
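
If you would like to see what OpenRefine is doing behind the scenes, you can also query the service directly. The sketch below sends a batch of names to the endpoint in the usual reconciliation API style (a form field called queries holding a JSON object) and picks the highest-scoring candidate for each name. It is an illustration under assumptions rather than a description of the service's internals: I am assuming the service accepts this POST format and returns a result list with id, name, and score for each query, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RECON_ENDPOINT = "http://localhost:8000/en/api"  # endpoint from the last step above


def build_queries(names, limit=3):
    """Turn a list of names into a reconciliation query batch keyed q0, q1, ..."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def best_candidates(response):
    """Map each query key to (id, name, score) of its top-scoring candidate, or None."""
    best = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        if results:
            top = max(results, key=lambda candidate: candidate.get("score", 0))
            best[key] = (top["id"], top["name"], top.get("score"))
        else:
            best[key] = None
    return best


def reconcile(names, endpoint=RECON_ENDPOINT, timeout=30.0):
    """POST the query batch to the service and return the decoded JSON reply."""
    body = urlencode({"queries": json.dumps(build_queries(names))}).encode("utf-8")
    request = Request(endpoint, data=body)  # supplying data makes this a POST
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        # Both spellings should point to the same item if it exists in your wikibase.
        reply = reconcile(["Haim Effendi", "חיים אפנדי"])
    except (OSError, ValueError) as error:
        print(f"Reconciliation request failed: {error}")
    else:
        for key, candidate in best_candidates(reply).items():
            print(key, candidate)
```

If both spellings come back with the same item identifier, the service is matching across scripts as hoped; if one returns nothing, the item probably lacks a label or alias in that language.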

 

If you have any questions, don't hesitate to reach out. 

 

-Laura

Author

Date

May 14, 2024
