Reconciling Shira: Wikibase Cloud and OpenRefine
A step-by-step tutorial on how to build a reconciliation service for your wikibase.cloud instance to use in OpenRefine.
Some Background Information
Over the past few weeks, I've been immersed in developing and refining a groundbreaking project: creating a linked data model to capture Penn's rich and diverse Jewish music collections. I call this project Shira (Hebrew for song or poem). This initiative will connect musical compositions, recordings, sheet music, and more, allowing users to explore and access music-related assets across our collections and beyond.
Wikibase, OpenRefine, and the need to reconcile
Seeing how my colleagues employed wikibase.cloud for their Digital Scriptorium project, I realized that this type of data model could work really well for this project (learn more about Wikibase here). After I created our own wikibase.cloud instance, I soon realized that I needed to find a more efficient way to import data than entering it manually. Beyond just the desire to efficiently enter data, I also saw that I would need a tool to help me reconcile my data. I didn't want duplicate pages of the same singer or song because there were discrepancies in the spelling or in the language (Haim Effendi is the same person as חיים אפנדי).
To solve my problem, I used OpenRefine to reconcile, edit, and add new items to my wikibase.cloud instance. While there is extensive documentation on how to connect Wikidata (the largest Wikibase instance in the world) to OpenRefine, I did not find any recently updated documentation on how to do this for one's own Wikibase instance. Below, I offer a step-by-step tutorial on how to do so, based on the following work:
- https://openrefine-wikibase.readthedocs.io/en/latest/scoring.html
- https://gitlab.com/nfdi4culture/ta1-data-enrichment/openrefine-wikibase
- https://ceur-ws.org/Vol-2773/paper-17.pdf
- https://github.com/KBNLresearch/OpenRefine-Wikibase/tree/main
Tutorial
Requirements: Python 3, Docker, OpenRefine, Wikibase Extension for OpenRefine, Wikibase instance where you have privileges
Creating the Reconciliation Service:
- Run the following in your terminal:
  git clone https://github.com/judaicadh/wikibaseopenrefine
  open wikibaseopenrefine
  cd wikibaseopenrefine
- You should see the contents of the folder wikibaseopenrefine in your file browser.
- Open the config.py file in your favorite code editor.
- Replace "shira" with the name of your wikibase.cloud instance.
- Replace the entity identifiers with those specific to your wikibase (e.g. type_property_path = 'P39').
- Save your modified config.py file and close the code editor.
- Repeat the previous four steps (open, replace the instance name, replace the entity identifiers, save) for the manifest.json file.
- In your terminal, run docker-compose up
- You should see output indicating that the Docker container deployed successfully.
- You should also be able to navigate to http://localhost:8000/.
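If you prefer to check the service from a script rather than the browser, here is a minimal sketch. It assumes the service answers a plain GET on http://localhost:8000/en/api (the endpoint used later in this tutorial) with a JSON manifest, as reconciliation services generally do. The function names are my own, and the "name" and "identifierSpace" fields are what such manifests usually contain, not something I have checked against this particular repository.

```python
import json
from urllib.request import urlopen

# Assumption: the service from this tutorial is listening on this address.
SERVICE_URL = "http://localhost:8000/en/api"


def fetch_manifest(url: str = SERVICE_URL, timeout: float = 5.0) -> dict:
    """GET the reconciliation endpoint and parse the JSON manifest it returns."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_manifest(manifest: dict) -> str:
    """Return a one-line summary of two fields a manifest usually carries."""
    name = manifest.get("name", "(no name)")
    space = manifest.get("identifierSpace", "(no identifierSpace)")
    return f"{name} -> {space}"


if __name__ == "__main__":
    try:
        print(summarize_manifest(fetch_manifest()))
    except (OSError, ValueError) as err:
        # OSError covers refused connections and timeouts;
        # ValueError covers replies that are not valid JSON.
        print(f"Service not reachable or not returning JSON: {err}")
```

If the container is not running, the script prints the error instead of raising, which makes it easy to rerun while you debug docker-compose.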
Reconciliation Service and Wikibase:
- Navigate to your wikibase's Special:Tags page: https://foo.wikibase.cloud/wiki/Special:Tags (with your instance name in place of foo)
- Create a tag for each version of OpenRefine that will be uploading to your wikibase, for example openrefine-3.7.
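To see which tag names you will need, here is a small sketch. It assumes your manifest.json keeps the tag template openrefine-${version}, as the Wikidata manifest does; check the "tag" value in your own file, since that is what OpenRefine will actually use. The helper name is hypothetical.

```python
from string import Template

# Assumption: manifest.json uses the same tag template as the Wikidata manifest.
TAG_TEMPLATE = Template("openrefine-${version}")


def tag_names(versions):
    """Return the tag name to create on Special:Tags for each OpenRefine version."""
    return [TAG_TEMPLATE.substitute(version=v) for v in versions]


if __name__ == "__main__":
    # List the OpenRefine versions you and your collaborators use.
    for tag in tag_names(["3.7", "3.8"]):
        print(tag)
```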
Connecting OpenRefine, Wikibase, and your Reconciliation Service
- Open the spreadsheet you wish to reconcile with your wikibase instance in OpenRefine
- Select Wikibase from the extensions menu and click "Manage Wikibase Instances"
- Click "Add Wikibase Instance"
- Open the manifest.json file found in the repository (which you should have edited above)
- Copy the file's contents and paste them into the larger text box.
- Click "Add Wikibase"
- Your wikibase instance should appear in your Wikibase extension window. (The wikibase you hope to reconcile against should be selected; if it isn't, select it.)
- Click "Ok"
- Return to the Wikibase Extension Menu.
- This time, click "Manage Wikibase Account"
- Enter the credentials for your wikibase (or a bot, if you want to be fancy) and log in.
- Now, you can reconcile against your wikibase like any other reconciliation service using: http://localhost:8000/en/api
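OpenRefine handles the requests for you, but it can help to see what a reconciliation call looks like under the hood. The sketch below assumes the service follows the common reconciliation protocol, in which a batch of queries is sent as a JSON string in a form field named "queries" and each reply holds a "result" list of candidates. The function names are my own, and the type identifier Q5 in the usage note is a hypothetical placeholder, so substitute an item ID from your own wikibase.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the service from this tutorial is listening on this address.
SERVICE_URL = "http://localhost:8000/en/api"


def build_queries(names, type_id=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            query["type"] = type_id  # restrict candidates to one type
        queries[f"q{i}"] = query
    return queries


def top_candidates(response):
    """Map each query key to (id, name, score) of its first candidate, or None."""
    best = {}
    for key, value in response.items():
        results = value.get("result", [])
        if results:
            first = results[0]
            best[key] = (first.get("id"), first.get("name"), first.get("score"))
        else:
            best[key] = None
    return best


def reconcile(names, url=SERVICE_URL, type_id=None, timeout=10.0):
    """POST the queries as a form field and return the parsed JSON reply."""
    form = {"queries": json.dumps(build_queries(names, type_id))}
    request = Request(url, data=urlencode(form).encode("utf-8"))  # data => POST
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        reply = reconcile(["Haim Effendi", "חיים אפנדי"])
        for key, candidate in top_candidates(reply).items():
            print(key, candidate)
    except (OSError, ValueError) as err:
        print(f"Reconciliation request failed: {err}")
```

To restrict candidates to a single type, pass type_id, for example reconcile(["Haim Effendi"], type_id="Q5"). If both spellings of a name come back with the same item ID as their first candidate, the service is resolving them to a single entity, which is the point of reconciling before you upload.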
If you have any questions, don't hesitate to reach out.
-Laura
Date
May 14, 2024