
Introduction

We are working on a large Jewish music project that brings together our disparate and diverse Jewish music collections at Penn Libraries (and beyond) into a database that is accessible for scholars, educators, audiophiles, and the general public. 

As part of this work, we want to supplement our data with information that can be found on sites like Discogs and Internet Archive. To pull data from these repositories, we need to use a reconciliation service. This will allow us to fuzzy match, for example, the name of an album in a Penn Libraries database with the album and its associated data in Discogs or Internet Archive. Then we can pull the data to supplement our records.
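To make "fuzzy match" a little more concrete, here is a minimal sketch of the idea. It uses Python's standard-library difflib as a stand-in for whatever matching library a real service uses, and the function names and candidate titles are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return 100 * SequenceMatcher(None, norm_a, norm_b).ratio()


def best_match(title: str, candidates: list[str]) -> tuple[str, float]:
    """Pick the candidate title most similar to `title`."""
    scored = [(c, similarity(title, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical candidate titles, as an external catalog might return them.
candidates = [
    "Tevya and His Daughters",
    "Fiddler on the Roof",
    "Songs of the Shtetl",
]
match, score = best_match("Tevya And His Daughters", candidates)
print(match, round(score))
```

The point is that the two titles do not need to be character-for-character identical: differences in capitalization or spacing still produce a high score, and a human can then confirm or reject the suggested match.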

A Case for Reconciliation

The Robert and Molly Freedman Jewish Sound Archive (referred to hereafter as the Freedman Archive) contains over 4,000 recordings. One of these recordings is Tevya And His Daughters (1957). The Freedman Archive displays the following metadata for Tevya And His Daughters (1957):

Tevye and His Daughters in the Freedman Database
Freedman Database entry for Tevya And His Daughters

 

As you can see, the Freedman Archive record for this album has plenty of information, including the album publisher, the number of tracks, and the place of publication. However, it lacks some important information that we hope to include in our larger music database, such as track titles (there are three tracks), the date of publication, and more.

Some of the data that the Freedman Archive does not have can be found through Discogs and Internet Archive. 

Discogs Metadata
Discogs entry for Tevya And His Daughters
Internet Archive Record for Tevya And His Daughters
Tevya And His Daughters on Internet Archive 

 

To harvest this data, we created two reconciliation services. These allow us to take a spreadsheet of album titles from the Freedman Archive and match them to the titles found in Internet Archive and Discogs. We can then pull the data from these sources and decide which of it will be useful.
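For readers curious about what a reconciliation service does under the hood, here is a rough sketch of the exchange. OpenRefine sends a batch of queries keyed q0, q1, and so on, and the service answers each with a ranked list of candidates carrying an id, name, score, and match flag. The catalog records, IDs, type label, and scoring function below are all hypothetical; our actual services query Discogs and Internet Archive rather than an in-memory list.

```python
import json
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive 0-100 similarity (difflib used as a stand-in)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio(), 1)


def reconcile(queries: dict, catalog: list[dict]) -> dict:
    """Answer a batch of reconciliation queries against an in-memory catalog."""
    response = {}
    for key, query in queries.items():
        results = []
        for record in catalog:
            s = score(query["query"], record["name"])
            results.append({
                "id": record["id"],
                "name": record["name"],
                "score": s,
                "match": s == 100.0,  # only exact (case-insensitive) hits auto-match
                "type": [{"id": "release", "name": "Release"}],
            })
        # Best candidates first, trimmed to the requested limit.
        results.sort(key=lambda r: r["score"], reverse=True)
        response[key] = {"result": results[: query.get("limit", 3)]}
    return response


# Hypothetical catalog records and IDs, for illustration only.
catalog = [
    {"id": "example-1", "name": "Tevya and His Daughters"},
    {"id": "example-2", "name": "Fiddler on the Roof"},
]
queries = {"q0": {"query": "Tevya And His Daughters", "limit": 1}}
print(json.dumps(reconcile(queries, catalog), indent=2))
```

Each spreadsheet cell becomes one query, and the scored candidates that come back are what OpenRefine shows next to the cell.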

Tutorial for Reconciling in OpenRefine to Discogs and Internet Archive

Note: you must have Python installed and an API personal access token from Discogs (you can get one for free when you make an account). Also, Discogs can be slow due to their rate limiter, so if you have loads of data it might take some time to process. 
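To show where the token comes in, here is a small sketch of how a search request to Discogs might be put together. It only builds the request and does not send it. The environment variable name DISCOGS_TOKEN, the User-Agent string, and the function name are assumptions made for this example; the search endpoint and the "Discogs token=" authorization header follow the public Discogs API documentation.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.discogs.com/database/search"


def build_search_request(title: str, token: str, kind: str = "release") -> Request:
    """Build (but do not send) a Discogs database search request."""
    params = urlencode({"q": title, "type": kind})
    return Request(
        f"{SEARCH_URL}?{params}",
        headers={
            "Authorization": f"Discogs token={token}",
            # Discogs asks clients to identify themselves; this name is made up.
            "User-Agent": "ReconciliationSketch/0.1",
        },
    )


# Assumption: the token lives in an environment variable rather than in code.
token = os.environ.get("DISCOGS_TOKEN", "YOUR_DISCOGS_API_TOKEN")
request = build_search_request("Tevya And His Daughters", token)
print(request.full_url)
# To actually send it, pass `request` to urllib.request.urlopen, and pause
# (for example with time.sleep) between calls when looping over many titles.
```

Because every title in your spreadsheet turns into at least one request like this, the rate limiter is the reason large spreadsheets take a while.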

Creating your Discogs reconciliation service

  1. First, clone the reconciliation service repository you need:
    git clone https://github.com/judaicadh/discogsreconciliation
    or
    git clone https://github.com/judaicadh/internetarchivereconciliation
    These repositories are adapted from Michael Stephens's reconcile-demo (https://github.com/mikejs/reconcile-demo).
  2. Next, open Terminal or another CLI and navigate to the discogsreconciliation or internetarchivereconciliation folder you've just cloned. 

    Steps 3–5 are only for the Discogs reconciliation service.

  3. Open the discogs_reconciliation.py file in a code editor
  4. Replace 'YOUR_DISCOGS_API_TOKEN' with your actual Discogs API personal access token.
  5. Save the discogs_reconciliation.py file
  6. Install the required Python packages if you haven't already:
    pip install Flask flask-cors requests fuzzywuzzy
  7. Run python discogs_reconciliation.py (for Discogs) or python internetarchiverecon.py (for Internet Archive)

Linking your Discogs or Internet Archive reconciliation service to OpenRefine

  1. Navigate to OpenRefine and open your spreadsheet with the music metadata you hope to reconcile
  2. Open the menu for the column you wish to reconcile -> click Reconcile -> click Start Reconciling. A smaller window should open.
  3. Select the "Add standard service..." button at the bottom of the new window
  4. Enter http://127.0.0.1:3000/reconcile (for Discogs) or http://127.0.0.1:9000/reconcile (for Internet Archive) as your reconciliation address
  5. Select "Add service"
  6. The service should now appear in the list of reconciliation services. 
  7. Select the service that you just added
  8. You should see a window similar to the one below. If you are reconciling artist names, select Artist as the type to reconcile against; do the same for the other options (Release, Master, Label). The Internet Archive service does not have this feature.
  9. Then click "Start reconciling..." You'll see your results as you would with any other reconciliation service (see documentation here)
Services
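If the service does not show up in OpenRefine, it can help to check it outside OpenRefine first. The sketch below builds the kind of POST request a reconciliation client sends: a form field named queries holding a JSON batch. It assumes the Discogs service is listening at the address given in step 4 and accepts form-encoded POSTs, which we have not spelled out above; the function name is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Address from step 4; use port 9000 for the Internet Archive service.
DISCOGS_SERVICE = "http://127.0.0.1:3000/reconcile"


def build_reconcile_request(endpoint: str, names: list[str]) -> Request:
    """Build (but do not send) a batch reconciliation request."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    body = urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


request = build_reconcile_request(DISCOGS_SERVICE, ["Tevya And His Daughters"])
print(request.get_method(), request.full_url)
# With the service running, the request could be sent with:
#   from urllib.request import urlopen
#   print(json.load(urlopen(request)))
```

If sending the request returns a JSON object keyed by q0, the service is up and the problem is more likely in the address you entered in OpenRefine.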

 

What's Next?

That's next week's blog post. In short, you will need to verify the matches made by these reconciliation services, then pull the supplemental data, and finally curate that data.

Also, if you want to listen to Tevya And His Daughters, you can play the LP below.

Author

Date

June 4, 2024
