How Penn's Librarians are Using Linked Data to Make the Web Better
Are you a sucker for Google auto-complete compilations? Have you ever used a search engine to find out how to convert pounds to kilograms? You have something called “linked data” to thank.
“Linked data is the language of the web,” says Beth Camden, Director of Information Processing. “It’s building the infrastructure that makes the web better.”
For the past year, Camden and a team of linked data editors, including Digital Library Strategist and Metadata Architect John Mark Ockerbloom and Head of Metadata Research Jim Hahn, have been harnessing the power of linked data to connect people with information about the Penn Libraries and its collections, even if those people never visit our website. As part of a pilot project run by the Program for Cooperative Cataloging, the team has been carefully and systematically adding information about materials in the Penn Libraries’ collections to Wikidata, an open, easily editable linked data set that belongs to the same family of Wikimedia Foundation projects as Wikipedia. To date, the Penn Libraries team has added over 5,000 new “items”--each representing an individual serial, publication, person, institution, academic department, or related thing of interest to library users--and edited over 7,000 others. The Penn Libraries is just one of 74 institutions involved in this project.
What is linked data, exactly, and why does it matter to librarians? We’re all familiar with website links, so much so that we’ve probably forgotten how magical they are. They allow us to do something we could never do before the internet: instantaneously jump from document to document, making connections among otherwise unrelated bits of information on the spot. For example, I, the writer of this piece, can direct you, a reader, to this very useful video, which can teach you even more about how linked data works.
But machines can’t intuit those connections in the same way. They don’t understand the relationship between this website and the website I have just sent you to explore. Linked data allows a machine--a computer, the internet itself--to understand those connections. Much as a person might check several web pages to piece together someone’s life story, a computer working with linked data can answer a question from an individual (or even another machine) by drawing on information from multiple websites.
But in order for a computer to understand connections and answer these questions, the information has to be presented in a particular form. And that’s where something like Wikidata, which is formatted in a way that allows machines to “read” connections among links, comes in.
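To make that “particular form” a little more concrete, here is a minimal sketch in Python. It assumes the common way of writing linked data as three-part statements (subject, relationship, object), and it uses facts mentioned in this article. The relationship names "part of" and "project of" and the helper functions are invented for illustration; Wikidata uses its own identifiers for such relationships.

```python
# Each fact is a (subject, predicate, object) triple. The predicate names
# are made up for this sketch; Wikidata uses its own property identifiers.

# Imagine these two lists come from two different websites.
catalog_facts = [
    ("Journal of Biological Chemistry", "part of", "Deep Backfile"),
]
project_facts = [
    ("Deep Backfile", "project of", "Penn Libraries"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def institutions_behind(triples, publication):
    """Follow two links: publication -> project -> institution."""
    found = []
    for project in objects(triples, publication, "part of"):
        found.extend(objects(triples, project, "project of"))
    return found


# Neither source can answer the question alone; combined, they can.
all_facts = catalog_facts + project_facts
answer = institutions_behind(all_facts, "Journal of Biological Chemistry")
print(answer)
```

Because both sources state their facts in the same shape, the program can chain them together without knowing anything else about either website.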
Linked data gives Penn Libraries staff the power to connect items from our collections with information about those items from across the web. Take the Journal of Biological Chemistry, one of the many publications that are part of Deep Backfile, a project to identify and make available out-of-copyright academic journals in the Penn Libraries collections. Its Wikidata entry includes information about its publication dates, founder, editors, copyright status, and even the number of people who follow it on Twitter. While much of this information could also be found in our online catalog, putting it in Wikidata makes it part of the larger, interconnected web, allowing more people to find the journal--even if it wasn’t exactly what they were searching for.
“The more you link this stuff together…the more it attracts people,” says John Mark Ockerbloom. “It’s another way that we can reach outside our physical space and get in touch with people who are interested in this content.”
The team is also using Wikidata to improve the visibility of Penn’s teaching and scholarship. Along with adding information about the Libraries’ out-of-copyright journals, they have been creating or augmenting entries about the University’s schools and academic departments. While simple on their own, entries like these will allow search engines and other automated tools to make connections among research topics, faculty, and publications that they might not be able to otherwise.
Wikidata can also help improve the Penn Libraries’ catalog itself. In the future, the team hopes to use linked data to augment the information in catalog records and make the work of catalogers much more efficient. “Once you have the linked data…you can potentially bring in external resources from the Wikidata record,” says Jim Hahn. For example, he explains, creating a catalog entry for a film can be time consuming because it requires the cataloger to indicate all the people who played a significant role in the film’s creation--the director, major actors, screenwriter(s), cinematographer, and potentially other important individuals. Using linked data to automatically pull in this information can make the job significantly easier and the catalog record richer. “We've been experimenting a lot, and we're moving in the direction of thinking, ‘okay, how might [using linked data] streamline some of our cataloging?’ But it's also such a new paradigm that the training has been pretty intense.”
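As a rough illustration of the film example, the sketch below shows how credits might be copied from a Wikidata-style record into a catalog record. It is not the team’s actual workflow. The property IDs are Wikidata’s identifiers for director (P57), screenwriter (P58), cast member (P161), and director of photography (P344); the function name, the record layout, and the sample film and people are hypothetical. Real Wikidata statements point to other items rather than plain names, so a working tool would need an extra lookup step.

```python
# Wikidata property IDs for film credits, mapped to catalog role labels.
ROLE_PROPERTIES = {
    "P57": "director",
    "P58": "screenwriter",
    "P161": "cast member",
    "P344": "director of photography",
}


def add_credits(record, claims):
    """Return a copy of `record` with credits filled in from `claims`.

    `claims` maps a property ID to a list of names (a simplification).
    Credits the cataloger already entered are kept; duplicates are skipped.
    """
    enriched = dict(record)
    credits = {
        role: list(names)
        for role, names in record.get("credits", {}).items()
    }
    for prop, role in ROLE_PROPERTIES.items():
        for name in claims.get(prop, []):
            if name not in credits.setdefault(role, []):
                credits[role].append(name)
    enriched["credits"] = credits
    return enriched


# Hypothetical sample data, for illustration only.
record = {"title": "Example Film", "credits": {"director": ["Person A"]}}
claims = {
    "P57": ["Person A"],
    "P58": ["Person B"],
    "P161": ["Person C", "Person D"],
}

enriched = add_credits(record, claims)
print(enriched["credits"])
```

The cataloger’s original record is left untouched, so the pulled-in credits can be reviewed before anything is saved.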
In turn, these improvements make it easier for library users to search for books, journals, and articles on precisely the topic they’re looking for. To further facilitate searching, Hahn is even toying with the idea of creating an autofill feature that works a bit like Google’s autofill, suggesting words and phrases you might search for as you type.
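One simple way an autofill feature could work is to keep a sorted list of known terms and return those that start with whatever the user has typed so far. The sketch below assumes that approach; it is not a description of Hahn’s design, and the function name and sample terms are invented for illustration.

```python
from bisect import bisect_left


def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`.

    `sorted_terms` must be lowercase and sorted, so that every term
    sharing a prefix sits next to the others in the list.
    """
    prefix = prefix.lower()
    start = bisect_left(sorted_terms, prefix)  # first possible match
    matches = []
    for term in sorted_terms[start:]:
        if len(matches) >= limit or not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Hypothetical search terms, lowercased and sorted once up front.
terms = sorted(
    t.lower()
    for t in [
        "Linked data",
        "Library catalog",
        "Linguistics",
        "Biological chemistry",
        "Libraries",
    ]
)

print(suggest(terms, "lib"))
```

In a linked data setting, the list of terms could be drawn from the labels attached to items rather than maintained by hand.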
Using Wikidata isn’t the only way to create linked data, but as Camden notes, it’s one of the most straightforward. That means that library catalogers who aren’t experts in coding or artificial intelligence can still contribute. “If you can think structurally, almost anyone can do linked data with Wikidata.”
Wikidata has its pitfalls, of course. Like its better-known sister, Wikipedia, Wikidata is extremely collaborative; it’s easy for anyone and anything--including bots--to augment a record. That’s part of the appeal, as it means that no one person needs to supply all the information about a given item. But it also means that the information might be poorly formatted, or just plain inaccurate. The team is intimately familiar with these complications; along with adding items to Wikidata, they have spent a lot of time editing and improving already-existing records. To other institutions considering adding information from their own collections to Wikidata, Ockerbloom notes, “There’s a bit of a trade-off: Do you want to scale up quickly? Or do you want to make sure there is quality control?” How much you want to embrace the fast-moving, collaborative nature of Wikidata may depend on how comfortable you are with ceding control, and how able you are to do so.
But the opportunity presented by Wikidata is too great to ignore. “It's emblematic of what library description needs to do in the future, which is to be in spaces that are not just the library catalog,” says Hahn. “That might mean going into spaces where people are not trained as library catalogers and working with people who might have different values, goals, and objectives. But it's so important because it really expands the reach [of what we do].”
He adds, “It’s kind of uncomfortable, it’s kind of messy, and we need to be there.”