Penn Libraries News

Discover Your Next Read with the Online Books Page

For 30 years, the Online Books Page has been cataloging books and articles that can be found on the World Wide Web while advocating for open access publishing and the public domain.

Looking for an easily readable, free copy of Hamlet for an upcoming class or theater production? Wondering what kind of scientific journal articles you can find for free online? Curious about science fiction by marginalized authors that was never broadly published? Your first instinct might be to start your exploration with Google. But you might have more luck exploring the Online Books Page.

Now in its 30th year, the Online Books Page is the brainchild of John Mark Ockerbloom, a librarian with his fingers in a lot of pies. As the Digital Library Specialist and Metadata Architect, he investigates, plans, and implements digital initiatives at the Penn Libraries. The projects he works on allow him to connect with and serve people outside the Penn community, including scholars and students across the world, while helping to grow the Libraries’ digital infrastructure. The Online Books Page is one such initiative.

Birth of the Online Books Page

In June of 1993, Mark Ockerbloom was a computer science graduate student at Carnegie Mellon University and the World Wide Web was in its infancy. Earlier that year, the web browser Mosaic had been made publicly available, allowing non-experts to access websites from their own computers quickly and easily. It was an exciting time to start playing around with what kinds of information could be shared online, and how. Mark Ockerbloom set about doing just that. After setting up the first website at Carnegie Mellon for his department, he “turned it loose on our local file system so that anybody in the department who wanted to put up web pages could do so,” as he put it in a recent interview.

A number of colleagues were interested in creating websites that featured the text of out-of-copyright books. By this point, the online book collection Project Gutenberg was over two decades old and growing quickly, and that gave Mark Ockerbloom an idea: what if he created a website that compiled links to books that could be found across the web? Today, the idea seems almost mind-bogglingly simple. In 1993, it was revolutionary.

The Online Books Page was born.

Operating like a library catalog for the internet, the Online Books Page today includes links to over 3 million publications. You can browse by Library of Congress subject headings and search by author or title, read supplemental information that Mark Ockerbloom has compiled, and discover publications that went out of print over 100 years ago. HathiTrust recently noted that more people arrive on its site via the Online Books Page than via Google.

A World Wide Web of Words

Over the past three decades, the Online Books Page has grown not only in size but also in complexity, even if its aesthetic still takes you back to the early days of home computing. “A lot of directories have been put up on the web over time. They go for a while, and then they usually sputter out,” Mark Ockerbloom says. He credits innovations made to the site over the years for its longevity. The discoveries he made while improving the site have also influenced projects he has worked on that more directly serve the Penn Libraries, including improvements to the Franklin catalog.

One major innovation was figuring out how to automatically import information about texts maintained on other websites. When the project started, there were a limited number of websites and a limited number of online books, so Mark Ockerbloom was able to add each record by hand. “The site grew to thousands of entries, and then to tens of thousands of entries, and I couldn’t make it any bigger on my own. So I said, ‘Okay, I’ve got to automate some of this.’” Now, he can automatically import metadata from other websites that host online books and turn that metadata into entries in his catalog. The site lists books from well-known sources like HathiTrust, Project Gutenberg, Early English Books Online, and the Directory of Open Access Journals.

Along with relying on additions from sites like these, he continues to add entries the old-fashioned way: by doing it himself. Often, his additions are targeted, based on gaps he notices in his listings or projects of importance to Penn or the wider scholarly community. For example, during the early days of the COVID-19 pandemic, when all staff were working remotely, a number of librarians undertook a project to determine which serials in the Penn Libraries collections (publications like academic journals, magazines, newspapers, and comic books) were out of copyright and could be digitized. The serials identified by this effort, called the Deep Backfile project, are indexed on Mark Ockerbloom’s site.

Mark Ockerbloom also maintains a robust relationship with his users, who have submitted suggestions for additions since the early days of the site. After checking that a particular work truly is out of copyright and ensuring that appropriate metadata exists, he will generally add suggested works to the site. These days, he sometimes receives hundreds of submissions in a single month.

Why the Public Domain Matters

The Online Books Page isn’t just a resource: it is a project with a clear mission. What started as an experiment in the early days of the World Wide Web has grown into a passion project that centers and celebrates public domain and open access works. “One of my goals is simply to encourage more people to make their material openly available originally,” Mark Ockerbloom says.

Mark Ockerbloom notes that when the web was young, there was a movement among some authors of fiction and nonfiction, academic and popular writing, to bypass traditional publishing hurdles by putting their work online and making it available free of charge. For a number of reasons, including the proliferation of ebooks and journals, the massive growth of the web, and questions of economic sustainability, that particular movement has lost steam, but others have taken its place. “The Open Access movement has advanced a lot since I started this,” he says. “Now the idea is that it wouldn’t just be authors, but publishers and whole publishing enterprises that would make materials really available.” Along with adding materials that are no longer in copyright, Mark Ockerbloom lists open access works on the site.

But it’s the continued availability of public domain works that remains central to the Online Books Page. When a work enters the public domain, any person is allowed to acquire, share, adapt, remix, or otherwise consume the work without paying for it. To Mark Ockerbloom and many others, this makes public domain works of vital importance to both creativity and education.

“The purpose of copyright [in the United States] is to ‘promote the progress of science and useful arts’ as the Constitution put it,” says Mark Ockerbloom. “So we give exclusive rights to creators for a while to give them an incentive to create and hopefully get compensated.”

That’s the value of copyright as most people think about it. But a new kind of value comes into play when a work’s copyright ends. “After a time, we give [creative works] to the public at large so that they can freely read them, and they can freely build on them and do interesting things with them. And we've seen all kinds of new creative works built using works in the public domain.”

A significant part of Mark Ockerbloom’s work is reminding people that there are far more works out of copyright than they imagine. By default, as of January 1, 2023, anything published in 1927 or earlier is considered out of copyright in the United States, but that’s not all there is. Sometimes, works enter the public domain because an author never filed for copyright; other times, the copyright was never renewed. These kinds of works are free to share but hard to find, and Mark Ockerbloom strives to ensure, through the Online Books Page, that they continue to be read.

Mark Ockerbloom notes that, thanks to the 1998 Copyright Term Extension Act, no new works were added to the public domain in the United States for 20 years, thus restricting an important funnel for creativity, learning, and scholarship. By pointing people to copyright-free works, he hopes the Online Books Page can play a role in ensuring that something similar doesn’t happen again.

“I want the public domain to continue to grow.”