
Does AI ‘Dilute’ the Market for Books Written by Human Authors? Two Courts Weigh In 

A look at two new court cases examining the intersection of copyright and generative AI. 

[Image: Sarah Silverman stands behind a podium on a stage.]

These are certainly interesting times for copyright law and American law in general! Just a few weeks after the Copyright Office released its “pre-publication” report on AI training, and a few months after the first court decision on this topic in Thomson Reuters v. Ross, we now have two more decisions, this time both from the Northern District of California, about whether training AI models on copyrighted works constitutes copyright infringement or fair use. While these new cases are nominally similar insofar as both found in favor of fair use for AI training, the differences in their analyses may portend how other courts will split on this issue.

Bartz v. Anthropic: splitting the baby

The first case is Bartz v. Anthropic. In this case, a group of authors sued Anthropic, the developer of the large language model Claude, for using the authors' books to train the model. Anthropic used an online library of nearly 200,000 unauthorized copies of books called Books3, as well as several other, similar online libraries. The company also purchased millions of print books that it scanned and discarded, saving only the digital copies. Altogether, the company assembled a collection of over 7 million book copies to serve as training data for Claude. Anthropic then saved these copies for future uses, including, but not necessarily limited to, training future AI models.

Judge William Alsup held that both using books to train Claude and digitizing print books were fair uses. He described the training as “spectacularly” and “quintessentially” transformative because it enabled Claude to produce new works, not simply reproduce its training data. Moreover, he fully rejected the argument that training Claude harmed the market for the plaintiffs’ works by giving the public a tool that could easily create works that compete with theirs in the marketplace. On this point, Judge Alsup wrote, “Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.”

However, even though Judge Alsup found that Anthropic’s use of the works for training was fair, including both the unauthorized copies and the digitized copies of books that Anthropic purchased, the same was not true of retention: the court held that storing the unauthorized copies indefinitely was not fair use.

Professor Edward Lee described Judge Alsup’s divided decision as “Solomonic.” He added, “Anthropic and other AI companies have a fair use precedent supporting AI training. But it also gives book authors an infringement precedent for use of pirated book copies at least if they are stored in a central library indefinitely.”

Kadrey v. Meta: A win — and a warning — for Meta

The second case, Kadrey v. Meta, also from the Northern District of California, dealt with Meta’s large language model, LLaMA. Like Anthropic, Meta trained its AI model on the Books3 library, as well as other sources the court describes as “shadow libraries.” A group of authors, including the comedian Sarah Silverman, sued on behalf of a class of similarly situated people whose works were part of Meta’s AI training data.

Judge Vince Chhabria found, seemingly reluctantly, that Meta’s use was fair. Even though, like Alsup, he characterized Meta’s use as “highly transformative,” he disagreed with Alsup’s market harm analysis and spent a significant portion of his decision discussing how the theory of market dilution might weigh heavily against fair use. Under this theory, AI can harm the market for works in its training data by being able to create, at scale, other works that can compete with human-created works, even if they don’t specifically replicate the works in their training data. Judge Chhabria wrote that it would be easy to imagine how an influx of rapidly produced AI-generated works could impact the sales of at least some works made by human authors.

Nevertheless, Judge Chhabria ultimately decided in favor of Meta in part because the plaintiffs did not present an argument for how market dilution may have harmed the market for their works. He wrote, “On this record, then, Meta has defeated the plaintiffs’ half-hearted argument that its copying causes or threatens significant market harm.”

Alsup v. Chhabria on market dilution

Looking at these two decisions, the harsh divide between them on market dilution is striking. Responding to Judge Alsup's assessment that market dilution is no different than teaching children to write using other people’s written works, Judge Chhabria wrote, “When it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.”

Notably, Judge Chhabria’s views on market dilution echo the recent report from the United States Copyright Office on AI training. That report also elaborated on how market dilution could weigh against a finding of fair use. Even though Judge Chhabria did not cite the agency's report, his opinion indicates that this theory may gain support as other courts debate whether training is fair use.

While this expansive focus on one potential economic impact that AI may have on human-created works may make some sense, I find it troubling. On the one hand, the reason we protect copyright in this country is to support authors’ economic interest in their works to encourage them to create those works in the first place. Protecting those interests against potentially unfair competition fits within the law’s general design. On the other hand, expanding the scope of how we understand market harm is a slippery slope that could damage fair use in unpredictable ways. When courts analyze fair use, one thing they consider is whether a secondary use harms the market for the original work. Often courts look at whether a secondary use operates as a market substitute for the original, not whether the use competes in the same general market as the original. While I am sensitive to the very real concerns that creators have about AI replacing them or otherwise devaluing their work, I am afraid that embracing a theory like market dilution could narrow the scope of fair use in ways that would harm other uses that we would want fair use to protect.

So what is the solution? I’m not sure. We need to preserve the delicate balance that copyright law seeks to strike between the rights of both authors and users alike. But how exactly to do that is complicated. The courts will continue to wrestle with this issue, and there will be more decisions like these two in the coming months and years, so we should start to see how other courts think copyright law applies here.

But legal regulation alone may not be enough to address all the issues that generative AI raises. These issues may be so complex that any legal regulation that tries to address them will either be inadequate or overly restrictive. In that case, social standards and expectations may be able to supplement the law and guide the development of AI tools in ways that respect authors' concerns and do not displace the contributions that make those tools possible. The law can then act as a stopgap to prevent the worst abuses and inequities. For this, it is important for us all to talk to each other, and for developers to talk to creators, about appropriate ways to use and develop these technologies. In doing so, we can build standards and expectations that will hopefully lead to better technologies that work for all of us.

July 16, 2025
