In a legal skirmish that could shape the future of artificial intelligence and intellectual property rights, Meta CEO Mark Zuckerberg has defended his company's practices by drawing parallels between YouTube's handling of pirated content and Meta's use of copyrighted material. Newly unsealed excerpts from a deposition Zuckerberg gave late last year reveal the tech magnate's strategy as Meta faces allegations of copyright infringement in the high-profile case Kadrey v. Meta.
The case, one of many lawsuits that pit authors and intellectual property holders against AI giants, centers on Meta's use of copyrighted e-books to train its AI models. At the heart of the legal debate is whether using copyrighted material to train AI systems constitutes "fair use," a defense AI companies consistently invoke but copyright holders vehemently challenge.
Zuckerberg’s YouTube Defense
During his deposition, Zuckerberg likened Meta’s use of copyrighted data to YouTube’s approach to user-uploaded content. “For example, YouTube may end up hosting some stuff that people pirate for some period of time, but YouTube is trying to take that stuff down,” he stated. “And the vast majority of the stuff on YouTube, I would assume, is kind of good and they have the license to do.”
The comparison offers a window into how Zuckerberg views the ethical and legal implications of using copyrighted data. The analogy casts Meta as committed to addressing potential violations, but critics argue the two situations are not comparable: YouTube is a platform that hosts content uploaded by its users, whereas Meta itself chose to train its AI systems on copyrighted works.
LibGen and Meta’s AI Ambitions
Central to the lawsuit is Meta’s alleged use of LibGen, a controversial online repository often referred to as a “links aggregator” that provides free access to copyrighted works. LibGen’s library includes publications from major publishers like Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. Despite repeated lawsuits, shutdown orders, and multi-million-dollar fines for copyright infringement, LibGen remains a thorn in the side of publishers.
Court documents unsealed this week allege that Meta used LibGen's data to train its advanced AI models, including its Llama series, which competes with models from industry leaders like OpenAI. According to the filings, Meta's internal teams flagged the legal risks of using LibGen. Some employees referred to the repository as a "data set we know to be pirated" and expressed concerns that such practices could undermine Meta's credibility with regulators.
Notably, Zuckerberg denied in his deposition having detailed knowledge of LibGen. "I get that you're trying to get me to give an opinion of LibGen, which I haven't really heard of," he stated. Internal documents presented by the plaintiffs, however, suggest otherwise.
New Allegations Against Meta
The plaintiffs, including bestselling authors Sarah Silverman and Ta-Nehisi Coates, recently amended their complaint to include fresh allegations. They accuse Meta of using LibGen to train its latest AI models, Llama 3, as well as the forthcoming Llama 4. The amended complaint also claims that Meta cross-referenced pirated books in LibGen with legitimately licensed works to decide whether licensing agreements with publishers were necessary.
Furthermore, the plaintiffs allege that Meta downloaded additional copyrighted e-books from Z-Library, another notorious online repository, as recently as April 2024. Z-Library, like LibGen, has faced numerous legal challenges and takedowns, including domain seizures and criminal charges against its operators.
The plaintiffs further allege that Meta's researchers inserted "supervised samples" during the fine-tuning of its Llama models, a tactic they argue was intended to conceal the use of copyrighted material by obscuring the origins of the training data.
Zuckerberg’s Take on Fair Use
Under questioning from plaintiffs' attorney David Boies, Zuckerberg elaborated on his views regarding copyrighted material. Returning to his YouTube analogy, he argued that a blanket ban on using datasets like LibGen would be unreasonable. "Would I want to have a policy against people using YouTube because some of the content may be copyrighted? No," he said. "There are cases where having such a blanket ban might not be the right thing to do."
However, Zuckerberg acknowledged the need for caution. “If there’s someone who’s providing a website and they’re intentionally trying to violate people’s rights,” he said, “obviously it’s something that we would want to be cautious about or careful about how we engaged with it or maybe even prevent our teams from engaging with it.”
Broader Implications
The Kadrey v. Meta case is part of a growing wave of litigation aimed at holding AI companies accountable for their use of copyrighted material. The outcomes of these cases could set precedents with far-reaching implications for the AI industry, copyright law, and the balance between innovation and intellectual property rights.
For Meta, the stakes are high. As one of the leading players in the AI race, the company will have its practices scrutinized in court, and the outcome could also shape public perception and regulatory policy. For authors and copyright holders, the case represents an opportunity to push back against what they see as a systematic disregard for their rights.
As the battle unfolds, one thing is clear: the intersection of AI and copyright law is becoming one of the most contentious and consequential arenas in technology today. And with high-profile figures like Mark Zuckerberg answering questions under oath, the spotlight on these cases will only grow brighter.