Times Catalog
© 2025 Times Catalog

Mark Zuckerberg gave Meta’s Llama team the OK to train on copyrighted works, filing claims

Usama
Last updated: January 10, 2025 11:56 am

Meta, the tech giant helmed by Mark Zuckerberg, is facing intense scrutiny over allegations that its AI models were trained on pirated content. Plaintiffs in the ongoing copyright infringement case Kadrey v. Meta claim that Zuckerberg personally approved the use of a dataset composed of pirated e-books and articles, a move that has ignited heated debates about ethics, copyright law, and the development of artificial intelligence.

Contents
  • The Core Allegations
  • The Zuckerberg Connection
  • The “Fair Use” Defense vs. Ethical Concerns
  • Torrenting Pirated Works: A New Accusation
  • Why It Matters
  • Judicial Pushback Against Secrecy
  • The Road Ahead

The Core Allegations

The lawsuit, spearheaded by renowned authors Sarah Silverman and Ta-Nehisi Coates, accuses Meta of leveraging copyrighted material without authorization to train its Llama AI models. According to recently unsealed court documents, Meta’s leadership allegedly greenlit the use of a dataset known as LibGen—short for Library Genesis—a controversial online repository that provides access to copyrighted works from major publishers like Pearson Education, McGraw Hill, and Macmillan Learning.

LibGen has faced numerous legal actions in the past for facilitating copyright infringement, with courts ordering its shutdown and imposing heavy fines. Despite its reputation, the dataset allegedly became a core resource for Meta’s AI training efforts.

The newly unredacted documents filed with the U.S. District Court for the Northern District of California recount Meta’s internal discussions on the matter. The filings reveal that some Meta employees referred to LibGen as a “dataset we know to be pirated” and raised concerns that its use could undermine the company’s credibility with regulators. However, these warnings were reportedly overruled after Zuckerberg’s direct approval.

The Zuckerberg Connection

According to the plaintiffs’ filing, Meta’s decision to proceed with LibGen was escalated to Zuckerberg himself, who ultimately gave the go-ahead. An internal memo cited in the filing states that after being “escalated to MZ” (Mark Zuckerberg’s initials), Meta’s AI team was “approved to use LibGen.”

This revelation aligns with earlier reporting that Meta had cut corners in sourcing training data for its AI. An April 2024 report by The New York Times detailed how Meta hired contractors in Africa to summarize books and even considered acquiring the publisher Simon & Schuster to expedite data acquisition. Ultimately, however, the company opted to rely on datasets like LibGen, citing fair use as its legal justification.

The “Fair Use” Defense vs. Ethical Concerns

The central argument in Meta’s defense is the U.S. doctrine of fair use, which allows copyrighted material to be used in transformative ways. Tech companies like Meta argue that training AI models falls under this category. However, creators and copyright holders have pushed back, asserting that repurposing their works for profit-driven AI development is neither fair nor transformative.

The plaintiffs argue that Meta’s use of LibGen crosses a line, particularly because the company allegedly took active steps to conceal its actions. The filing claims that Meta stripped copyright notices and acknowledgments from the LibGen dataset before using it in training.

For instance, Meta engineer Nikolay Bashlykov reportedly wrote scripts to remove copyright metadata from e-books and scientific articles. Plaintiffs allege this wasn’t merely a technical step for training purposes but a deliberate attempt to obscure infringement, preventing public awareness of the copyrighted origins of Llama’s outputs.

Torrenting Pirated Works: A New Accusation

The filing goes further, alleging that Meta used torrenting, a peer-to-peer method of file sharing, to obtain LibGen, thereby engaging in another form of copyright infringement. Torrent clients typically upload pieces of a file to other users while downloading it, meaning Meta may also have helped distribute the copyrighted material, whether or not it intended to.

Internal discussions reportedly highlight concerns among Meta engineers about this method. Bashlykov, for example, is quoted as warning that torrenting LibGen “could be legally not OK.” Despite these warnings, Meta’s head of generative AI, Ahmad Al-Dahle, allegedly dismissed the concerns, enabling the torrenting of the dataset.

Why It Matters

The implications of these allegations are far-reaching. If proven true, Meta’s actions could set a troubling precedent for the tech industry, raising questions about accountability, corporate ethics, and the trade-offs made in the race to dominate AI.

The plaintiffs argue that Meta’s conduct is particularly egregious because of how it obtained the works. In their view, training on the books without a license would have been infringement even if Meta had acquired them lawfully. “Had Meta bought plaintiffs’ works in a bookstore or borrowed them from a library and trained its Llama models on them without a license, it would have committed copyright infringement,” the filing states. “Meta’s decision to bypass lawful methods … serves as proof of copyright infringement.”

The court has yet to rule on these allegations, and the case concerns only Meta’s earliest Llama models. Still, the revelations have already cast a shadow over Meta’s AI operations, threatening to erode trust in its approach to innovation.

Judicial Pushback Against Secrecy

Meta’s attempts to downplay the controversy have also faced challenges. On Wednesday, January 8, Judge Vince Chhabria rejected Meta’s request to redact substantial portions of the plaintiffs’ filing, stating that the request seemed more focused on avoiding bad publicity than on protecting sensitive business information.

“It is clear that Meta’s sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage,” Chhabria wrote. “Rather, it is designed to avoid negative publicity.”

This judicial critique adds to the growing public perception that Meta’s actions may have been both legally and ethically dubious.

The Road Ahead

The outcome of Kadrey v. Meta remains uncertain. Courts have dismissed similar copyright claims against AI companies in the past, often ruling that plaintiffs failed to demonstrate specific instances of infringement. However, the depth of evidence presented in this case, coupled with the high-profile nature of the plaintiffs, could lead to a different outcome.

As AI continues to revolutionize industries, the battle over the legal and ethical boundaries of training data is far from over. This case could become a watershed moment, influencing how companies source data for AI while balancing innovation with respect for intellectual property.

For now, the allegations raise serious questions about Meta’s commitment to responsible AI development—and whether the pursuit of technological dominance justifies cutting corners at the expense of creators and copyright holders.
