On May 5, 2026, five of the world's largest publishers — Hachette, Macmillan, McGraw Hill, Elsevier and Cengage — joined bestselling legal-thriller author Scott Turow in suing Meta and its chief executive, Mark Zuckerberg, in Manhattan federal court. Their charge: that Meta, a roughly $1.5 trillion company, built its Llama family of AI models on a foundation of stolen books. According to the complaint, Meta downloaded millions of copyrighted books and journal articles from notorious pirate sites such as LibGen and Anna's Archive, scraped 'virtually the entire internet,' and then stripped copyright-management information from the files to obscure where the texts came from.
The plaintiffs allege something more striking than careless data sourcing. They claim Zuckerberg personally authorized the infringement. Internal communications cited in the suit indicate that Meta initially explored licensing deals with publishers in early 2023 but abandoned them on what the complaint calls 'Zuckerberg's personal instruction.' The reasoning, according to one Meta employee quoted in the complaint, was strategic: paying for even a single book would undermine the company's planned 'fair use' defense. Meta has vowed to fight the lawsuit aggressively, arguing that 'AI is powering transformative innovations' and that courts have already found AI training on copyrighted material can qualify as fair use.
Here's the catch. Meta did win a similar case in June 2025, brought by authors including Ta-Nehisi Coates and Richard Kadrey. But the judge in that case ruled narrowly. He found that the plaintiffs hadn't supplied enough evidence that Llama would harm the market for human-written books, while explicitly describing their underlying market-harm argument as 'potentially winning' if backed by stronger proof. The publishers in this new suit appear to have built their case to clear exactly that bar. They describe Llama as 'an infinite substitution machine' capable of flooding Amazon, already the world's largest book marketplace, with imitation versions of copyrighted works.
The Meta case is the latest in a wave of copyright lawsuits filed by artists, authors and newspapers against AI developers including Microsoft and OpenAI. The financial stakes were dramatized last year when AI startup Anthropic agreed to pay $1.5 billion to settle a similar suit over pirated training texts — a number that now functions as a benchmark for damages in the broader fight. Unlike earlier suits driven by individual authors, this one is a coordinated front from the publishing industry's largest players, and it specifically targets two distinct alleged wrongs: how Meta obtained the texts (piracy plus metadata stripping) and how it used them (training a commercial product without permission).
The legal question isn't whether AI can read. It's whether scraping pirate libraries to teach it counts as fair use — and whether 'move fast and break things,' Meta's old motto, applies when the things being broken are the property rights of the people who wrote the books. The plaintiffs are seeking unspecified damages and want to represent a broader class of copyright owners, meaning a single ruling could reshape the economics of AI development for years. If they win, expect a multi-billion-dollar licensing market for high-quality text to appear almost overnight. If Meta wins, expect every other AI company to start treating the open internet — and pirate libraries beyond it — as a legally defensible buffet.
Imagine pirating millions of books — then telling a judge it's legal because you only stole them to teach a machine. That's roughly Meta's argument, and five major publishers just called the bluff.
Five major publishers — Hachette, Macmillan, McGraw Hill, Elsevier and Cengage — plus bestselling author Scott Turow have filed a class-action lawsuit against Meta and CEO Mark Zuckerberg in Manhattan federal court. They allege Meta downloaded millions of pirated books and journal articles from shady sites like LibGen and Anna's Archive to train its Llama AI models, then stripped out the copyright info to hide where the texts came from.
The plaintiffs claim Zuckerberg himself authorized the piracy. According to internal communications cited in the complaint, Meta initially explored licensing deals with publishers in early 2023 but abandoned them — reportedly because licensing even one book would weaken the company's planned 'fair use' legal defense. Meta says it will fight 'aggressively,' arguing courts have already found that AI training on copyrighted material can qualify as fair use.
The legal fight isn't really about whether Llama 'read' books. It's about whether the *way* Meta got them, and what Llama does with them, breaks copyright law.
If you've ever used ChatGPT to summarize a novel for English class, or come across AI-generated knockoff books on Amazon, you've seen the downstream effects of how these models are trained. The outcome of this case will shape whether the writers of the books you read in college get paid when AI digests their work, and whether the next decade of AI is built on licensing deals (more expensive, slower) or scraped data (faster, possibly illegal). It will also influence what creative careers actually look like by the time you're applying for one.
This is the latest in a wave of suits from artists, novelists and newspapers against companies like Microsoft and OpenAI, and it's the first major one brought by publishers as a coordinated bloc. Watch for whether courts start treating *how* AI companies acquired training data as a separate violation from *what* they output — that distinction could blow a hole in the fair-use defense. The second-order effect: a multi-billion-dollar licensing market for high-quality text could emerge almost overnight, reshaping who profits from the AI boom.