But if you’re not familiar with the AI industry and copyright law, you might be wondering: why would a company spend millions of dollars on books only to destroy them? Behind this odd legal exercise lies a more fundamental driver: the AI industry is insatiably hungry for high-quality text.
Competition for high-quality training data
To understand why Anthropic wanted to scan millions of books, it helps to know how AI researchers build large language models (LLMs): by feeding billions of words into neural networks. During training, the AI system processes the text repeatedly, building statistical relationships between words and concepts along the way.
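The idea of "building statistical relationships between words" can be illustrated with a deliberately tiny sketch. This is not Anthropic's pipeline or a real neural network; it is a toy bigram model that simply counts which word tends to follow which, which is the simplest version of the statistics an LLM fits over billions of words:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "billions of words" of training text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions: how often each word follows another.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent next word in the corpus."""
    return transitions[word].most_common(1)[0][0]

# "cat" follows "the" twice; "mat" and "fish" only once each.
print(most_likely_next("the"))  # → cat
```

A real LLM replaces these raw counts with learned neural-network weights and conditions on long contexts rather than a single previous word, but the quality point carries over: the model can only reproduce the statistics of whatever text it was fed.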
The quality of the training data fed into the neural network directly affects the capabilities of the resulting AI model. Models trained on well-edited books and articles tend to produce more coherent, accurate output than those trained on low-quality text such as random YouTube comments.
Publishers legally control the content that AI companies desperately want, but AI companies don’t always want to negotiate licenses. The first-sale doctrine offered a workaround: once you buy a physical copy of a book, you can do whatever you like with that copy, including destroying it. In other words, buying physical books provided a legal path to the text inside them.
However, even when it’s legal, buying millions of books is expensive. So, like many AI companies before it, Anthropic initially took the quick and easy path. According to court filings, in its hunt for high-quality training data, Anthropic first chose to amass pirated digital books, avoiding what CEO Dario Amodei described as the “legal/practice/business” slog of complex licensing negotiations with publishers. But by 2024, Anthropic had grown “less gung ho” about using pirated ebooks for legal reasons and wanted a safer source.