Reported EU legislation to disclose AI training data could trigger copyright lawsuits


A late provision reportedly added to the EU’s forthcoming AI Act would force companies like OpenAI to disclose their use of copyrighted training data. Already, a number of high-profile AI firms have been hit by copyright lawsuits.

A circle of 12 gold stars representing the European Union.
Illustration: Alex Castro / The Verge

The current AI boom, from Bing to Midjourney, relies on free access to training data, much of it scraped from the web and often protected by copyright. The use of this data has led to both criticism and lawsuits, particularly in the art world, with rights owners arguing that their work is being exploited without their permission.

Some of the AI world’s biggest players, like OpenAI, have avoided scrutiny by simply refusing to detail the data used to create their software. But legislation proposed in the EU to regulate AI (the long-building and far-reaching AI Act) could force companies to disclose this information, according to reports from Reuters and Euractiv.

The amendment was reportedly a late addition to the draft AI Act

Reuters says late amendments to the AI Act, which was approved in draft form by legislators earlier this week, will require “companies deploying generative AI tools, such as ChatGPT ... to disclose any copyrighted material used to develop their systems.” Earlier this month, Euractiv reported on the same provision, saying companies would have to “make publicly available a summary disclosing the use of training data protected under copyright law.” Reuters, citing “sources familiar with the discussions,” says the amendment was “a late addition drawn up within the past two weeks.”

The details of this requirement are unknown, and the law may change during the coming closed-door negotiations, known as trilogues, needed to finalize the act. But if AI companies are forced to disclose the sources of their training data, it could open the door to numerous lawsuits that would affect some of the biggest names in tech.

Already, companies like Getty Images are suing the makers of image-generating AI for scraping their data without permission, while a small number of class action lawsuits target image- and code-generating AI. However, the biggest name in AI today (OpenAI, maker of ChatGPT, GPT-4, and DALL-E and the power behind Microsoft’s AI push) is extremely secretive about its data sources. The reported legislation could change this, providing evidence for lawsuits and giving leverage in negotiations to organizations like media companies, whose data is being used and referenced by numerous chatbots.

Although the potential impact of this provision will depend on its details, the rest of the EU’s AI Act is certain to have similarly broad effects on the fast-changing AI landscape.

The act will classify AI systems based on their perceived risk and require the companies responsible for building the most impactful tools to disclose important data about safety, interpretability, performance, and so on. As with previous tech regulation pushed by the EU, the AI Act will undoubtedly have a global effect on how tech companies do business. Lawmakers in the EU will continue to discuss details of the act throughout the year, though compliance requirements for companies will likely not come into force until 2025 or later.