One consequence of the AI Action Plan is a battle over copyrighted material. Reports describe a brewing conflict between OpenAI and Google on one side and Hollywood on the other. OpenAI, the maker of ChatGPT, and Google, the maker of Gemini, want the right to use copyrighted material as training data for their generative AI models, while Hollywood says "hands off" its products and its celebrities.
More than 400 members of the entertainment industry have signed a letter opposing OpenAI and Google's proposal to allow AI models to train on copyrighted content. The letter claims that both companies are arguing for a special government exemption that would allow them to freely access the products of creative industries.
The letter argues: "There is no reason to weaken or eliminate the copyright protections that have helped America flourish, not when AI companies can use our copyrighted material by simply doing what the law requires: negotiating appropriate licenses with copyright holders -- just as every other industry does."
The letter is signed by stars including Ben Stiller, Mark Ruffalo, Cynthia Erivo, Cate Blanchett, Taika Waititi, Ayo Edebiri, Aubrey Plaza, Guillermo del Toro, Natasha Lyonne, Paul McCartney, and many others.
Using copyrighted material for AI machine learning training data can be legally complex and depends on several factors, including jurisdiction, the nature of the material, and how it is used. Here are some key considerations:
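In the United States, the fair use doctrine allows limited use of copyrighted material without permission for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for the original work.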
AI training might qualify as fair use if it is transformative (e.g., using the material to create something new, like a model that generates original content).
If the use is not covered by fair use, you may need to obtain a license or permission from the copyright holder. Some datasets are released under licenses that permit reuse, including for AI training (e.g., Creative Commons licenses, open datasets), although the specific terms of each license still apply.
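Copyright laws vary by country. For example, in the EU, the Text and Data Mining (TDM) exceptions allow the mining of lawfully accessed copyrighted works under certain conditions: one exception covers research organizations and cultural heritage institutions acting for scientific research, and a broader one applies to other users unless the rightsholder has expressly reserved their rights. Other countries may have stricter or more lenient rules.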
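Works in the public domain (e.g., works whose copyright has expired) can be freely used for training. Open datasets are often available under licenses that allow reuse, but the terms differ: Wikipedia text, for example, is licensed under CC BY-SA, which requires attribution and share-alike, while Common Crawl distributes crawled web pages whose underlying content may still be protected by copyright.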
Even if legally permissible, using copyrighted material without consent may raise ethical concerns, especially if it harms the original creators or their market.
Legal cases, such as Authors Guild v. Google (the Google Books case), have set precedents for the use of copyrighted works in large-scale digitization projects. However, the legal landscape for AI training is still evolving, and outcomes may vary.
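Some jurisdictions are introducing new laws to address AI and copyright. For example, the EU's AI Act requires providers of general-purpose AI models to put in place a policy for complying with EU copyright law and to publish a summary of the content used for training.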