and artists—for instance, noticing the style in which musicians arrange notes, or building on
surrealist styles initiated by visual artists. Likewise, scientists and researchers build on past
discoveries and the existing literature to gain a better understanding of how the world works
and to progress ideas. Human progress is enabled by the ability to build on the past.
This is why we continue to believe that copyright should be balanced in order to facilitate
TDM. At its core, TDM is a way to study and analyze existing works, using machines, in order to
create new insights and materials. AI training involves forms of TDM. While much of the
discourse around TDM as applied to AI has focused on the creation of artistic works, TDM and
AI have uses that can help generate advances across science, education, healthcare, and
other domains of significant importance to society.83 In general, we think using existing works
in order to derive uncopyrightable elements or make otherwise non-infringing uses should be
permissible under copyright law, even if it involves making a copy of a whole work as an
intermediate step, such as through TDM.
There are certainly scenarios where AI training and deployment constitute copyright
infringement; the lines here vary by jurisdiction and context and are actively undergoing
litigation.84 However, we know the current state of copyright law around the world does not
grant rightsholders universal authority to control use of their works for AI training. In the
United States, the doctrine of fair use generally protects analysis of existing works to extract
non-copyrightable elements. The European Union (EU) has an exception for TDM for certain
research and cultural heritage institutions, while allowing others to perform TDM so long as
they abide by specific, machine-readable reservations made by rightsholders.
More to the point, copyright shouldn't grant universal authority to control use of works for AI
training in all scenarios. This would mean granting a monopoly over ideas, genres, and facts.
Expanding property rights risks further concentration of power, both in AI development and
beyond. And, given that many content creators and artists sign away their copyrights to large
companies, the main beneficiaries of more restrictive copyright laws and licensing deals
would be large rightsholders, not creators themselves.
83 Rucic, H. (2024, April 23). KR21 Principles on Artificial Intelligence, Science and Research.
Knowledge Rights 21.
https://www.knowledgerights21.org/news-story/kr21-principles-on-artificial-intelligence-science-and-research/
84 OECD. (2025, February 9). Intellectual property issues in artificial intelligence trained on
scraped data. OECD.
https://www.oecd.org/en/publications/intellectual-property-issues-in-artificial-intelligence-trained-on-scraped-data_d5241a23-en.html