The quality and safety of AI models would also suffer from mass enclosure of information. The
issue of Western-centricity in training data72 and resultant risks to deploying many large AI
models in non-English languages,73 for example, is unlikely to improve through restrictive
licensing practices. Beneficial uses of AI are more likely to emerge through engagement with, and access to, content generated by academia, civil society, and other actors than by ceding the field to commercial interests alone.74
The good news? It is not too late to do something. Although a great deal of AI training has already happened, new models will undoubtedly be built in the future, and new frameworks of equity and cooperation are needed to sustain access to the content and information that deployed models now rely on. Widely accessible corpora of high-quality, current data, like Wikipedia, will play a vital role in providing the "factual netting"75 for deployed large AI models and the products they underpin, just as they do for the wider web.
A Collaborative Intervention
We're at a watershed moment. AI has broken the social contract that had governed the way we, and machines, share and access knowledge. In the face of this disruption, the instinct to limit access to information is understandable. But blunt enclosure will not serve the public interest in the long term and ultimately puts the commons at risk.
We Need a New Social Contract for Machine Reuse
To protect and grow the commons, there must be a new social contract to govern how AI models—and the tools and products they increasingly underpin—engage with it.
We believe creator consent is a core value of and a key component to a new social contract.
This position is ethical and pragmatic rather than legalistic: there are many scenarios in which creator consent may not be legally required under copyright law.
72 Longpre, S., Mahari, R., Lee, A., Lund, C., Oderinwale, H., Brannon, W., Saxena, N., Obeng-Marnu, N., South, T., Hunter, C., Klyman, K., Klamm, C., Schoelkopf, H., Singh, N., Cherep, M., Anis, A., Dinh, A., Chitongo, C., Yin, D., & Sileo, D. (2024, July 24). Consent in Crisis: The Rapid Decline of the AI Data Commons. arXiv. https://arxiv.org/abs/2407.14933
73 Jain, D., Kumar, P., Gehman, S., Zhou, X., Hartvigsen, T., & Sap, M. (2024). PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models. arXiv. https://arxiv.org/abs/2405.09373
74 Hansen, G. W. (2025, January 7). AI deals underscore importance of open access (opinion). Inside Higher Ed. https://www.insidehighered.com/opinion/views/2025/01/07/ai-deals-underscore-importance-open-access-opinion
75 Gertner, J. (2023, July 18). Wikipedia's Moment of Truth. The New York Times. https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html