Summary
Recent advances in AI have been driven by the use of large amounts of data, including from
across the web.
This isn't entirely new. Over the past two decades, machines have been used to access and
compile web content to do things like build search engines and create digital archives.
Machine reuse of web data has largely been governed by informal norms and standards. This
social contract was based on a degree of reciprocity, and generally aligned with people's
reasonable expectations for how their works would be used when they shared them publicly.
However, it's increasingly clear that the social contract that underpinned machine use of web
data in the past no longer holds. Today, machines don't just crawl the web to make it more
searchable or to help unlock new insights—they feed algorithms that fundamentally change
(and threaten) the web we know.
In response, some creators are choosing to take their content offline. Others are erecting
paywalls and trying to block machines from accessing their works. Large rightsholders are
pushing for legislators to expand the scope of intellectual property rights.
This isn't sustainable, and it isn't leading to the future we want. The impact of large AI models,
combined with this understandable backlash, risks creating a world where people are no
longer able or willing to share their works. Knowledge and creativity could be further locked
up, and decades of progress made by the open movement reversed.
This matters, as universal access to knowledge and culture is a human right, and vital to our
ability to address our most pressing challenges going forward. At this critical juncture, we
believe CC must intervene to help drive towards a more equitable digital future.
We're working on a first iteration of a preference signals framework, which we're provisionally
calling CC signals. CC signals are designed to offer a new way for stewards of large collections
of content to indicate their preferences as to how machines (and the humans controlling
them) should contribute back to the commons when they reuse and benefit from using the
content.
Our intervention is based on the beliefs that there are many legitimate purposes for machine
reuse of content that must be protected, and that an ecosystem that better addresses the
legitimate concerns of those creating and stewarding human knowledge is both possible and
necessary.