Human-Content-to-Machine-Data_Final - Flipbook - Page 8
companies behind early versions of search engines and sought to require that they get
explicit consent before crawling.20 Had the publishers won, the web might have been stifled in
its infancy.
Instead, over the last 25 years, publishers, crawlers, and other stakeholders have worked
together to establish norms about the appropriate use of web data.
These informal norms represent a form of social contract. Social contracts are not codified in
laws or strict rules, but shape behavior based on shared values and a desire to act in ways
that are mutually beneficial. They are expected to be followed and carry weight where the law
doesn't necessarily require adherence,21 with consequences for noncompliance. They are
continuously negotiated and evolve over time, and while they don't necessarily resolve
differences and tensions, they provide a degree of mediation based on shared expectations
that enables different groups to cooperate.
A simple yet important component of the social contract that has governed machine use of
web data has been the Robots Exclusion Protocol (robots.txt).22 Websites could use
robots.txt—as well as other indexing protocols such as HTML meta tags and HTTP header
directives—to indicate their preferences to crawlers (including to disallow them) in a scalable,
machine-readable format. Crawlers, by and large, respected preferences expressed using
robots.txt, and the protocol supported the rapid growth of the web over the 2000s and
2010s. The process of developing and maintaining robots.txt and other protocols also
helped form the social contract, because it drew on the input and collaboration of
affected stakeholders over a number of years.
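
To make the mechanism concrete, the following is a minimal sketch of how a well-behaved crawler might consult a site's robots.txt before fetching a page. It uses Python's standard-library urllib.robotparser. The robots.txt content, the crawler names "ExampleBot" and "OtherBot", and the example.com URLs are all hypothetical and chosen only for illustration; real sites and crawlers will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (illustrative assumption, not from any real site).
# It blocks one named crawler entirely and keeps every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch() queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A cooperative crawler checks its own user-agent name against the rules
# before requesting each URL, and skips the URL when the answer is False.
for agent, url in [
    ("ExampleBot", "https://example.com/articles/1"),
    ("OtherBot", "https://example.com/articles/1"),
    ("OtherBot", "https://example.com/private/report"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

Nothing in this check is enforced technically: the crawler could ignore the result and fetch the page anyway, which is why compliance rests on the shared norms described above rather than on the protocol itself.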
20 Gasser, U. (2006, January 1). Regulating Search Engines: Taking Stock and Looking Ahead. Yale
Journal of Law & Technology. https://yjolt.org/sites/default/files/gasser-8-yjolt-201.pdf
21 Masiello, B. & Slater, D. (2023, September 19). Beyond Copyright: Tailoring Responses to
Generative AI & The Future of Creativity. Tech Policy Press.
https://www.techpolicy.press/beyond-copyright-tailoring-responses-to-generative-ai-the-future-of-creativity
22 Wikipedia Contributors. (2025, March 16). robots.txt. Wikipedia; Wikimedia Foundation.
https://en.wikipedia.org/wiki/Robots.txt