Cloudflare unveils new tools to help creators control AI use
Cloudflare has introduced new tools designed to help online content creators control how artificial intelligence (AI) companies use their published material.
The company's new Content Signals Policy builds on the widely used robots.txt mechanism, a file frequently employed by website owners to specify which parts of their sites web crawlers can access. The new policy allows owners not only to declare which parts of their sites can be indexed, but also to specify how AI search engines and other AI tools may use their content, including the option to permit display in AI-generated answers while denying use of the material for AI model training.
Changing online landscape
AI-based search technology departs significantly from traditional search engines: where traditional engines directed users to external sites through lists of links, modern AI search often provides direct answers synthesised from third-party content. Even when these AI engines include source links, clickthroughs to the original websites are substantially lower, presenting a challenge for publishers and content creators whose business models depend on user visits.
Cloudflare's announcement notes that while robots.txt files have enabled website operators to set limits on crawler access, there has been no standardised method for them to express specific preferences about how their content should be used after being accessed. The new Content Signals Policy establishes a machine-readable framework for signalling these preferences, providing clarity for AI companies regarding permissible uses of content.
Industry perspective
Matthew Prince, Co-Founder and Chief Executive Officer at Cloudflare, said:
"The Internet cannot wait for a solution, while in the meantime, creators' original content is used for profit by other companies. To ensure the web remains open and thriving, we're giving website owners a better way to express how companies are allowed to use their content. Robots.txt is an underutilised resource that we can help strengthen, and make it clear to AI companies that they can no longer ignore a content creator's preferences."
According to Cloudflare, more than 3.8 million domains already use its managed robots.txt service to express that they do not want their content used for AI training. With the new policy, these preferences can be refined further, allowing site owners to specify whether their content may be used for search summaries, AI input, or broader AI training purposes.
The Content Signals Policy presents these options in straightforward terms: a "yes" grants permission, a "no" withholds it, and no signal indicates no explicit preference. The policy also explains these options, describes the ways crawlers can use content, and reminds all parties that such stated preferences in robots.txt files could have legal implications.
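As an illustration, a robots.txt entry expressing these preferences might look like the sketch below. The directive name (Content-Signal) and the signal keys (search, ai-input, ai-train) mirror the use categories described above, but the exact syntax here is an assumed example rather than a definitive specification; site owners should rely on the wording Cloudflare generates or documents.

```
# Example content signals: permission for search indexing and for use
# as AI input (e.g. in AI-generated answers), but not for AI training.
# Omitting a signal entirely would express no explicit preference.
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```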
Support from content platforms
The policy has received statements of support from several major technology and content companies.
Danielle Coffey, President and Chief Executive Officer of the News/Media Alliance, stated:
"We are thrilled that Cloudflare is offering a powerful new tool, now widely available to all users, for publishers to dictate how and where their content is used. This is an important step towards empowering publishers of all sizes to reclaim control over their own content and ensure they can continue to fund the creation of quality content that users rely on. We hope that this encourages tech companies to respect content creators' preferences. Cloudflare is showing that doing the right thing isn't just possible, it's good business."
Ricky Arai-Lopez, Head of Product at Quora, said:
"We applaud Cloudflare's leadership and support their efforts in building controls and protocols to help publishers manage how their content is accessed."
Chris Slowe, Chief Technology Officer of Reddit, commented:
"For the web to remain a place for authentic human interaction, platforms that empower communities must be sustainable. We support initiatives that advocate for clear signals protecting against the abuse and misuse of content."
Eckart Walther, Co-Founder of the RSL Collective, explained:
"We are excited to partner with Cloudflare on the launch of the Cloudflare Content Signals Policy, an essential step forward in allowing publishers to assert their rights and clearly define how companies may use their content. The open RSL standard, developed in cooperation with the Internet's leading publishers, is designed to complement the Content Signals protocol by enabling content owners to not only protect their rights, but also define machine-readable licensing and compensation terms for those use cases. Together, the RSL Collective and Cloudflare are advancing a shared vision: a sustainable open web where publishers and creators thrive and are fairly compensated by AI companies."
Prashanth Chandrasekar, Chief Executive Officer of Stack Overflow, added:
"The nature of the Internet and its implicit agreement with content publishers has changed quite dramatically over the past couple of years. With our large corpus of ~70 billion tokens of data, Stack Overflow is proud to partner with the leading AI labs and cloud providers on the data licensing front and we applaud Cloudflare for playing a central role to empower and protect content creators to build a scalable system for the internet in this new AI era."
Implementation and availability
Cloudflare has begun automatically updating robots.txt files with the new policy language for customers using its managed service. It is also releasing tools for those who want to implement custom policies, allowing content owners to define their stance on AI use of their content in a machine-readable format. The company states that while robots.txt files cannot technically prevent all scraping, the updated language aims to communicate website owners' preferences to bot operators more effectively, encouraging greater adherence to those stated wishes.
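To show how a bot operator might read such preferences in practice, the sketch below fetches a site's robots.txt and collects any Content-Signal declarations. It assumes the directive name and signal keys from the example above; the helper name fetch_content_signals and the example.com URL are purely illustrative.

```python
import re
import urllib.request

def fetch_content_signals(base_url: str) -> dict:
    """Fetch robots.txt from base_url and return any content signals
    as a dict, e.g. {"search": "yes", "ai-train": "no"}. A missing key
    means the site expressed no explicit preference for that use."""
    with urllib.request.urlopen(f"{base_url}/robots.txt", timeout=10) as resp:
        robots = resp.read().decode("utf-8", errors="replace")

    signals = {}
    for line in robots.splitlines():
        # Match lines of the assumed form "Content-Signal: key=value, ..."
        match = re.match(r"(?i)^\s*content-signal\s*:\s*(.+)$", line)
        if not match:
            continue
        for pair in match.group(1).split(","):
            if "=" in pair:
                key, _, value = pair.strip().partition("=")
                signals[key.strip().lower()] = value.strip().lower()
    return signals

if __name__ == "__main__":
    prefs = fetch_content_signals("https://example.com")
    if prefs.get("ai-train") == "no":
        print("Site asks that its content not be used for AI training.")
    print(prefs)
```

A real crawler would also need to honour the ordinary Allow/Disallow rules and handle per-user-agent groups, which this sketch ignores for brevity.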