Cloudflare Introduces AI Auditing for Websites to Monitor and Block AI Crawlers
Cloudflare, the internet infrastructure and security company, has announced that its AI auditing capabilities are available immediately to all websites, including those on its free plan. The feature, currently in beta, lets site owners analyze how AI companies scrape their data.
The new AI Audit tool by Cloudflare provides insights into when AI company crawlers visit a site, what data they scrape, the frequency of their visits, and other analytical data.
Content creators and website administrators can now easily check which AI companies are scraping their content without permission. If they find unauthorized scraping objectionable, they can block it with a single click.
Distinct from Cloudflare's existing option to block all AI crawlers, the AI Audit tool allows administrators to conduct targeted audits and block specific crawlers.
For instance, if a website has an agreement with OpenAI that allows content scraping, the administrator can permit the GPTBot crawler while blocking every other crawler, whether known or unknown, that has not been authorized.
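The allow-one, block-the-rest policy described above can be sketched as a simple User-Agent check. This is only an illustration of the idea, not how Cloudflare implements AI Audit: the function name `classify` and the token table are hypothetical, and the tokens other than GPTBot are assumptions about what these crawlers send. User-Agent strings can also be spoofed, so a check like this cannot catch unknown or disguised crawlers on its own.

```python
# Illustrative mapping of User-Agent tokens to operators.
# Assumption: this list is a hand-picked example, not Cloudflare's catalog.
KNOWN_AI_CRAWLERS = {
    "gptbot": "OpenAI",
    "claudebot": "Anthropic",
    "ccbot": "Common Crawl",
    "bytespider": "ByteDance",
    "perplexitybot": "Perplexity",
    "amazonbot": "Amazon",
}


def classify(user_agent, allowed=frozenset({"gptbot"})):
    """Return 'allow', 'block', or 'pass' for a request's User-Agent.

    'allow' -> a recognised AI crawler on the allow-list
    'block' -> a recognised AI crawler that is not on the allow-list
    'pass'  -> not recognised as an AI crawler (e.g. a normal browser)
    """
    ua = user_agent.lower()
    for token in KNOWN_AI_CRAWLERS:
        if token in ua:
            return "allow" if token in allowed else "block"
    return "pass"


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/130.0",
    ):
        print(classify(ua), "<-", ua)
```

A site with no licensing agreements would pass an empty allow-list, for example `classify(ua, allowed=frozenset())`, so that every recognised AI crawler is blocked.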
Cloudflare's demonstrations show that identifiable crawlers include those from major AI developers such as OpenAI, Meta, ByteDance, Anthropic, Amazon, and Perplexity, as well as Common Crawl, a general-purpose crawler that is not tied to any specific AI company.
If a website has not entered into a content licensing agreement with any AI company, it can also block crawlers based on their scraping frequency. This helps prevent high-frequency scraping that consumes server bandwidth and affects the user experience for regular visitors.
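Frequency-based blocking of this kind is commonly built on a sliding-window counter. The sketch below shows that general technique under stated assumptions: the class name `FrequencyBlocker`, its parameters, and the thresholds are all hypothetical, and nothing here reflects Cloudflare's actual rules or limits.

```python
from collections import defaultdict, deque


class FrequencyBlocker:
    """Sliding-window limiter: at most max_requests per window_seconds per client.

    Hypothetical helper for illustration; not a Cloudflare API.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of requests accepted inside the current window
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if the request at time `now` (in seconds) is permitted."""
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: block this one
        hits.append(now)
        return True


if __name__ == "__main__":
    blocker = FrequencyBlocker(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 10.5):
        print(t, blocker.allow("crawler-a", t))
```

Rejected requests are not recorded here, so a crawler that backs off regains access once its earlier requests age out of the window. A stricter policy could count rejected requests too.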
Moreover, Cloudflare plans to launch a marketplace next year, allowing website administrators to set their own prices for content scraping. AI companies willing to pay can obtain scraping rights, while those unwilling can be blocked by Cloudflare with a single click.
To access the AI Audit tool, log in to the Cloudflare dashboard, select a website, and click on AI Audit in the left navigation panel. Given its recent launch, it's likely that data for most websites is still being populated.