Famous Repair Website iFixit Complains About Claude Launching DDoS-Like Attacks, Hitting Its Servers a Million Times in One Day to Train AI
Claude is an artificial intelligence application developed by Anthropic. Like most AI developers, Anthropic dispatches crawlers every day to fetch massive amounts of content from the internet for training its AI models.
iFixit, a well-known teardown and repair website, hosts a wealth of text- and image-based teardown articles, so Anthropic's crawler subjected it to especially intense scraping.
The site's administrator complained on X/Twitter: I know you're desperate for data, and Claude is really smart, but do you really need to hit our servers a million times in 24 hours? Not only are you taking our content without paying, you're also tying up our development and operations resources. That's really uncool.
Website logs show ClaudeBot accessing iFixit thousands of times per minute. Scraping at that rate hurts iFixit's servers: it burns CPU on the server and eats network bandwidth, which no site wants to see.
In an interview with 404 Media, iFixit stated:
We are the world's largest database of repair information. If they take all of that information without permission and crash our servers in the process... iFixit currently has millions of links, including repair guides, repair revision histories, blog and news posts, research, forums, community-contributed repair guides, Q&A threads, and more.
In response to the complaint, Anthropic's support team offered no apology and replied as follows:
In accordance with industry standards, Anthropic uses various data sources for model development, such as publicly available data on the internet collected through web crawlers. Our crawling should not be intrusive or disruptive, and we aim to minimize disruption by respecting crawl delays where appropriate.
The simplest approach for websites is to block the Claude crawler outright. Landian.news also faced DDoS-like traffic from the Claude crawler, which was indeed fetching pages thousands of times per minute and straining the Landian.news server, so we blocked it early on.
To block it, add the following content to your robots.txt:

```
User-agent: ClaudeBot
Disallow: /
```
Of course, to be safe, we also use a regular expression in Nginx to match the ClaudeBot user agent: if the crawler ignores the robots.txt protocol and keeps fetching, it can be intercepted directly, as shown in the sketch below.
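As an illustration of that approach, here is a minimal Nginx sketch. The domain name and the choice of a 403 response are our own assumptions for the example; Landian.news has not published its exact configuration.

```nginx
# Minimal sketch: reject ClaudeBot by matching its User-Agent header.
# server_name and the 403 response code are illustrative assumptions.
server {
    listen 80;
    server_name example.com;  # hypothetical domain

    # Case-insensitive regex match against the User-Agent header
    if ($http_user_agent ~* "ClaudeBot") {
        return 403;  # refuse the request before serving any content
    }

    # ... the rest of the normal site configuration ...
}
```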
To give the crawler a chance to fetch the robots.txt file, webmasters are advised to update robots.txt first. If the website logs still show ClaudeBot fetching files other than robots.txt a few days later, it is not complying with the protocol, and you can have Nginx return HTTP 444 directly to drop the connection and reduce server load.
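A minimal sketch of that fallback, again assuming a standard Nginx setup (444 is Nginx's non-standard status code that closes the connection without sending any response):

```nginx
# Sketch: once non-compliance is confirmed, drop ClaudeBot connections
# outright. Place this inside the relevant server block.
if ($http_user_agent ~* "ClaudeBot") {
    return 444;  # Nginx closes the connection without sending a response
}
```

Compared with returning 403, the 444 status sends no response at all, so the server spends almost nothing on each blocked request.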