Cloudflare's new free tool stops bots from scraping your website content to train AI

AI bots accessed around 39% of the top one million 'internet properties' using Cloudflare in June of 2024, according to the company.
Written by Artie Beaty, Contributing Writer

If you're worried about AI bots scraping your website content to train AI, Cloudflare can help you fight back.

The company, which claims to proxy about 20% of the web, has introduced a new tool that blocks all AI bots from scraping a site's text. Cloudflare says the tool is available to all customers, even those on the free tier.

With the rise of generative AI, companies need huge amounts of content to train chatbots. Many are turning to web scrapers that pull text from sites for analysis (like ChatGPT is doing with your Reddit posts). Some companies are upfront about their web-scraping bots, but others aren't.

Last September, Cloudflare released a feature that let users block "bad" AI web crawlers, meaning ones that scrape sites without permission. Some companies got around it by disguising their scrapers as legitimate ones. That's why the new tool blocks all AI crawlers, even those that follow proper scraping protocols.

In June 2024, AI bots accessed around 39% of the top one million "internet properties" using Cloudflare, the company said. Fewer than 3% of those properties took measures to block AI bots. According to Cloudflare, the four most active AI bots on its network were Bytespider, Amazonbot, ClaudeBot, and GPTBot.

Bytespider, operated by TikTok owner ByteDance, gathers training data for ByteDance's large language models, including the ChatGPT rival Doubao. Amazonbot is used to train the question-answering side of Alexa, ClaudeBot trains Anthropic's Claude, and GPTBot trains OpenAI's ChatGPT.

If you're a Cloudflare user, turning on the tool is simple. Head to the settings section of your dashboard, then click "Security" and "Bots." From there, you'll see a toggle labeled "AI Scrapers and Crawlers." Turn it on, and Cloudflare will block AI bots from accessing your content.

Of course, AI bots are constantly evolving. Cloudflare says the feature will update automatically as the company detects the "fingerprints" of offending bots.

The new tool is available to all Cloudflare users starting today.