Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content

Cpvr

Community Advisor
Moderator
Cloudflare has created a bot-busting AI to make life hell for AI crawlers.

The network-taming company built the tool after noticing that almost one percent of the web content requests it sees now come from AI crawler bots. Those bots are probably scraping data to train AI models.

Web site operators can, in theory, block AI crawlers by various means, such as a robots.txt file or web server settings that disallow visits from bots. Some even use CAPTCHAs to test whether visitors to a site are human, or adopt software designed to stymie bots.
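
For anyone who hasn't looked at how the robots.txt route works, here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleAIBot", the rules, and the example.com URLs are made up for illustration and are not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one made-up AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/post/1"))
    print(is_allowed("Mozilla/5.0", "https://example.com/post/1"))
```

Note that this check runs on the crawler's side. Nothing on the server enforces it, so it only helps against bots whose authors choose to honor it.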


In reality, crawler operators ignore the instructions in robots.txt files, or work around CAPTCHAs and web server settings. The result is a lot of unwanted crawler traffic consuming resources, and information fed into training data without creators’ permission – a contentious practice currently being tested in court amid allegations of copyright abuse.

No human would go four links deep into a maze of AI-generated nonsense
Cloudflare’s response is to let crawler bots in and use generative AI to create junk content for them to devour in what the company has termed an “AI Labyrinth”.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” explained Cloudflare’s Reid Tatoris, Harsh Saxena, and Luis Miglietti. Cloudflare uses its own serverless Workers to create the content.

The trio wrote that the content is “real looking” but “not actually the content of the site we are protecting, so the crawler wastes time and resources.” The content is also “real and related to scientific facts” because Cloudflare doesn’t want to inadvertently create misinformation.

The AI slop is also designed not to mess with sites’ reputations or search engine optimization efforts.

It is, however, designed to act as a deterrent to crawler operators, by keeping their bots busy and thereby increasing the cost of operating content scrapers.

Cloudflare thinks this stuff is also a useful tool to detect bot activity.

“No real human would go four links deep into a maze of AI-generated nonsense,” Cloudflare’s trio wrote. “Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots, which we add to our list of known bad actors.”
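
The depth idea in that quote can be sketched in a few lines of Python. This is only an illustration of the principle, not Cloudflare's implementation: the LabyrinthTracker class, the visitor_id string (think of some request fingerprint), and the cutoff of four are all assumptions, with the cutoff borrowed from the "four links deep" remark.

```python
from collections import defaultdict

# Assumed cutoff, borrowed from the "four links deep" quote; the real value is not public.
DEPTH_THRESHOLD = 4


class LabyrinthTracker:
    """Hypothetical tracker: flags visitors that follow decoy links too deep."""

    def __init__(self, threshold: int = DEPTH_THRESHOLD) -> None:
        self.threshold = threshold
        self._depth = defaultdict(int)  # visitor id -> deepest decoy level seen
        self.flagged = set()            # visitor ids treated as likely bots

    def record_hit(self, visitor_id: str, decoy_depth: int) -> bool:
        """Record a request for a decoy page; return True if the visitor is flagged."""
        if decoy_depth > self._depth[visitor_id]:
            self._depth[visitor_id] = decoy_depth
        if self._depth[visitor_id] >= self.threshold:
            self.flagged.add(visitor_id)
        return visitor_id in self.flagged


if __name__ == "__main__":
    tracker = LabyrinthTracker()
    for depth in range(1, 5):
        print(depth, tracker.record_hit("visitor-a", depth))
```

A person who clicks one decoy link by accident stays well under the cutoff, while a crawler that keeps following links crosses it and lands in the flagged set.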

This sort of thing usually creates an arms race and Cloudflare is already thinking about what it will take to stay ahead.


“In the future, we’ll continue to work to make these links harder to spot and make them fit seamlessly into the existing structure of the website they’re embedded in,” its authors wrote.

Cloudflare customers can enable the AI Labyrinth in their management consoles.

Source: https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/

 
