
Do you block AI crawlers from indexing your website?

Cpvr

Community Advisor
Moderator
Blocking AI crawlers from indexing your site can prevent it from appearing in certain AI-driven searches, particularly ChatGPT, now that OpenAI has rolled out its search engine nationwide. It's also a good idea to set up Bing Webmaster Tools so ChatGPT can index your content faster, since ChatGPT's search relies in part on Bing's index.

However, some AI companies, like ByteDance, don't have a traditional search engine but do operate web crawlers that collect data. Similarly, Facebook has an AI spider that crawls and indexes website content, and they're reportedly planning to launch their own search engine in the future. This means your content could appear in Facebook's AI-generated responses. If you don't want them accessing your site, it's best to block their crawler.

Twitter also has an AI bot that feeds its AI system, Grok.

So, do you allow AI crawlers to index your content? If so, which AI bots do you permit, and which ones do you block?
 
I personally don't really see any reason to block AI crawlers from indexing sites, unless the crawlers are operating to such a degree that they're affecting site performance, which is unlikely to be the case.

I think we'd all like AI to become better, more advanced and more useful over time, so I'd say there's somewhat of a responsibility on us as admins to allow our sites to be indexed to help improve such technologies if it doesn't cause us issues ourselves.
 
With the direction AI is moving and the integration of AI into search engines, I can't imagine how blocking these crawlers would be a good idea. Content that I don't want displayed publicly will be locked down. Building on fdk's remarks, I think our responsibility as admins is even greater than simply allowing crawlers to crawl. Our responsibility is to ensure quality content so AI learns correctly (or as correctly as possible.) I can imagine how damaging it would be if AI attributed misinformation to your project.
 
I do block them. I block the common ones in robots.txt, and I also use Cloudflare to block known AI bots at the zone level. I consider them non-attributing content scrapers.

edit to add: That's why they call it "Training" for the AI. It uses your stuff to train itself to "know it".
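
For anyone wanting to try the robots.txt half of this, here is a rough sketch (not the poster's actual setup) that builds a robots.txt blocking a handful of AI crawlers and then checks the result with Python's standard `urllib.robotparser`. The user-agent tokens in `AI_CRAWLERS`, the helper names, and the example.com URL are assumptions for illustration; check each vendor's documentation for the tokens they currently use. Keep in mind robots.txt is only advisory, which is presumably why a firewall-level block is used on top of it.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for some AI crawlers; verify against each
# vendor's current documentation before relying on this list.
AI_CRAWLERS = ["GPTBot", "Bytespider", "meta-externalagent", "CCBot", "ClaudeBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the given agents and allows everyone else."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates the groups
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/threads/1"):
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    for agent in ["GPTBot", "Bytespider", "Googlebot"]:
        print(agent, "allowed" if is_allowed(robots, agent) else "blocked")
```

Admins who want the opposite policy can pass a shorter list to `build_robots_txt`, so that only the crawlers they object to are disallowed.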
 
I have never done this and I probably never will. Search engines are using AI Overviews in their search results, and by allowing AI crawlers I have a chance of getting quoted in an AI Overview.
 
I do not mind standard AI searches from the large search engines. They don't send out AI scrapers. They already know your content by means of indexing. And their AI insights typically appear alongside their SERPs.

AI scrapers usually only scrape to teach. Oh sure, they may announce a search engine for some point in the future. But that's not my point. My point is that they aren't giving search results alongside, only their 'insights', as if the content were theirs, with zero attribution.

So I don't like them patronizing my site.
 
I don’t either. I’d rather keep my content available to the AI crawlers as long as it shows up in the AI search engines.

A lot of people are using ChatGPT and other AI systems to find information, and by blocking them we’re leaving a direct source of traffic on the table.

ChatGPT is on the rise, as is Perplexity. Both systems link to the sites they use as sources.

Google and Bing aren’t the only search engines around these days, so it’s a good idea to focus on all the search engines (including AI overviews) if possible.


LLMs Drive More Website Traffic and Brand Awareness​

Yes, search behavior is happening more and more on AI platforms. But that doesn’t mean brands can’t benefit from these conversations.

For example, ChatGPT isn't just answering questions anymore—it's actually sending significant traffic to websites.

The numbers tell us an interesting story.

According to Semrush's analysis of 80 million clickstream records, the number of domains getting traffic from ChatGPT jumped from 10,000 to over 30,000 in just a few months during 2024. That's a huge shift in how content gets discovered.

[Chart: domains receiving traffic from ChatGPT]

What's even more interesting is how people use ChatGPT differently from how they use Google.

In our research, about 70% of ChatGPT queries were completely unique—things people wouldn't typically search for on Google. We're seeing a whole new way of discovering content.

 
They're blocked by default according to the universal robots.txt that all Jcink forums have.
 
The only traffic increase is from the AI bots themselves, imo. And with an accuracy rate of around 60%, the traffic would be misdirected in 40% of cases, if any traffic resulted at all.

Until I see definitive results showing that they send real, targeted traffic to websites, I am out. So sayeth my robots.txt and Cloudflare filters.
 
Sorry for the add-on post...

But how many active domains are there?
The chart above represents 30k unique domains. That's barely a fraction.
 
