Do you block AI crawlers from indexing your website?

Cpvr

Blocking AI crawlers from indexing your site can prevent it from appearing in certain AI-driven searches, particularly ChatGPT, now that OpenAI has rolled out its search engine widely. It's also a good idea to set up Bing Webmaster Tools, so ChatGPT can index your content faster.

However, some companies, like ByteDance, don't run a traditional search engine but do operate web crawlers that collect data. Similarly, Facebook has an AI spider that crawls and indexes website content, and Meta is reportedly planning to launch its own search engine in the future. This means your content could appear in Facebook's AI-generated responses. If you don't want them accessing your site, it's best to block their crawler.

Twitter (X) also has an AI bot that feeds its AI system, Grok.

So, do you allow AI crawlers to index your content? If so, which AI bots do you permit, and which ones do you block?
 
I personally don't really see any reason to block AI crawlers from indexing sites, unless the crawlers are operating to such a degree that they're affecting site performance, which is unlikely to be the case.

I think we'd all like AI to become better, more advanced and more useful over time, so I'd say there's somewhat of a responsibility on us as admins to allow our sites to be indexed to help improve such technologies if it doesn't cause us issues ourselves.
 
With the direction AI is moving and the integration of AI into search engines, I can't imagine how blocking these crawlers would be a good idea. Content that I don't want displayed publicly will be locked down. Building on fdk's remarks, I think our responsibility as admins goes beyond simply allowing crawlers to crawl: it's to ensure quality content so AI learns correctly (or as correctly as possible). I can imagine how damaging it would be if AI attributed misinformation to your project.
 
I do block them: the common ones in robots.txt, and I've also applied Cloudflare rules to block known AI bots at the zone level. I consider them non-attributing content scrapers.

edit to add: That's why they call it "Training" for the AI. It uses your stuff to train itself to "know it".
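For anyone who wants to do the same, here's a minimal robots.txt sketch. The user-agent tokens below are the ones these vendors have publicly documented (GPTBot for OpenAI, Bytespider for ByteDance, FacebookBot for Meta, CCBot for Common Crawl), but tokens change over time, so verify each vendor's current documentation before relying on this:

```
# Block common AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everything else may crawl normally
User-agent: *
Allow: /
```

Keep in mind robots.txt is purely advisory; well-behaved crawlers honor it, but nothing enforces it. That's why a second layer like Cloudflare's bot blocking at the zone level is worth adding if you're serious about keeping scrapers out.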
 
I have never done this and probably never will. Search engines are showing AI overviews in their search results, and by allowing AI crawlers I have a chance of getting quoted in those overviews.
 
I do not mind standard AI search from the large search engines. They don't send out AI scrapers; they already know your content through indexing, and their AI insights typically appear alongside their SERPs.

Dedicated AI scrapers usually only scrape to train. Sure, some announce they might launch a search engine in the future, but that's not my point. My point is they aren't presenting search results alongside, just their "insights," as if the content were theirs, with zero attribution.

So I don't like them freeloading off my site.
 
