One major question is whether an organization should block its site in some way to prevent its content from appearing in AI Overview (formerly SGE) answers.
Answer: To put it bluntly, there is no easy way to do this without harming your site. To remove your content from AI Overviews, you need to block Googlebot itself (not just Google-Extended), which would result in your site no longer ranking and losing all of your organic traffic from Google.
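For context, Google-Extended is a separate robots.txt token that only controls whether your content can be used to train Google's Gemini models; it has no effect on Search. So a rule like the following, for example, would do nothing to keep your pages out of AI Overviews:

```
# Opts out of Gemini / Vertex AI model training only.
# Has no effect on Search rankings or AI Overviews.
User-agent: Google-Extended
Disallow: /
```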
If there is content that you don’t want in AI Overviews, and you don’t want it to rank either, we recommend blocking those individual pages. If you have several pages you wish to block, create a subfolder containing all of them; this folder can easily be blocked in robots.txt without impacting the entire site.
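As a sketch, assuming you gather those pages under a subfolder (the /private/ path here is hypothetical), the robots.txt rule would look like this:

```
# Hypothetical folder holding the pages to keep out of Search and AI Overviews.
# Everything else on the site remains crawlable.
User-agent: Googlebot
Disallow: /private/
```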
To clarify, let’s look at where Google gets the information it uses to generate AI Overview answers.
How do Google’s AI Overviews get information?
AI Overviews don’t use a special user agent to crawl and fetch data; they rely on Googlebot. This makes sense when you consider that Google already has all of this information from Googlebot’s continual crawling of sites.
Relying on Googlebot also limits the control websites have over how the overviews obtain their information. If site owners could block Google from using their content without consequence, Google’s AI tools could quickly be hamstrung and wouldn’t work as well as Google would like. By tying AI Overviews to Googlebot, Google can acquire as much information as possible, because blocking Googlebot entirely would cost a site its rankings and all of its organic traffic from Google.
This ultimately means that AI Overviews can surface any information about your site that Googlebot can normally access. The only way to hide information is to block Googlebot from accessing those pages via a robots.txt rule.
What if we have proprietary information on our site that we don’t want available to the general public?
Proprietary information would not typically be available on a company’s public website in the first place. AI Overviews are powered by the information Googlebot crawls, so the only first-party information they factor in is what you publish on your website and allow to be crawled (plus anything others write about your brand, but we’re not addressing that here).
Again, if specific pages contain content you don’t want Google to see, you can block Googlebot from crawling those pages with robots.txt rules.
Be very careful with this, though. Double-check that those pages aren’t already indexed. If they are, and they are ranking, put a noindex tag on them first and wait for Google to recrawl and drop them. The order matters: a robots.txt block stops Googlebot from fetching the page at all, so it would never see the noindex tag and the page could linger in the index. Once the pages have been removed, you’re safe to put in the robots.txt block. Remember, this will only apply to those specific pages, and you don’t want to do this to your whole site or your important pages.
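As a minimal sketch, the noindex step is a meta tag in each page’s <head> (an equivalent X-Robots-Tag HTTP header also works for non-HTML files):

```html
<!-- Asks Google to drop this page from its index. The page must remain
     crawlable (no robots.txt block yet) so Googlebot can see this tag. -->
<meta name="robots" content="noindex">
```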