Here is how to block OpenAI from using your web content for ChatGPT



Summary

OpenAI’s GPTBot crawls the web for content that can be used by AI models. If you do not want this, you can block the bot.

The content that GPTBot visits can be used to improve future AI models, according to OpenAI. Those who give GPTBot access to their content are helping to make AI models more accurate, capable, and safe, the company writes.

Block GPTBot from crawling your site

If you do not want to share your content with OpenAI’s models for free, you can block GPTBot. By addressing rules to “User-agent: GPTBot,” you can block the bot from your entire site or only from individual folders or categories. As with Google’s crawlers, you control GPTBot by adding it to your robots.txt file with the following directives:

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +

To block GPTBot from your entire site:
User-agent: GPTBot
Disallow: /

To allow GPTBot into some directories while blocking others:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
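
Before publishing rules like these, it can help to check that they behave as intended. The following is a minimal sketch using Python's standard-library robots.txt parser; the helper name is_allowed and the example.com URLs are illustrative assumptions, and the parser only approximates how a real crawler such as GPTBot interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets from the article.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

# example.com is a placeholder domain, not a real configuration.
print(is_allowed(BLOCK_ALL, "GPTBot", "https://example.com/any-page"))        # False
print(is_allowed(PARTIAL, "GPTBot", "https://example.com/directory-1/page"))  # True
print(is_allowed(PARTIAL, "GPTBot", "https://example.com/directory-2/page"))  # False
# Crawlers without a matching User-agent group are not affected by these rules.
print(is_allowed(BLOCK_ALL, "Googlebot", "https://example.com/any-page"))     # True
```

To test a live site instead, the same parser can load the file with set_url() and read() before calling can_fetch().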

According to OpenAI, crawled pages are automatically filtered to remove content behind paywalls, pages known to collect personally identifiable information (PII), and text that violates OpenAI’s content guidelines. Full instructions are available in OpenAI’s GPTBot documentation.

ChatGPT and the Content Dilemma

With the launch of ChatGPT’s web browsing feature, OpenAI announced that website owners such as publishers could block the crawling bot if they did not want their content to be used within or for ChatGPT.

Blocking the bot, however, means not being present in a potentially emerging content ecosystem – a dilemma similar to (non-)indexing in Google search, where content providers inadvertently become both suppliers to and financially dependent on a third-party ecosystem.
