An emerging chatbot ecosystem builds on existing web content and could displace traditional websites. At the same time, licensing and financing are largely unresolved.
OpenAI offers publishers and website operators an opt-out if they prefer not to make their content available to chatbots and AI models for free. This can be done by blocking OpenAI’s web crawler “GPTBot” via the robots.txt file. The bot collects content to improve future AI models, according to OpenAI.
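As a minimal sketch of what such an opt-out can look like, the snippet below embeds a two-line robots.txt rule for the "GPTBot" user agent and checks its effect with Python's standard `urllib.robotparser` module. The `example.com` URL and the helper name `is_allowed` are illustrative assumptions, and the result only shows how a standards-following parser reads the rule, not how OpenAI's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Rules a site operator could place in robots.txt to opt out of GPTBot.
# All other crawlers are unaffected because no rule group names them.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL (assumption)
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Note that robots.txt is a request rather than an enforcement mechanism: it only works if the crawler chooses to honor it.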
Major media companies including the New York Times, CNN, Reuters, Chicago Tribune, ABC, and Australian Community Media (ACM) are now blocking GPTBot. Other web-based content providers such as Amazon, wikiHow, and Quora are also blocking the OpenAI crawler.
According to an analysis by Originality.ai, 9.2 percent of the top 1000 websites were blocking GPTBot at the end of August, with a weekly growth rate of five percent. Out of 759 robots.txt files analyzed, 69 had the block installed. Among the top 100 sites, the blocking percentage is 15 percent.
The largest German news portals, Bild.de, t-online.de, and n-tv.de, have not yet blocked GPTBot, and Spiegel Online also still allows OpenAI's crawler on its site. Other online news portals such as sueddeutsche.de, zeit.de, and welt.de have modified their robots.txt to exclude GPTBot. The German public broadcaster SWR also blocks GPTBot.
Chatbots vs. WWW
Blocking GPTBot is only half the battle: blocking the ChatGPT user agent may be more relevant. This is because ChatGPT plugins like OpenAI’s browsing feature use it to access web pages, pull content from a web page into the chat, and discuss it there.
This removes the click-through to the website and thus the monetization – a direct loss for the website operator, even if the content is not stored long-term or used for AI training. So in most cases, anyone who blocks GPTBot should also have an interest in blocking the ChatGPT user agent.
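A hedged sketch of blocking both agents follows. It assumes the ChatGPT user agent identifies itself with the token `ChatGPT-User`, as OpenAI's published crawler documentation states; verify the current token before relying on it. The helper `blocked_agents` and the `example.com` URL are hypothetical and serve only to check the rules with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# robots.txt with one rule group per OpenAI user agent.
# "ChatGPT-User" is an assumption taken from OpenAI's documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents from `agents` that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    agents = ["GPTBot", "ChatGPT-User", "Googlebot"]
    page = "https://example.com/article"  # placeholder URL (assumption)
    print(blocked_agents(ROBOTS_TXT, agents, page))
```

Keeping one rule group per agent makes it easy to lift one block later, for example to allow in-chat browsing while still opting out of training data collection.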
On the other hand, OpenAI is retreating from AI browsing anyway. The official reason is that the feature allows paywalls to be circumvented, an unintended side effect. Unofficially, the unresolved rights situation around the direct processing of third-party content probably plays a bigger role.
Nevertheless, Microsoft continues to offer Bing Chat, with slightly reformulated website content in the chat window. Google’s AI search, which is currently being tested, also uses similar methods.
None of the major AI companies has yet presented a blueprint for how the WWW content ecosystem can avoid falling victim to the success of chatbots. So far, company leaders like Microsoft’s Satya Nadella have only paid lip service to the problem.
The legal situation will probably have to be settled in court, most likely between the big publishers and the big AI companies like Google, Microsoft, and OpenAI. The New York Times is said to be preparing a lawsuit against OpenAI that could set the direction for the entire industry.