
Several French media block OpenAI’s GPTBot over data collection concerns


Following similar steps by many English-language media, a series of French media groups including Radio France and France24 have decided to block OpenAI’s GPTBot from collecting their content online.

Artificial intelligence (AI) research and deployment company OpenAI is best known as the creator of ChatGPT, the generative AI tool that made a splash following its launch in November 2022, gathering over 100 million users in its first two months of public release.

GPTBot is the Microsoft-backed company’s web crawler, which scrapes publicly accessible data online, potentially including copyrighted material, to feed into efforts to improve ChatGPT’s accuracy. The chatbot uses a deep-learning language model for language processing and text generation.

A blog post by OpenAI says that “allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety”. On 8 August, the company announced that the tool will automatically collect data from the entire internet, to train its GPT-4 and GPT-5 models.

However, according to the same blog post, it will filter out paywall-restricted sources, any source that violates OpenAI’s policies, or those that gather personally identifiable information. The latter refers to any type of information that can be linked to an individual and can reveal their identity.

France says no

Radio France and TF1 have now blocked the tool from gathering data from their websites. However, they are not the first to do so: according to the French newspaper Les Échos, all the France Médias Monde websites have also blocked GPTBot.

Vincent Fleury, Director of Digital Environments at France Médias Monde, told EURACTIV that they made the decision because “as a public service, we invest money and people in creating content. We don’t want our data to train the model for free. We don’t want OpenAI to allow other businesses to create value with our content […] without getting something in return.”

He also said that they do not want their content to be associated with incorrect responses that may be given by the chatbot. Fleury added that this is a preventative measure and that they would like to reach an agreement in the future.

Les Échos also reported that Le Monde contacted OpenAI and Google (because of its rival AI chatbot, Bard) to start negotiations. According to the same article, the Vice President of the Alliance de la Presse d’Information Générale also said he was in favour of a ‘new deal’ with AI companies.

Moreover, Les Échos reported that the newspaper Le Figaro is also hoping for an agreement with platforms. If one cannot be reached, however, it plans to block access as well.

Previously, The New York Times, CNN, Reuters, the Chicago Tribune, ABC (the Australian Broadcasting Corporation), and Australian Community Media brands such as the Canberra Times and the Newcastle Herald all disallowed the tool.

A Reuters spokesperson said that since “intellectual property is the lifeblood of our business, it is imperative that we protect the copyright of our content”.
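The reporting above does not say how each outlet implemented its block. One common mechanism, and the one OpenAI's own guidance describes for GPTBot, is a rule in the site's robots.txt file. The sketch below is illustrative only: the robots.txt content, the `news.example` URL and the `is_allowed` helper are hypothetical and not taken from any outlet named here. It uses Python's standard `urllib.robotparser` to check whether a given crawler would be permitted to fetch a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption, not copied
# from any publisher mentioned in the article). It disallows GPTBot everywhere
# and leaves other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://news.example/articles/1"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other crawler allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

A robots.txt rule is a request rather than a technical barrier, so it only works if the crawler chooses to honour it.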

OpenAI first clashed with regulators in March, when the Italian data regulator Garante temporarily shut the chatbot down domestically, accusing the company of flouting European privacy rules. ChatGPT returned to Italy after OpenAI instituted new privacy measures for users.

Following this decision, the European Data Protection Board, which gathers all EU data regulators, established a task force in April to ensure consistent enforcement.

In May, the French data protection watchdog, the National Commission on Informatics and Liberty, also published an action plan addressing privacy concerns related to artificial intelligence, particularly generative applications like ChatGPT.

[Edited by Nathalie Weatherald]
