OpenAI’s regulatory troubles are just beginning


By Webdesk

OpenAI managed to satisfy Italian data authorities and lift the country’s effective ban on ChatGPT last week, but the battle against European regulators is far from over.

Earlier this year, OpenAI’s popular and controversial ChatGPT chatbot ran into a major legal snag: an effective ban in Italy. Italy’s data protection authority (GPDP) accused OpenAI of violating EU data protection rules, and the company agreed to limit access to the service in Italy while it attempted to resolve the issue. On April 28, ChatGPT returned to the country, with OpenAI lightly addressing GPDP’s concerns without making major changes to its service – an apparent victory.

The GPDP has said it “welcomes” the changes ChatGPT has made, but the company’s legal troubles — and those of companies building similar chatbots — are likely just getting started. Regulators in several countries are investigating how these AI tools collect and produce information, citing concerns that range from the unlicensed collection of training data to chatbots’ tendency to spread misinformation. In the EU, regulators are applying the General Data Protection Regulation (GDPR), one of the world’s strongest legal privacy frameworks, whose effects are likely to extend far beyond Europe. Meanwhile, lawmakers in the bloc are drafting a law that will target AI specifically — likely ushering in a new era of regulation for systems like ChatGPT.

ChatGPT’s various misinformation, copyright, and data protection issues have put a target on its back

ChatGPT is one of the most popular examples of generative AI – a general term for tools that produce text, images, video and audio based on user prompts. The service reportedly became one of the fastest growing consumer applications in history after reaching 100 million monthly active users just two months after its launch in November 2022 (OpenAI has never confirmed these numbers). People use it to translate text into different languages, write college essays and generate code. But critics, including regulators, have pointed to ChatGPT’s unreliable output, confusing copyright issues, and shady data protection practices.

Italy was the first country to make a move. On March 31, the GPDP highlighted four ways it believed OpenAI was breaking the GDPR: allowing ChatGPT to provide inaccurate or misleading information, failing to notify users of its data collection practices, lacking any of the six possible legal justifications for processing personal data, and failing to adequately prevent children under the age of 13 from using the service. It ordered OpenAI to immediately stop using personal information collected from Italian citizens in its training data for ChatGPT.

No other country has taken such measures. But since March, at least three EU countries — Germany, France, and Spain — have launched their own investigations into ChatGPT. Meanwhile, on the other side of the Atlantic, Canada is reviewing privacy concerns under its Personal Information Protection and Electronic Documents Act, or PIPEDA. The European Data Protection Board (EDPB) has even set up a special task force to help coordinate investigations. And if these agencies demand changes from OpenAI, they could affect how the service works for users around the world.

Regulators’ concerns can be broadly divided into two categories: where ChatGPT’s training data comes from and how OpenAI delivers information to its users.

ChatGPT uses OpenAI’s GPT-3.5 or GPT-4 large language models (LLMs), which are trained on large amounts of human-produced text. OpenAI is coy about exactly what training text is used, but says it draws on “a variety of licensed, created, and publicly available data sources, which may include publicly available personal information.”

This potentially creates huge problems under the GDPR. The law, which came into effect in 2018, covers any service that collects or processes data from EU citizens, regardless of where the responsible organization is based. GDPR rules require companies to obtain explicit consent before collecting personal data, to have a legal justification for why it is being collected, and to be transparent about how it is used and stored.

European regulators argue that the secrecy surrounding OpenAI’s training data means there is no way to confirm whether the personal information swept into it was originally provided with user consent, and the GPDP specifically argued that OpenAI had “no legal basis” for collecting it in the first place. OpenAI and others have faced little scrutiny over their scraping so far, but this claim adds a big question mark to future data scraping efforts.

Then there’s the GDPR’s “right to be forgotten,” which allows users to demand that companies correct or completely delete their personal data. OpenAI has preemptively updated its privacy policy to accommodate those requests, but there has been debate over whether it’s technically possible to honor them, given how difficult it can be to extract specific data once it has been baked into a large language model.

OpenAI also collects information directly from users. Like any internet platform, it collects a range of standard user data (e.g., name, contact details, payment card details). But more importantly, it records the interactions users have with ChatGPT. As mentioned in an FAQ, this data can be reviewed by OpenAI’s staff and used to train future versions of the model. Given the intimate questions people ask ChatGPT — using the bot as a therapist or a doctor — that means the company is collecting all kinds of sensitive data.

At least some of this data may have been collected from minors: while OpenAI’s policy states that it “does not knowingly collect personal information from children under the age of 13,” there is no strict age verification gate. That doesn’t sit well with EU rules, which prohibit collecting data from people under the age of 13 and (in some countries) require parental consent for minors under the age of 16. On the output side, the GPDP argued that ChatGPT’s lack of age filters exposes minors to “absolutely inappropriate responses with regard to their degree of development and self-awareness.”

OpenAI retains wide latitude to use that data, which has worried some regulators, and storing it poses a security risk. Companies like Samsung and JPMorgan have banned employees from using generative AI tools for fear they will upload sensitive data. And in fact, Italy announced its ban shortly after ChatGPT suffered a serious data breach, exposing users’ chat history and email addresses.

ChatGPT’s propensity to provide false information could also pose a problem. The GDPR stipulates that all personal data must be accurate, something the GPDP emphasized in its announcement. Depending on how that’s defined, it could spell trouble for most AI text generators, which are prone to “hallucinations”: a cute industry term for factually incorrect or irrelevant responses to a question. This has already had real-world ramifications elsewhere: a regional Australian mayor has threatened to sue OpenAI for defamation after ChatGPT falsely claimed he had served time in prison for bribery.

ChatGPT’s popularity and current dominance of the AI market make it a particularly attractive target, but there’s no reason its competitors and collaborators — such as Google with Bard or Microsoft with its OpenAI-powered Azure AI — won’t face scrutiny as well. Before ChatGPT, Italy banned the chatbot platform Replika over its collection of information on minors — a ban that remains in place.

While the GDPR is a powerful set of laws, it was not created to address AI-specific issues. Rules that do, however, may be on the horizon.

In 2021, the EU submitted its first draft of the Artificial Intelligence Act (AIA), legislation that will work alongside the GDPR. The act regulates AI tools according to their perceived risk, from “minimal” (things like spam filters) to “high” (AI tools used in law enforcement or education) to “unacceptable” and therefore banned (like a social credit system). After the explosion of large language models like ChatGPT last year, lawmakers are now racing to add rules for “foundation models” and “general purpose AI systems (GPAIs)” — two terms for large-scale AI systems that include LLMs — and potentially classify them as high-risk services.

The AIA’s provisions go beyond data protection. A recently proposed amendment would force companies to disclose any copyrighted material used to develop generative AI tools. That could expose once-secret datasets and leave more companies vulnerable to infringement lawsuits, which are already hitting some services.

Laws specifically designed to regulate AI may not take effect in Europe until late 2024

But passing the law could take a while. EU lawmakers reached a tentative AI Act deal on April 27. A committee will vote on the draft on May 11, and the final proposal is expected in mid-June. The European Council, Parliament, and Commission will then have to resolve any remaining disputes before implementing the law. If everything goes smoothly, it could be adopted by the second half of 2024, slightly behind the official target of Europe’s May 2024 elections.

For now, the row between Italy and OpenAI offers an early look at how regulators and AI companies might negotiate. The GPDP offered to lift its ban if OpenAI met several proposed resolutions by April 30. Those included informing users how ChatGPT stores and processes their data, asking for explicit consent to use that data, facilitating requests to correct or remove false personal information generated by ChatGPT, and requiring Italian users to confirm they are over 18 when registering for an account. OpenAI didn’t meet all of those stipulations, but it complied enough to appease Italian regulators and get ChatGPT reinstated in Italy.

OpenAI still has targets to meet. It has until September 30 to build a stricter age gate that keeps out minors under 13 and requires parental consent for older underage teens. If it fails, it could find itself blocked again. But it has provided an example of what Europe considers acceptable behavior for an AI company — at least until new laws are on the books.
