Researchers Propose a Better Way to Report Dangerous AI Flaws

At the end of 2023, a team of third-party researchers discovered a troubling flaw in OpenAI's widely used artificial intelligence model GPT-3.5.

When the model was asked to repeat certain words a thousand times, it began repeating the word over and over, then suddenly switched to spitting out incoherent text and snippets of personal information from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before revealing it publicly. It is just one of many problems found in major AI models in recent years.

In a proposal published today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They propose a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to publicly disclose flaws.

“At the moment it’s a bit of the Wild West,” says Shayne Longpre, a doctoral student at MIT and the lead author of the proposal. According to Longpre, some so-called jailbreakers share their methods for bypassing AI safeguards on the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company even though they could affect many. And some flaws, he says, are kept secret because researchers fear being banned or facing prosecution for violating terms of use. “It is clear that there are chilling effects and uncertainty,” he says.

The safety and security of AI models is hugely important given how widely the technology is now used and how it may seep into countless applications and services. Powerful models need to be stress-tested, or red-teamed, because they can harbor harmful biases, and because certain inputs can cause them to break free of guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons. Some experts fear that models could assist cybercriminals or terrorists, and may even turn on humans as they advance.

The authors propose three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline the reporting process; having large AI companies provide infrastructure to third-party researchers who disclose flaws; and developing a system that allows flaws to be shared between different providers.

The approach is borrowed from the cybersecurity world, where there are legal protections and established norms for outside researchers to disclose bugs.

“AI researchers don’t always know how to disclose a flaw and can’t be certain that their good-faith flaw disclosure won’t expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a coauthor of the report.

Large AI companies currently carry out extensive safety testing on AI models before their release. Some also contract with outside firms to do further probing. “Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we’ve never dreamed of?” Longpre asks. Some AI companies have started organizing AI bug bounties. However, Longpre says, independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.
