AI Safety Evaluations: Some Issues Exist

August 14, 2024

Ah, corporate self-regulation. What could go wrong? Well, as TechCrunch reports, “Many Safety Evaluations for AI Models Have Significant Limitations.” Writer Kyle Wiggers tells us:

“Generative AI models … are coming under increased scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing new benchmarks to test these models’ safety. Toward the end of last year, startup Scale AI formed a lab dedicated to evaluating how well models align with safety guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to assess model risk. But these model-probing tests and methods may be inadequate. The Ada Lovelace Institute (ALI), a U.K.-based nonprofit AI research organization, conducted a study that interviewed experts from academic labs, civil society, and those producing vendor models, as well as audited recent research into AI safety evaluations. The co-authors found that while current evaluations can be useful, they’re non-exhaustive, can be gamed easily and don’t necessarily give an indication of how models will behave in real-world scenarios.”

There are several reasons for the gloomy conclusion. For one, there are no established best practices for these evaluations, so each organization goes its own way. Benchmarking, one common approach, has its own problems. To save time or money, for example, models are often tested on the same data they were trained on, which says little about how they will perform in the wild. Also, even small changes to a model can produce big differences in behavior, yet few organizations have the time or money to test every software iteration.
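To make that contamination point concrete, here is a minimal sketch in Python, not taken from the article or the ALI study, of how one might flag benchmark items that share long word sequences with the training corpus. The function names, the toy data, and the 8-gram threshold are illustrative assumptions, not anyone’s actual evaluation pipeline.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def flag_contaminated(benchmark_items, training_docs):
    """Return indices of benchmark items sharing any n-gram with the training data."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc)
    return [i for i, item in enumerate(benchmark_items) if ngrams(item) & train_grams]

# Toy example: the first benchmark question repeats a training sentence, the second does not.
train = ["the quick brown fox jumps over the lazy dog near the riverbank at dawn"]
bench = [
    "explain why the quick brown fox jumps over the lazy dog near the riverbank at dawn",
    "an entirely unrelated question about maritime tax law",
]
print(flag_contaminated(bench, train))  # prints [0]

Real contamination checks are far more involved (deduplication at scale, fuzzy matching, canary strings), but even this toy version shows why scores on familiar data can overstate real-world performance.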

What about red-teaming: hiring someone to probe the model for flaws? The scarcity of qualified red-teamers and the laborious nature of the method make it costly and out of reach for smaller firms. There are also few agreed-upon standards for the practice, so it is hard to assess how effective any red-team project really is.

The post suggests all is not lost—as long as we are willing to take responsibility for evaluations out of AI firms’ hands. Good luck prying open that death grip. Government regulators and third-party testers would hypothetically fill the role, complete with transparency. What a concept. It would also be good to develop standard practices and context-specific evaluations. Bonus points if a method is based on an understanding of how each AI model operates. (Sadly, such understanding remains elusive.)

Even with these measures, it may never be possible to ensure any model is truly safe. The write-up concludes with a quote from the study’s co-author Mahi Hardalupas:

“Determining if a model is ‘safe’ requires understanding the contexts in which it is used, who it is sold or made accessible to, and whether the safeguards that are in place are adequate and robust to reduce those risks. Evaluations of a foundation model can serve an exploratory purpose to identify potential risks, but they cannot guarantee a model is safe, let alone ‘perfectly safe.’ Many of our interviewees agreed that evaluations cannot prove a model is safe and can only indicate a model is unsafe.”

How comforting.

Cynthia Murrell, August 14, 2024
