A Reliability Test for General-Purpose AI

August 1, 2024

A team of researchers has developed a valuable technique, described in “How to Assess a General-Purpose AI Model’s Reliability Before It’s Deployed.” The ScienceDaily article begins by defining foundation models: the huge, generalized deep-learning models that underpin generative AI like ChatGPT and DALL-E. We are reminded that these tools often make mistakes, and that sometimes those mistakes can have serious consequences. (Think self-driving cars.) We learn:

“To help prevent such mistakes, researchers from MIT and the MIT-IBM Watson AI Lab developed a technique to estimate the reliability of foundation models before they are deployed to a specific task. They do this by considering a set of foundation models that are slightly different from one another. Then they use their algorithm to assess the consistency of the representations each model learns about the same test data point. If the representations are consistent, it means the model is reliable. When they compared their technique to state-of-the-art baseline methods, it was better at capturing the reliability of foundation models on a variety of downstream classification tasks. Someone could use this technique to decide if a model should be applied in a certain setting, without the need to test it on a real-world dataset. This could be especially useful when datasets may not be accessible due to privacy concerns, like in health care settings. In addition, the technique could be used to rank models based on reliability scores, enabling a user to select the best one for their task.”
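For readers who want a feel for what "consistency of the representations" might look like in practice, here is a minimal Python sketch. It is our own illustration, not the researchers' algorithm. It assumes each model in the set produces embeddings in its own vector space, so instead of comparing vectors directly it compares which reference points each model places nearest to the test point. The function names (knn_indices, consistency_score), the cosine similarity measure, the Jaccard overlap, and the toy data are all assumptions made for this example.

```python
import numpy as np

def knn_indices(reference, query, k):
    """Return the set of indices of the k reference embeddings closest
    to the query, using cosine similarity (an assumed choice)."""
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = ref @ q
    return set(np.argsort(-sims)[:k].tolist())

def consistency_score(reference_embs, query_embs, k=5):
    """Mean pairwise Jaccard overlap of the query's neighbor sets across
    models. reference_embs[i] and query_embs[i] come from model i."""
    neighbor_sets = [
        knn_indices(refs, query, k)
        for refs, query in zip(reference_embs, query_embs)
    ]
    overlaps = []
    for i in range(len(neighbor_sets)):
        for j in range(i + 1, len(neighbor_sets)):
            a, b = neighbor_sets[i], neighbor_sets[j]
            overlaps.append(len(a & b) / len(a | b))
    return float(np.mean(overlaps))

# Toy data (hypothetical): three "models" that are random rotations of one
# shared embedding space, standing in for slightly different foundation models.
rng = np.random.default_rng(0)
dim, n_refs = 16, 200
base_refs = rng.normal(size=(n_refs, dim))
base_query = base_refs[0] + 0.01 * rng.normal(size=dim)
rotations = [np.linalg.qr(rng.normal(size=(dim, dim)))[0] for _ in range(3)]
refs = [base_refs @ rot for rot in rotations]

# Case 1: every model embeds the test point the same way (up to rotation).
agree = consistency_score(refs, [base_query @ rot for rot in rotations])

# Case 2: every model embeds the test point somewhere unrelated.
disagree = consistency_score(refs, [rng.normal(size=dim) for _ in rotations])

print(f"consistent models:   {agree:.2f}")
print(f"inconsistent models: {disagree:.2f}")
```

A score near 1 means the models agree about which reference points resemble the test point. A score near 0 means they do not, which under the article's framing would be a reason to distrust the model on that input. Note that nothing here needs labels for the test data, which fits the article's point about settings where real-world datasets are off limits.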

Great! See the write-up for the technical details behind the technique. This breakthrough can help companies avoid mistakes before they launch their products. That is, if they elect to use it. Will organizations looking to use AI for cost cutting go through these processes? Sadly, we suspect that if costs go down and lawsuits are few and far between, the AI will be deemed good enough. But thanks for the suggestion, MIT.

Cynthia Murrell, August 1, 2024

Written by Stephen E. Arnold · Filed Under AI, Business process, News

