AI Maybe Should Not Be Accurate, Correct, or Reliable?

September 26, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Okay, AI does not hallucinate. “AI” (whatever that means) does output incorrect, false, made-up, and possibly problematic answers. The buzzword “hallucinate” was cooked up by experts in artificial intelligence who do whatever they can to avoid talking about probabilities, human biases migrated into algorithms, and the fiddling with knobs and dials that goes on in the computational wonderland of AI systems from Google, OpenAI, et al. Even the book Why Machines Learn: The Elegant Math Behind Modern AI ends up tangled in math and jargon which may befuddle readers who stopped taking math after high school algebra or who have never thought about orthogonal matrices.

The Next Web’s “AI Doesn’t Hallucinate — Why Attributing Human Traits to Tech Is Users’ Biggest Pitfall” is an interesting write-up. On one hand, it probably captures the attitude of those who just love that AI goodness: it blames humans for anthropomorphizing smart software. On the other hand, the AI systems with which I have interacted output content that is wrong or wonky. I admit that I ask the systems to which I have access for information on topics about which I have some knowledge. Keep in mind that I am an 80 year old dinobaby, and I view “knowledge” as something that comes from bright people working on projects, reading relevant books and articles, and attending conference presentations or meetings on subjects far removed from the best exercise leggings or how to get a Web page to the top of a Google results list.

Let’s look at three of the points in the article which caught my attention.

First, consider this passage, which is a quote from an AI expert:

“Luckily, it’s not a very widespread problem. It only happens between 2% to maybe 10% of the time at the high end. But still, it can be very dangerous in a business environment. Imagine asking an AI system to diagnose a patient or land an aeroplane,” says Amr Awadallah, an AI expert who’s set to give a talk at VDS2024 on How Gen-AI is Transforming Business & Avoiding the Pitfalls.

Where does the 2 percent to 10 percent number come from? What methods were used to determine that content was off the mark? What was the sample size? Has bad output been tracked longitudinally for the tested systems? Ah, so many questions and zero answers. My take is that the jargon “hallucination” is coming back to bite AI experts on the ankle.

Second, what’s the fix? Not surprisingly, the way out of the problem is to rename “hallucination” to “confabulation”. That’s helpful. Here’s the passage I circled:

“It’s really attributing more to the AI than it is. It’s not thinking in the same way we’re thinking. All it’s doing is trying to predict what the next word should be given all the previous words that have been said,” Awadallah explains. If he had to give this occurrence a name, he would call it a ‘confabulation.’ Confabulations are essentially the addition of words or sentences that fill in the blanks in a way that makes the information look credible, even if it’s incorrect. “[AI models are] highly incentivized to answer any question. It doesn’t want to tell you, ‘I don’t know’,” says Awadallah.
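Awadallah’s “predict the next word” description can be made concrete with a toy. The Python sketch below is a bigram model: it counts which word follows which in a tiny invented corpus and always emits the most frequent follower. It is an illustrative assumption, not how Google’s or OpenAI’s systems are built (those use large neural networks over tokens), and the corpus, function names, and fallback rule are all hypothetical. Note the fallback: faced with a word it has never seen, the toy does not say “I don’t know”; it picks the most common word in its data and keeps going.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = ("the patient has a fever . the nurse has a fever . "
          "the plane has a pilot .")


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def next_word(follows, prev):
    """Return the most frequent follower of `prev` and its relative frequency.

    Hypothetical fallback: if `prev` was never seen, use the most common
    follower across the whole corpus instead of answering "I don't know".
    """
    if prev in follows:
        counts = follows[prev]
    else:
        counts = Counter()
        for c in follows.values():
            counts.update(c)
    word, n = counts.most_common(1)[0]
    return word, n / sum(counts.values())


def generate(follows, start, length):
    """Greedily extend `start` by `length` words; return text and per-step probabilities."""
    words, probs = [start], []
    for _ in range(length):
        word, p = next_word(follows, words[-1])
        words.append(word)
        probs.append(p)
    return " ".join(words), probs


model = train_bigrams(CORPUS)
sentence, probs = generate(model, "plane", 4)
print(sentence)                       # plane has a fever .
print([round(p, 2) for p in probs])   # [1.0, 1.0, 0.67, 1.0]
```

Starting from “plane,” the toy produces “plane has a fever .” Each step is a statistically plausible continuation and the sentence reads smoothly, yet it is nonsense: a confabulation in miniature. The per-step probabilities hint at the kind of scoring suggested at the end of this essay, but the step that introduces the error (“a” to “fever”) still scores about 0.67, so a score alone does not certify the content.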

Third, let’s not forget that the problem rests with the users, the personifiers, the people who own French bulldogs and talk to them as though they were the favorite child in a large family. Here’s the passage:

The danger here is that while some confabulations are easy to detect because they border on the absurd, most of the time an AI will present information that is very believable. And the more we begin to rely on AI to help us speed up productivity, the more we may take their seemingly believable responses at face value. This means companies need to be vigilant about including human oversight for every task an AI completes, dedicating more and not less time and resources.

The ending of the article is a remarkable statement; to wit:

As we edge closer and closer to eliminating AI confabulations, an interesting question to consider is, do we actually want AI to be factual and correct 100% of the time? Could limiting their responses also limit our ability to use them for creative tasks?

Let me answer the question: Yes. Outputs should be presented and possibly scored; for example, “90 percent probability that the information is verifiable.” Maybe emojis will work? Wow.

Stephen E Arnold, September 26, 2024
