Word Problems Are Tricky for AI Language Models

October 27, 2022

If you have trouble with word problems, rest assured you are in good company. Machine-learning researchers have only recently made significant progress teaching algorithms the concept. IEEE Spectrum reports, “AI Language Models Are Struggling to ‘Get’ Math.” Writer Dan Garisto states:

“Until recently, language models regularly failed to solve even simple word problems, such as ‘Alice has five more balls than Bob, who has two balls after he gives four to Charlie. How many balls does Alice have?’ ‘When we say computers are very good at math, they’re very good at things that are quite specific,’ says Guy Gur-Ari, a machine-learning expert at Google. Computers are good at arithmetic—plugging numbers in and calculating is child’s play. But outside of formal structures, computers struggle. Solving word problems, or ‘quantitative reasoning,’ is deceptively tricky because it requires a robustness and rigor that many other problems don’t.”
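For reference, the sample problem comes down to a single addition once the wording is untangled. Here is a minimal sketch of the steps, assuming the usual reading that Alice is compared with Bob's count after he gives balls away. The variable names are ours, not the article's.

```python
# Worked version of the sample word problem, one step at a time.
# Assumption: "five more balls than Bob" refers to Bob's count
# AFTER he gives four balls to Charlie.

bob_after_giving = 2        # stated directly in the problem
given_to_charlie = 4        # a distractor for the question being asked
alice_extra = 5             # Alice has five more than Bob

# Bob's starting count can be recovered, but the question never needs it.
bob_before_giving = bob_after_giving + given_to_charlie

# The only step the question actually requires.
alice = bob_after_giving + alice_extra

print(f"Bob started with {bob_before_giving} balls.")
print(f"Alice has {alice} balls.")
```

The arithmetic is trivial. The hard part is working out which numbers matter, and that is the "quantitative reasoning" step the quote describes.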

Researchers threw a couple of datasets with thousands of math problems at their language models. The students still failed spectacularly. After some tutoring, however, Google’s Minerva emerged as a star pupil, having achieved 78% accuracy. (Yes, the grading curve is considerable.) We learn:

“Minerva uses Google’s own language model, Pathways Language Model (PaLM), which is fine-tuned on scientific papers from the arXiv online preprint server and other sources with formatted math. Two other strategies helped Minerva. In ‘chain-of-thought prompting,’ Minerva was required to break down larger problems into more palatable chunks. The model also used majority voting—instead of being asked for one answer, it was asked to solve the problem 100 times. Of those answers, Minerva picked the most common answer.”
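The majority-voting step described in the quote can be sketched in a few lines. This is an illustration only, not Minerva's actual code: `sample_answer` is a hypothetical stand-in for one run of a model on the problem, and the function name and tie-breaking behavior are our assumptions.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(
    sample_answer: Callable[[], Optional[str]],
    n_samples: int = 100,
) -> Optional[str]:
    """Ask for an answer n_samples times and return the most common one.

    sample_answer is a hypothetical callable standing in for a single
    model run; it returns a final answer string, or None if no answer
    could be extracted from that run.
    """
    answers = [sample_answer() for _ in range(n_samples)]
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(valid).most_common(1)[0][0]


# Usage with canned answers in place of a real model.
canned = iter(["7", "7", "11", "7", None, "3"])
result = majority_vote(lambda: next(canned), n_samples=6)
print(result)
```

The idea is that a model's wrong answers tend to scatter while its right answers tend to repeat, so the most frequent answer is a better bet than any single attempt.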

Not a practical approach for your average college student during an exam. Researchers are still not sure how much Minerva and her classmates understand about the answers they are giving, especially since the more problems they solve, the fewer they get right. Garisto notes language models “can have strange, messy reasoning and still arrive at the right answer.” That is why human students are required to show their work, so perhaps this is not so different. More study is required, on the part of both researchers and their algorithms.

Cynthia Murrell, October 27, 2022

