Probability: Who Wants to Dig into What Is Cooking Beneath the Outputs of Smart Software?

May 30, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The ChatGPT and smart software “revolution” depends on math only a few live and breathe. One drawer in the pigeon hole desk of mathematics is probability. You know the coin flip example. Most computer science types avoid advanced statistics. I know because my great uncle was Vladimir Arnold (yeah, the guy who worked with a so-so mathy type named Andrey Kolmogorov, who was pretty good at mathy stuff and liked hiking in the winter in what my great uncle described as “minimal clothing”).

When it comes to using smart software, the plumbing is kept under the basement floor. What people see are interfaces and application programming interfaces. Letting people watch how the sausage is produced is not what the smart software outfits do. What makes the math interesting is that the systems and methods are not really new. What’s new is that memory, processing power, and content are available.

If one pries up a tile on the basement floor, the plumbing is complicated. Within each pipe or workflow process are the mathematics that bedevil many college students: inferential statistics. Those who dabble in the Fancy Math of smart software are familiar with Markov chains and martingales. There are garden variety maths as well; for example, the calculations beloved of stochastic parrots.

MidJourney’s idea of complex plumbing. Smart software’s guts are more intricate with many knobs for acolytes to turn and many levers to pull for “users.”

The little secret among the mathy folks who whack together smart software is that humanoids set thresholds, establish boundaries on certain operations, exercise controls like those on an old-fashioned steam engine, and find inspiration for a line of code or a process tweak during the morning gym routine.

In short, the snazzy interface makes it almost impossible to understand why certain responses show up and why they cannot be explained. Who knows how the individual humanoid tweaks interact as values (probabilities, for instance) interact with other mathy stuff? Why explain this? Few understand.
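For those who want to pry up one tile, here is a minimal sketch in Python of how two humanoid-set knobs can interact. Everything in it is an assumption made up for illustration: the candidate words, the raw scores, the `temperature` knob, and the `threshold` cutoff are hypothetical and do not describe any vendor's actual system. The point is only that the same scores can yield different sets of permitted outputs depending on where someone set the dials.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature is a human-chosen knob."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def allowed(words, scores, temperature, threshold):
    """Return the words whose probability clears a human-chosen threshold."""
    probs = softmax(scores, temperature)
    return [w for w, p in zip(words, probs) if p >= threshold]


# Hypothetical candidates and scores, invented for this illustration.
words = ["plumbing", "sausage", "parrot", "dragon"]
scores = [2.0, 1.0, 0.5, 0.1]

# Same scores, same threshold, different temperature setting.
cautious = allowed(words, scores, temperature=0.5, threshold=0.10)
loose = allowed(words, scores, temperature=2.0, threshold=0.10)

print(cautious)
print(loose)
```

With the low temperature only the two top-scoring words clear the cutoff; with the high temperature all four do. Nothing in the final list of words tells the user which knob setting produced it.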

To get a sense of how contentious certain statistical methods are, I suggest you take a look at “Statistical Modeling, Causal Inference, and Social Science.” I thought the paper should have been called “Why No One at Facebook, Google, OpenAI, and other smart software outfits can explain why some output showed up and some did not, why one response looks reasonable and another one seems like a line ripped from Fantasy Magazine.”

In a nutshell, the cited paper makes one point: Those teaching advanced classes in probability and related operations do not agree on what tools to use, how to apply the procedures, and what impact certain interactions produce.

Net net: Glib explanations are baloney. This mathy stuff is a serious problem, particularly when a major player like Google seeks to control training sets, off-the-shelf models, and the framing of problems, and to integrate the firm’s mental orientation to what’s okay and what’s not okay. Are you okay with that? I am too old to worry, but you, gentle reader, may have decades to understand what my great uncle and his sporty pal were doing. What Google-type outfits are doing is less easily looked up, documented, and analyzed.

Stephen E Arnold, May 30, 2023

Written by Stephen E. Arnold · Filed Under AI, News, Statistics, Text analytics, Text processing


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.