Google, Smart Software, and Prime Mover for Hyperbole

May 17, 2022

In my experience, the cost of training smart software is very big problem. The bigness does not become evident until the licensee of a smart system realizes that training the smart software must take place on a regular schedule. Why is this a big problem? The reason is the effort required to assemble valid training sets is significant. Language, data types, and info peculiarities change over time; for example, new content is fed into a smart system, and the system cannot cope with the differences between the training set that was used and the info flowing into the system now. A gap grows, and the fix is to assemble new training data, reindex the content, and get ready to do it again. A failure to keep the smart software in sync with what is processed is a tiny bit of knowledge not explained in sales pitches.

Accountants figure out that money must be spent on a cost not in the original price data. Search systems return increasingly lousy results. Intelligence software outputs data which make zero sense to a person working out a surveillance plan. An art history major working on a PowerPoint presentation cannot locate the version used by the president of the company for last week’s pitch to potential investors.

The accountant wants to understand overruns associated with smart software, looks into the invoices and time sheets, and discovers something new: Smart software subject matter experts, indexing professionals, interns buying third-party content from an online vendor called Elsevier. These are not what CPAs confront unless there are smart software systems chugging along.

The big problem is handled in this way: Those selling the system don’t talk too much about how training is a recurring cost which increases over time. Yep, reindexing is a greedy pig and those training sets have to be tested to see if the smart software gets smarter.

The fix? Do PR about super duper even smarter methods of training. Think Snorkel. Think synthetic data. Think PowerPoint decks filled with jargon that causes clueless MBAs do high fives because the approach is a slam dunk. Yes! Winner!

I read “DeepMind’s Astounding New ‘Gato’ AI Makes Me Fear Humans Will Never Achieve AGI” and realized that the cloud of unknowing has not yet yield to blue skies. The article states:

Just like it took some time between the discovery of fire and the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won’t happen overnight.

No kidding. There are gotchas beyond training, however. I have a presentation in hand which I delivered in 1997 at an online conference. Training cost is one dot point; there are five others. Can you name them? Here’s a hint for another big issue: An output that kills a patient. The accountant understands the costs of litigation when that smart AI makes a close enough for horseshoes output for a harried medical professional. Yeah, go catscan, go.

Stephen E Arnold, May 17, 2022

Written by Stephen E. Arnold · Filed Under AI, News

Comments

Comments are closed.

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.