Algorithms: Thresholds and Recycling Partially Explained

April 19, 2019

Five or six years ago I prepared a lecture about the weaknesses in widely used algorithms. In that talk, which I delivered to intelligence operatives in Western Europe and the US, I made two points that were significant to me and my small research team.

  1. There are about nine or 10 algorithms which get used again and again. One example is k-means (a minimal sketch appears after this list). The reason is that the procedure is a fixture in many university courses, and the method is good enough.
  2. Quite a bit of the work on smart software relies on cutting and pasting. In 1962, I discovered the value of this approach when I worked on a small project at my undergraduate university. Find a code snippet that does the needed task, modify it if necessary, and bingo! Today this approach remains popular.
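
To make the first point concrete, here is a minimal sketch of the textbook k-means procedure (Lloyd’s algorithm) that gets recycled from course to course. The toy data, the cluster count, and the iteration cap are my own illustrative choices:

```python
import random

def kmeans(points, k, iterations=20):
    # Seed the centroids with k randomly chosen data points.
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            d = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[d.index(min(d))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # guard against an empty cluster
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids

# Two obvious blobs; k=2 should land one centroid in each.
data = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
print(kmeans(data, k=2))
```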

I thought about my lectures and these two points when I read another installment in the mathy series “Untold History of AI: Algorithmic Bias Was Born in the 1980s.” IEEE Spectrum does a reasonable job of explaining one case of algorithmic bias. The story is similar to the experience Amazon had with one of its smart modules. The math produced wonky results. The word “bias” is okay with me, but the “outputs” these systems happily chug away and deliver to clueless MBAs, lawyers, and marketers may simply be incorrect.

Several observations:

  1. The bias in methods goes back to before I showed up at the university computer center to use the keypunch machines. Way back, in fact.
  2. Developers today rely on copy and paste, open source, and the basic methods taught by professors who may be thinking about their side jobs as consultants.
  3. Training data may be skewed, and no one wants to spend the money or take the time to create proper training data. Why bother? Just use whatever is free, cheap, or already on a storage device. Close enough for horseshoes. (A toy demonstration appears after this list.)
  4. Users do not know what’s going on behind the point-and-click interfaces, nor do most users care. As a result, a good graphic is “correct.”
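
Point three is easy to demonstrate. The numbers below are invented, but they show how a model trained on skewed data can look “accurate” while never getting the minority case right:

```python
# Invented numbers: 95 of 100 historical decisions were "approve".
labels = ["approve"] * 95 + ["deny"] * 5

# A lazy model that simply memorizes the majority label.
majority = max(set(labels), key=labels.count)
accuracy = labels.count(majority) / len(labels)

print(f"Always predicting '{majority}' scores {accuracy:.0%} accuracy,")
print("yet the system never gets a 'deny' case right.")
```

The skew, not the arithmetic, drives the output. The math chugs along happily either way.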

The chatter about the one percent focuses on money. In my opinion, there is another, more important one percent: the one percent who take the time to look inside a sophisticated system. They will find the same nine or 10 algorithms, the same open source components, and some recycled procedures few people think about. Quick question: How many smart software systems rely on Thomas Bayes’ methods? Give up? Lots.
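
For the curious, here is a minimal sketch of the kind of Bayes-derived procedure (naive Bayes) that turns up in system after system. The four-document corpus and its labels are invented for illustration:

```python
from collections import Counter
from math import log

# Invented training corpus of (text, label) pairs.
train = [("win money now", "spam"), ("meeting at noon", "ham"),
         ("free money offer", "spam"), ("lunch meeting today", "ham")]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

def classify(text):
    # Bayes' rule in log space: log P(class) + sum of log P(word | class),
    # with add-one smoothing so an unseen word does not zero out a class.
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = log(class_counts[label] / total_docs)
        for word in text.split():
            score += log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

print(classify("free money"))  # "spam" on this toy corpus
```

Swap in a real corpus and these few lines are, conceptually, the engine inside many spam filters and content classifiers.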

I don’t have a remedy for this problem, and I am not sure many people care about, or want to talk about, the “accuracy” of a smart system’s outputs. That’s a happy thought for the weekend. Imagine bad outputs in an autonomous drone or a smart system in a commercial aircraft. Exciting.

Stephen E Arnold, April 19, 2019
