The Building Blocks of Smart Software: Combine Them to Meet Your Needs

January 25, 2021

I have a file of listicles. One called “Top 10 Algorithms in Data Mining” appeared in 2007. I spotted another list which is, not surprisingly, quite like the Xindong Wu et al write up. The most recent listing is “All Machine Learning Algorithms You Should Know in 2021.” And note the “all.” I included a short item about a book of business intelligence algorithms in the DarkCyber for January 26, 2021, at this link. That book had more than 600 pages, and I am reasonably confident that the authors did not use the word “all” to describe their effort.

What’s the line up of “all” you ask? In the table below, I present the list from 2008 in red and the list from 2021 in blue.

2008 Xindong Dong et al 2021 “All” KDNuggets’
1 Decision Trees Linear regression
2 k-means Logistic regression
3 Support Vector Machines k nearest neighbor
4 A priori Naive Bayes
5 Expectation-Maximization (EM) Support vector machines
6 Page Rank (voting) Decision trees
7 Ada Boost Random forest
8 k nearest neighbor classification AdaBoost
9 Naive Bayes Gradient boost
10 Classification and Regression trees XGBoost

The KDNuggets’ opinion piece also includes LightGMB (a variation of XGBoost) and CatBoost (is a more efficient gradient boost). Hence, I have focused on 10 algorithms. I performed a similar compression with Xindong Dong et al’s labored discussion of rules and cases grouped under “decision trees” in the table above.

Several observations are possible from these data:

  1. “All” is misleading in the KDNuggets’ title. Why not skip the intellectually shallow “all”?
  2. In the 14 years between the scholarly article and the enthusiastic “all” paper, the tools of the smart software crowd have not advanced if the data in these two write ups are close enough for horse shoes
  3. Modern systems’ similarity in overall approaches is understandable because a limited set of tools are used by energetic “inventors” of smart software.

Net net: The mathematical recipes are evolving in terms of efficiency due to more machine horsepower and more data.

How about the progress in accuracy? Did IBM Watson uncover a drug to defeat Covid? How are those Google search results working for you? What about the smart cyber security software which appear to have missed entirely the SolarWinds’ misstep.

Why? Knowing algorithms is not the same as developing systems which most work. Marketers, however, can seize on these mathy names and work miracles. Too bad the systems built with them don’t.

Stephen E Arnold, January 25, 2021


Got something to say?

  • Archives

  • Recent Posts

  • Meta