Oh, Oh, Somebody Has Blown the Whistle on the Machine Learning Fouls

July 3, 2020

Wonder why smart software is often and quite spectacularly stupid? You can get a partial answer in “On Moving from Statistics to Machine Learning, the Final Stage of Grief.” There’s some mathiness in the write-up. However, the author, who tries to stand up to heteroskedastic errors, offers some useful explanations and good descriptions of the shortcuts some of the zippy machine learning systems take.

Here’s a passage I found interesting:

As you can imagine, machine learning doesn’t let you side-step the dirty work of specifying your data and models (a.k.a. “feature engineering,” according to data scientists), but it makes it a lot easier to just run things without thinking too hard about how to set it up. In statistics, bad results can be wrong, and being right for bad reasons isn’t acceptable. In machine learning, bad results are wrong if they catastrophically fail to predict the future, and nobody cares much how your crystal ball works, they only care that it works.

Also this statement:

I like showing ridge regression as an example of machine learning because it’s very similar to OLS, but is totally and unabashedly modified for predictive purposes, instead of inferential purposes.
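
For readers who want to see how close the two methods are, here is a minimal sketch in Python with NumPy. It is not the quoted author's code. The helper name fit_linear, the synthetic data, and the penalty value of 1.0 are all assumptions made for illustration. The only difference between OLS and ridge in this sketch is the lam term added to the diagonal before solving.

```python
import numpy as np

def fit_linear(X, y, lam=0.0):
    """Closed-form least squares with an optional ridge penalty.

    lam = 0.0 gives ordinary least squares (OLS); lam > 0 gives ridge.
    No intercept is fitted, to keep the sketch short (hypothetical helper).
    """
    n_features = X.shape[1]
    # Solve (X'X + lam * I) beta = X'y instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (an assumption for illustration): two nearly collinear columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=50)
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=50)

beta_ols = fit_linear(X, y)             # unpenalized fit
beta_ridge = fit_linear(X, y, lam=1.0)  # penalized fit, arbitrary lam

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

The ridge coefficients come out shrunk toward zero compared with the OLS ones. That deliberate bias trades cleaner inference for steadier predictions, which is the distinction the quoted passage draws.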

One problem is that those individuals who most need to understand why smart software is stupid are likely to struggle to understand this quite helpful explanation.

Math understanding is the problem. That lack of mathiness is why smart software is likely to remain like a very large, eager, wet Newfoundland water dog shaking in the kitchen. Yep, the hairy beast is an outlier, heteroskedastically speaking, of course.

Stephen E Arnold, July 3, 2020

