Does Math Make Distinctions?

August 8, 2015

I read “What Does a Data Scientist Do That a Traditional Data Analytics Team Can’t?” Good marketing question. Math, until the wholehearted embrace of fuzziness, was reasonably objective. Survivors of introductory statistics learned about the subjectivity involved in Bayesian antics and the wonder of fiddling with thresholds. You remember. Above this value, do this; below this value, do that. Eventually one can string together numerical recipes that make threshold decisions based on inputs. In the hands of responsible, capable, and informed professionals, these systems work reasonably well. Sure, smart software can drift and then run off the rails. There are procedures to keep layered systems on track. They work reasonably well for horseshoes. You know. Close enough for horseshoes. Monte Carlo’s bright lights beckon.
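
For those who skipped that statistics class, here is a minimal Python sketch of the threshold idea. The cutoffs (0.2 and 0.8) and the labels are hypothetical, chosen only to illustrate how a fixed rule maps numbers to decisions.

```python
# A minimal sketch of a threshold rule. The cutoffs (0.2, 0.8) and the
# labels "ignore", "review", "act" are hypothetical, for illustration only.

def classify(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a numeric score to an action using two fixed thresholds."""
    if score >= high:
        return "act"      # at or above the upper threshold: do this
    if score < low:
        return "ignore"   # below the lower threshold: do that
    return "review"       # in between: hand it to a person


def pipeline(scores: list[float], **thresholds: float) -> list[str]:
    """Apply the same threshold rule to each input in turn."""
    return [classify(s, **thresholds) for s in scores]


print(pipeline([0.05, 0.5, 0.9]))   # ['ignore', 'review', 'act']
print(pipeline([0.75]))             # ['review']
print(pipeline([0.75], high=0.7))   # ['act']
```

Nudge `high` from 0.8 to 0.7 and a score of 0.75 flips from "review" to "act." Same data, different answer. That is the fiddling.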

The write-up takes a different approach. The idea is that someone who does descriptive procedures is an apple. The folks who do predictive procedures are oranges. The apple lets the data do the talking. Think of a spreadsheet jockey analyzing historical pretax profits at a public company. Now contrast that with a person who looks at data and makes judgments about what the data “mean.”

Close enough for horseshoes.

Which is more fun? Go with the fortune tellers, of course.

The write-up also raises the apparent black-and-white issue of structured versus unstructured data. The writer says:

Unstructured or “dirty” data is in many ways the opposite of its more organized counterpart, and is what data scientists rely on for their analysis. Data of this type is made up of qualitative rather than quantitative information — descriptive words instead of measurable numbers — and comes from more obscure sources such as emails, sentiment expressed in blogs or engagement across social media. Processing this information also involves the use of probability and statistical algorithms to translate what is learned into advanced applications for machine learning or even artificial intelligence, and these skills are often well beyond those of the average data analyst.

There you go. One does not want to be average. I am tempted to ask: mode, median, or mean?
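
The question is not idle. On the same numbers, the three “averages” can disagree. A minimal sketch using Python’s standard `statistics` module; the six-value sample is made up and deliberately skewed by one large value.

```python
import statistics


def averages(values: list[float]) -> dict[str, float]:
    """Return the three common notions of 'average' for one sample."""
    return {
        "mean": statistics.mean(values),      # arithmetic mean, pulled by outliers
        "median": statistics.median(values),  # middle value (avg of two middles if even)
        "mode": statistics.mode(values),      # most frequent value
    }


sample = [1, 2, 2, 3, 4, 12]  # hypothetical data with one outlier
print(averages(sample))
```

Here the mean is 4, the median 2.5, and the mode 2. Pick your “average.”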

Net net: If the mathematical foundation is wrong, if the selected procedure is inappropriate, or if the data are not validated, errors are likely, and they will propagate.
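
A rough illustration of propagation, assuming a hypothetical layered system in which each of five stages overstates its input by 2 percent. This models only a systematic bias compounding through the layers, not a full statistical treatment of error propagation.

```python
def propagate(true_value: float, stages: int, bias_per_stage: float) -> float:
    """Push a value through `stages` hypothetical steps, each of which
    overstates its input by `bias_per_stage` (0.02 means 2 percent)."""
    value = true_value
    for _ in range(stages):
        value *= 1 + bias_per_stage  # the small error compounds at every layer
    return value


true_value = 100.0
estimate = propagate(true_value, stages=5, bias_per_stage=0.02)
relative_error = estimate / true_value - 1
print(f"{relative_error:.2%}")  # 10.41%
```

A 2 percent slip at each layer becomes roughly a 10 percent miss at the end, because the errors multiply rather than add.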

One does not need much skill in mathematics to understand that buzzwords do not cover up or ameliorate mistakes.

Stephen E Arnold, August 8, 2016

Written by Stephen E. Arnold · Filed Under Analytics, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.