More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use outputs may not know what’s going on under the hood. Wide spread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov Test. Has this non parametric test been applied to the analytics system which marketers have presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about what rod to put where is informed by a range of mathematical methods. Special training experts, often with degrees in nuclear engineering plus post graduate work handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.

Why then would a person with zero knowledge of how numerical recipes, oddball outputs from particular types of algorithms, and little or know experience with probability methods use the outputs of a system as “truth.” The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understand what decisions have been made, what limitations exist within the data display, and what are the blind spots generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make it clear to users about methods, constraints, and considerations include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30 somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. These reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues makes many people expert in quite difficult systems and methods. In the mid 1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30 somethings who explain the advantages of analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media—the music is loud and getting louder. With so many firms jumping into the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes clear the challenge.

Stephen E Arnold, June 18, 2012

Comments

Comments are closed.

  • Archives

  • Recent Posts

  • Meta