Analytics Troubled by Bottlenecks. Impossible.

September 30, 2014

The hyperbole artists have painted themselves into a corner. I am not sure too many folks know this. The idea that one can crank out killer analyses with a couple of swipes or a mouse click is raising expectations. Like so much in content processing, reality is just a little bit different.

You know that the slips twixt cup and lip must be cropping up in numerous organizations. The Harvard Business Review does not write too much science fiction compared to MIT’s Technology Review across the river.

“Beware the Analytics Bottleneck” adopts the same MBA tone that makes Wall Street bankers and lawyers so beloved by the common man and delivers what might be a downside.

The write up states:

“Don’t be overwhelmed. Start slower to go faster.” I think that runs counter to the baloney in the Eric Schmidt Google tome.

Next the HBR wants to keep life simple for the busy one percenters:

Technology doesn’t have to be exposed. Keep the complexity behind the curtain. Definitely good advice if one does not know whether the data are valid and the numerical recipes are configured in an appropriate manner.

Then the golden piece of advice for the go go MBA looking for a payday so he or she can pursue his or her dream of helping people or just spending money:

Make faster decisions for faster rewards.

That’s a sure fire way to break through bottlenecks. Use the outputs to support really fast decisions. Forget that pondering stuff. Just guess.

What’s scary is that when some folks have a tiny bit of knowledge, their deliberations can yield disastrous decisions. Need some examples? Well, do some thinking. How about GM and ignition switches? What about IRS actions and email mysteries? Or multi-billion-dollar acquisitions that lead to multi-billion-dollar write-offs shortly after handing over the dump trucks filled with cash?

My take on this write up is that the “expert” did not focus on the bottlenecks that Big Data often produces like sex-crazed hamsters:

  1. The time and cost to normalize and validate data
  2. The complexity of updating indexes so that reports reflect the most recent data, not stale data
  3. Dealing with the configuration decisions that generate outputs that are just plain wrong
  4. The money spent to get a system back online when it crashes, whether an old-fashioned on-premises flame out or one of the nifty new cloud systems that are virtual and allegedly foolproof.
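The first bottleneck is easy to underestimate. Here is a minimal sketch (field names, formats, and data all invented for illustration) of what “normalize and validate” means for even a single record type:

```python
# Hypothetical sketch: even a trivial normalization pass needs explicit
# rules for every field, and each rule is a place to get things wrong.
from datetime import datetime

def normalize_record(raw):
    """Normalize one raw record; return None if it fails validation."""
    try:
        # Dates arrive in mixed formats; pick one canonical form.
        date = datetime.strptime(raw["date"].strip(), "%Y-%m-%d")
        # Amounts arrive as strings with stray currency symbols and commas.
        amount = float(raw["amount"].replace("$", "").replace(",", ""))
    except (KeyError, ValueError):
        return None  # invalid records should be counted, not silently dropped
    return {"date": date.date().isoformat(), "amount": amount}

raw_feed = [
    {"date": "2014-09-30", "amount": "$1,200.50"},
    {"date": "30/09/2014", "amount": "oops"},  # wrong format: fails validation
]
clean = [r for r in (normalize_record(x) for x in raw_feed) if r]
print(len(clean))  # 1 of 2 records survives
```

Every rule in that tiny function is a configuration decision. Multiply by dozens of fields and data sources, and the time and cost in item one becomes clear.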

In short, Big Data and analytics pose some very significant challenges for vendors, licensees, and those who use the systems. The good news is that guessing will probably produce better results than reasoning through a decision based on flawed information. The bad news is that fancy content processing systems are likely to gobble budgets and increase certain operational costs.

The HBR obviously does not agree. Well, the fellows around the cast iron stove in Harrod’s Creek, Kentucky, find my observations directly on point.

Stephen E Arnold, September 30, 2014

Big Data, the NFL Fan, and Music

September 28, 2014

The yada yada about Big Data drones on. I found “Billboard Crunches Facebook Data to Chart Music Tastes of NFL Team Fan Bases.” Analytics can be useful. Forget Ebola, the data about ISIS/ISIL, and precision/recall scores for a Google search.

The article explains a dramatic application for number crunching. I learned:

Billboard asked Facebook’s data crunchers to tally the music pages “liked” by fans of each NFL team and figured out who likes to blast what from their stereos.
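The tally itself is trivial counting. A toy sketch, with invented like data standing in for Facebook’s:

```python
from collections import Counter

# Invented sample: each entry is (team_fan_base, liked_music_page).
likes = [
    ("Raiders", "Snoop Dogg"), ("Raiders", "Snoop Dogg"),
    ("Bills", "The Beatles"), ("Bills", "The Beatles"), ("Bills", "Metallica"),
    ("Bears", "Michael Jackson"),
]

def top_artist_per_team(pairs):
    """Return the most-liked music page for each team's fan base."""
    by_team = {}
    for team, artist in pairs:
        by_team.setdefault(team, Counter())[artist] += 1
    return {team: counts.most_common(1)[0][0] for team, counts in by_team.items()}

print(top_artist_per_team(likes))
# {'Raiders': 'Snoop Dogg', 'Bills': 'The Beatles', 'Bears': 'Michael Jackson'}
```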

Here are three insights generated by the intrepid data scientists. How many of these outputs resonate with you?

  1. Oakland Raiders’ fans like Snoop Dogg
  2. Buffalo Bills’ fans like The Beatles.
  3. Chicago Bears’ fans like Michael Jackson.

And my favorite. Arizona Cardinals’ fans prefer Pitbull. (Who? A bird and a dog type?)

Essential insights for decision makers. Yeah, yeah, yeah.

Stephen E Arnold, September 28, 2014

NSA Catalog Available

September 27, 2014

Short honk: If you want a copy of National Security Agency 2014 Technology Catalog: Technology Transfer Program, you can download it for now from this link. I found pages 26 to 40 fascinating. Will IDC issue its own version of this document, using the surfing technique demonstrated by Dave Schubmehl with my content? I will keep my eyes open.

Stephen E Arnold, September 27, 2014

Useful Knowledge Visualization

September 25, 2014

I find visualization endlessly entertaining. I read “A Visual Data Mining Framework for Convenient Identification of Useful Knowledge.” The authors of the paper are Kaidi Zhao and Bing Liu (University of Illinois, Chicago) and Thomas M. Tirpak and Weimin Xiao (Motorola Labs). The paper illustrates the effort going into making sense of available data. What struck me was the illustration on page 6 of the PDF.


I hope that progress is rapid with their approach. Useful knowledge is helpful, particularly when the visualization method is crystal clear. When I look closely at a magnified view of page 6’s diagram, I can spot “values.” In Column C, Row 6, there are three unexplained values.

Stephen E Arnold, September 25, 2014

Watson = Google. IBM, Of Course!

September 22, 2014

I read “How IBM’s Watson Could Do for Analytics What Search Did for Google.” I urge you to flip through a math book like Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Although an older book, some of its methods are now creeping into the artificial intelligence revolution that seems to be the next big thing. Then read the Datamation write up.

IBM is rolling out a “freemium model to move Watson, their [sic] English language AI interface for analytics, into the market more aggressively.” What could be more aggressive than university contests, recipes for Bon Appétit, and curing cancer?

The article points out that the only competitor to Watson is Google. Well, that’s an interesting assertion.

Google put an interface on search, I learned. The rest is Google’s dominance. Now IBM wants to put an interface on analytics, and—I assume it follows for the thinkers at IBM—IBM’s dominance will tag along.

The article asserts:

We often talk about analytics needing data scientists who have a unique skill set, allowing them to get out the answers needed from highly complex data repositories.  Since the results of the analysis are supposed to lead to better executive decisions the ideal skill set would have been an MBA Data Scientist, yet I’ve actually never seen one of those. Folks who are good at deep analysis and folks that are good at business tend to be very different folks, and data scientists are in very short supply at the moment.

Well, someone has to:

  • Select numerical recipes
  • Set thresholds
  • Select process sequences
  • Select data and ensure that they are valid
  • Set up outputs, making decisions about what to show and what not to show
  • Modify when the outputs do not match reality. (I realize that this step is of little interest to some analytics users.)
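The threshold point alone is worth a demonstration. Here is a hypothetical anomaly flag (invented data) in which nothing changes between runs except one configuration value:

```python
def flag_anomalies(values, threshold):
    """Flag values more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

readings = [10, 10, 10, 10, 10, 20]  # invented data with one real outlier

print(flag_anomalies(readings, 2.0))  # [20] -- only the genuine outlier
print(flag_anomalies(readings, 0.4))  # all six values flagged as "anomalous"
```

Same data, same recipe, one different threshold: one report is sensible, the other is noise. Someone has to make that call, and it is not the freemium user.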

The article concludes:

The Freemium model has similar advantages. So if you wrap a product that line executives should prefer with an economic model that removes most of the financial barriers, you should end up with a solution that does for IBM what Search did for Google. And that could do some interesting things to the analytics market, creating a similar set of conditions to those that put IBM on top of technology in the last century.

What’s a freemium model? What’s the purpose of the analysis? What’s the method to validate results? What controls does a clueless user have over the Watson system?

Oh, wait. Watson is a search system. Google is a search system that people use. Watson is a search system that few use. Also, IBM still sells mainframes. This is a useful factoid to keep in mind.

Stephen E Arnold, September 22, 2014

Palantir and Its Funding

September 18, 2014

I read “Palantir May Have Raised More Than We Thought, Perhaps $165 million.” The article presented a revisionist view of how much money is in the Palantir piggy bank. Here’s the number I circled: $165 million since February 2014. I also marked this paragraph:

The Palo Alto company led by CEO Alex Karp disclosed in a Securities and Exchange Commission filing on Friday that it had raised more than $440 million in a funding round that began last November.

The numbers add up. The write up asserted:

The company co-founded by Karp, Peter Thiel, Joe Lonsdale and others in 2004 has raised a total of about $1 billion, with some of that funding coming from In-Q-Tel, the venture arm of U.S. intelligence agencies.

This works out to a $9 billion valuation.

The question now becomes, “How long will it take Palantir to generate sufficient revenue to pay back the investors and turn a profit?” The reason I ask is that IBM is chasing this market along with a legion of other firms.

Terrorism, war fighting, and Fancy Dan analytics are growth buttons. Will there be enough customers to feed the appetites of the outfits chasing the available money?

My hunch is that some of the competitors in this segment will come up empty.

Also, the tonnage of money Palantir has had dropped into its bank account makes the separate injections of $30 million in funding into three firms—Attivio, BA Insight, and Coveo—look modest indeed. Perhaps there is more to the Big Data pitch than just words?

Stephen E Arnold, September 18, 2014

Cloud Based Mathematica from Wolfram

September 16, 2014

I read “Launching Today: Mathematica Online.” The interface is similar to the desktop application. The benefits of having the Mathematica tool accessible on non-desktop devices and without requiring a local installation of the program are many; for example, notebooks work on tablets. With refreshing candor, Dr. Wolfram notes:

There are some tradeoffs of course. For example, Manipulate can’t be as zippy in the cloud as it is on the desktop, because it has to run across the network. But because its Cloud CDF interface is running directly in the web browser, it can immediately be embedded in any web page, without any plug-in…

Worth a look.

Stephen E Arnold, September 16, 2014

Short Honk: Google Intervention-Objectivity?

September 15, 2014

Short honk: Navigate to “How Google’s Autonomous Car Passed the First U.S. State Self-Driving Test.” Do you find this statement interesting?

Google chose the test route and set limits on the road and weather conditions that the vehicle could encounter, and that its engineers had to take control of the car twice during the drive.

I do. With intervention it is much easier to pass a test. The same method of shaping characterizes Google’s approach to modeling for “nowcasting.” I discuss this hand crafting of methods to deliver an acceptable result in my next KMWorld article.

Stephen E Arnold, September 15, 2014

ExtraHop Offers Streaming into Additional Analytics Platforms

September 11, 2014

The article titled “ExtraHop Helps to Make Data Free–Streams Into MongoDB And ElasticSearch” discusses the broad coverage available through ExtraHop’s metrics. With the growing complexity of current IT applications, ExtraHop can help both traditional and non-traditional users through its real-time analytics and Open Data Stream. In fact, ExtraHop recently began offering the possibility of streaming data sets directly into analytic solutions including MongoDB and Elasticsearch. The article explains,

“Customers can leverage ExtraHop’s skills in delivering the most relevant and useful monitoring visualizations. But at the same time that can use that same data in ways that ExtraHop could have never thought of. It gives them the ability to deliver richer and deeper insights, but it also gives them more control over where data is stored and how it is queried and manipulated. It also opens up the possibility for organizations to use multiple monitoring solutions in parallel, simply because they can.”
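The article does not document ExtraHop’s wire format, so the metrics record below is invented; the sketch only shows the generic Elasticsearch REST call a stream consumer might make with such a record:

```python
# Sketch under stated assumptions: the record shape and index name are
# invented; only the generic Elasticsearch HTTP indexing call is real.
import json
import urllib.request

def build_index_request(host, index, doc):
    """Build an HTTP request that indexes `doc` into an Elasticsearch index."""
    url = "http://{}:9200/{}/_doc".format(host, index)
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

metric = {"device": "web-01", "latency_ms": 42, "ts": "2014-09-11T12:00:00Z"}
req = build_index_request("localhost", "extrahop-metrics", metric)
print(req.full_url)  # http://localhost:9200/extrahop-metrics/_doc
# urllib.request.urlopen(req)  # would send it, given a live cluster
```

Once the documents land in an index like this, any Elasticsearch query or dashboard tool can work with them, which is the “more control over where data is stored and how it is queried” point the quote makes.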

Gartner is quoted as saying that the importance of these ITOA technologies lies in their ability to aid the explorative and creative processes. By having these insights available, more and more users will be able to realize their ideas and perhaps even make their dreams into realities.

Chelsea Kerwin, September 11, 2014

Sponsored by, developer of Augmentext

Nowcasting: Lots of Behind the Scenes Human Work Necessary

September 10, 2014

Some outfits surf on the work of others. A good example is the Schubmehl-Arnold tie-up. Get some color and details here.

Other outfits have plenty of big thinkers and rely on nameless specialists to perform behind the scenes work.

A good example of this approach is revealed in “Predicting the Present with Bayesian Structural Time Series.” The scholarly write up explains a procedure to perform “nowcasting.” The idea is that one can use real-time information to help predict other events happening now.

Instead of doing the wild and crazy Palantir/Recorded Future forward predicting, these Googlers focus on the now.

I am okay with whatever outputs predictive systems generate. What’s important about this paper is that the authors document when humans have to get involved in the processes constructed from numerical recipes known to many advanced math and statistics whizzes.

Here are several I noted:

  1. The modeler has to “choose components for the modeling trend.” No problem, but it is tedious and important work. Get this step wrong and the outputs can be misleading.
  2. Selecting sampling algorithms, page 6. Get this wrong and the outputs can be misleading.
  3. Simplify by making assumptions, page 7. “Another strategy one could pursue (but we have not) is to subjectively segment predictors into groups based on how likely they would be to enter the model.”
  4. Breaking with Bayesian, page 8. “Scaling by “s^2/y”* is a minor violation of the Bayesian paradigm because it means our prior is data determined.”

There are other examples. These range from selecting what outputs from Google Trends and Correlate to use to the sequence of numerical recipes implemented in the model.
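The nowcasting idea itself, stripped of the paper’s Bayesian machinery, can be sketched as estimating a slow-to-report metric from a contemporaneous signal. All numbers below are invented; a plain least squares fit stands in for the paper’s structural time series model:

```python
# Toy "nowcast": fit past weeks where both the signal (say, a search-trend
# series) and the reported metric are known, then estimate this week's
# metric from this week's signal, which arrives first.
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

trend = [10, 20, 30, 40]        # invented search-interest values, past weeks
metric = [105, 110, 115, 120]   # invented reported metric, same weeks
a, b = fit_line(trend, metric)

# This week's search interest is in hand before the metric is reported.
print(a + b * 50)  # 125.0
```

Even in this toy, a human chose the signal, the model form, and the training window; the paper’s contribution is documenting exactly those kinds of choices in the real system.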

My point is that Google is being upfront about the need for considerable manual work in order to make its nowcasting predictive model “work.”

Analytics deployed in organizations depend on similar human behind the scenes work. Get the wrong thresholds, put the procedures in a different order, or use bad judgment about what method to use and guess what?

The outputs are useless. As managers depend on analytics to aid their decision making and planners rely on models to predict the future, it is helpful to keep in mind that an end user may lack the expertise to figure out if the outputs are useful. If useful, how much confidence should a harried MBA put in predictive models?

Just a reminder that ISIS caught some folks by surprise, analytics vendor HP seemed to flub its predictions about Autonomy sales, and the outfits monitoring Ebola seem to be wrestling with underestimations.

Maybe enterprise search vendors can address these issues? I doubt it.

Note: my blog editor will not render mathematical typography. Check the original Google paper on page 8, line 4 for the correct representation.

Stephen E Arnold, September 10, 2014
