The Fragmentation of Content Analytics

October 29, 2012

I am in the midst of finalizing a series of Search Wizards Speak interviews with founders or chief technology officers of some interesting analytics vendors. Add to this work the briefings I have attended in the last two weeks. Toss in a conference which presented a fruit bowl of advanced technologies which read, understand, parse, count, track, analyze, and predict who will do what next.


From a distance, the analytics vendors look the same. Up close, each is distinct. Pick up the wrong shard and a cut finger or worse may result.


Who would have thought that virtually every company engaged in indexing would morph into next-generation, Euler-crazed, and Gauss-loving number crunchers? If the names Euler and Gauss do not resonate with you, you are in for tough sledding in 2013. Math speak is the name of the game.

There are three very good reasons for repackaging Vivisimo as a big data and analytics player. I choose Vivisimo because I have used it as an example of IBM’s public relations mastery. The company developed a deduplication feature which was and is, I assume, pretty darned good. Then Vivisimo became a federated search system, nosing into territory staked out by Deep Web Technologies. Finally, when IBM bought Vivisimo for about $20 million, the reason was big data and similarly bright, sparkling marketing lingo. I wanted to mention Hewlett Packard’s recent touting of Autonomy as an analytics vendor or Oracle’s push to make Endeca a business analytics giant. But IBM gets the nod. Heck, it is a $100 billion a year outfit. It can define an acquisition any way it wishes. I am okay with that.

Now the reasons.

First, talking about Big Data and analytics is the equivalent of ratatouille and Pouilly fumé. You just have to have both together to be hip. As a result, a company which offers a text retrieval system will assert that its technology handles Big Data and delivers analytics. A sumptuous marketing feast.

Second, buyers are tired of licensing systems which have not delivered useful functionality to their organization. Don’t believe me? Try to pitch a basic search and retrieval system to a company struggling with litigation, price competition, and internal systems which don’t work particularly well. Sizzle is needed.

Third, the volume of digital data has reached a tipping point for individuals and most organizations with which I am familiar. When information was on paper, it was mostly out of sight and tough to find. Now, the digital data is coursing through the organization’s digital pipes, choking servers, and increasing what I call information pressure.

Let’s look at information pressure. Paper can be put in boxes, labeled, and shoved into a warehouse. For most people in an organization, the notion of finding a proposal from a year ago is an amusing comment. Now, digital information can be located using software. It seems to be so easy to index content and then pinpoint the exact part of a particular document.

The problem is that digital information brings with it the psychological awareness that information should be findable. When it is not, there is no file box in a closet. The hunt for the digital document becomes an act of desperation for many professionals.

The impact of knowing that a paper copy is available versus the assumption that a digital instance is findable is agonizing. What if the digital document has been deleted? What if a legal discovery process finds a document on a remote server and the same document is not in the company’s archive? What if the person who created the document is working at another firm? Questions like these increase the “information pressure.”

Big Data means bigger pressure. Analytics and fancy math promise to relieve the pressure. Not surprisingly, the mere hint that old wine in new bottles tastes much better than any other beverage has some appeal.

Unfortunately, run of the mill ratatouille and a lousy Pouilly fumé will not solve a very difficult problem in our digital world. Modern analytics systems can help. But the deeper problem is that short cuts are not likely to work in a complex environment. My view is that no silver bullet exists for many of the information challenges we face today.

In real life there are antacids. In findability and understanding, there are just ever-increasing costs and complexity. But analytics solutions are selling. Will they solve problems in mismanagement or judgment? Not likely.

Stephen E Arnold, October 29, 2012

