Big Data and Search

January 1, 2013

A new year has arrived. Flipping a digit on the calendar prompts many gurus, wizards, failed Web masters, former real journalists, and unemployed English majors to identify trends. How can I resist a chrome plated, Gangnam style bandwagon? Big Data is no trend. It is, according to the smart set:

“the next big chapter of our business history.”

My approach is more modest. And I want to avoid silver-numbered politics and the monitoring business. I want to think about a subject of interest to a small group of techno-watchers: Big Data and search.

My view is that there has been Big Data for a long time. Marketers and venture hawks circle an issue. If enough birds block the sun, others notice. Big Data is now one of the official Big Trends for 2013. Search, as readers of this blog may know, experiences the best of times and the worst of times regardless of the year or the hot trends.

As the volume of unstructured information increases, search plays a part. What’s different for 2013 is that those trying to make better decisions need a helping hand, crutches, training wheels, and tools. Vendors of analytics systems like SAS and IBM SPSS should be in the driver’s seat. But these firms are not. An outfit like Palantir claims to be the leader of the parade. The company has snazzy graphics and $150 million in venture funding. Good enough for me I suppose. The Palantirs suggest that the old dudes at SAS and SPSS still require individuals who understand math and can program for the “end user”. Not surprisingly, there are more end users than there are SAS and SPSS wizards. One way around the shortage is to make Big Data a point-and-click affair. Satisfying? The marketers say, “For sure.”

A new opportunity arises for those who want the benefits of fancy math without the cost, hassle, and delay of dealing with intermediaries who may not have an MBA or aspire to be independently wealthy before the age of 30. Toss in the health care data the US Federal government mandates, the avalanche of fuzzy thinking baloney from blogs like this one, and the tireless efforts of PR wizards to promote everything from antique abacuses to zebra striped fabrics. One must not overlook e-mail, PowerPoint presentations, and the rivers of video which have to be processed and “understood.” In these streams of real time and semi-fresh data, there must be gems which can generate diamond bright insights. Even a sociology major may have a shot at a permanent job.

The biggest of the Big Berthas are firing away at Big Data. Navigate to “Sure, Big Data Is Great. But So Is Intuition.” Harvard, MIT, and juicy details explain that the trend is now anchored into the halls of academe. There is even a cautionary quote from an academic who was able to identify just one example of Big Data going somewhat astray. Here’s the quote:

At the M.I.T. conference, a panel was asked to cite examples of big failures in Big Data. No one could really think of any. Soon after, though, Roberto Rigobon could barely contain himself as he took to the stage. Mr. Rigobon, a professor at M.I.T.’s Sloan School of Management, said that the financial crisis certainly humbled the data hounds. “Hedge funds failed all over the world,” he said. THE problem is that a math model, like a metaphor, is a simplification. This type of modeling came out of the sciences, where the behavior of particles in a fluid, for example, is predictable according to the laws of physics.

Sure, Big Data has downsides. MBAs love to lift downsides via their trusty, almost infallible intellectual hydraulics.

My focus is search. The trends I wish to share with my two or three readers require some preliminary observations:

  1. Search vendors will just say they can handle Big Data. Proof not required. It is cheaper to assert a technology than to actually develop a capability.
  2. Search vendors will point out that sooner or later a user will know enough to enter a query. Fancy math notwithstanding, nothing works quite like a well crafted query. Search may be a commodity, but it will not go away.
  3. Big Data systems are great at generating hot graphics. In order to answer a question, a Big Data system must be able to display the source document. Even the slickest analytics person has to find a source. Well, maybe not all of the time, but sometimes it is useful prior to a deposition.
  4. Big Data systems cannot process certain types of data. Search systems cannot process certain types of data. It makes sense to process whatever fits into each system’s intake system and use both systems. The charm of two systems which do not quite align is sweet music to a marketer’s ears. If a company has a search system, that outfit will buy a Big Data system. If a company has a Big Data system, the outfit will be shopping for a search system. Nice symmetry!
  5. Search systems and Big Data systems can scale. Now this particular assertion is true when one criterion is met: an unending supply of money. The Big Data thing has a huge appetite for resources. Chomp. Chomp. That’s the sound of a budget being consumed in a sprightly way.

Now the trends:

Trend 1. Before the end of 2013, Big Data vendors will find themselves explaining why the data actually processed were Small Data. Systems sold on the assertion that they can handle whatever the client wants to process will be exposed as selective content processing systems. Big Data are big and systems have finite capacity. Some clients may not be thrilled to learn that their ore did not include the tonnage that contained the gems. In short, say hello to aggressive sampling and indexes which are not refreshed in anything close to real time.
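
To put a rough number on the sampling worry in Trend 1, here is a minimal Python sketch. It assumes each record is kept independently with the same probability (simple Bernoulli sampling), which is a simplification and not a description of any vendor's system. The function name and the figures are hypothetical, chosen only for illustration.

```python
def chance_all_gems_missed(sample_rate: float, gem_count: int) -> float:
    """Probability that an independent per-record sample keeps none of the gems.

    Assumes every record is kept with probability `sample_rate`, independently
    of all other records (an illustrative assumption, not a vendor's method).
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if gem_count < 0:
        raise ValueError("gem_count must be non-negative")
    # Each gem is dropped with probability (1 - sample_rate); all must be dropped.
    return (1.0 - sample_rate) ** gem_count


if __name__ == "__main__":
    # Hypothetical figures: five valuable records, three sampling rates.
    for rate in (0.01, 0.10, 0.50):
        missed = chance_all_gems_missed(rate, gem_count=5)
        print(f"sample {rate:.0%} of records: {missed:.1%} chance of missing all 5 gems")
```

Under these assumptions, a system that processes 1 percent of the ore has roughly a 95 percent chance of missing all five gems. The point is only that the smaller the sample, the more likely the tonnage with the gems never reaches the index.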

Trend 2. Big Data and search vendors will be tripping over themselves in an effort to explain which system does what under what circumstances. The assertion that a system can do both structured and unstructured while uncovering the meaning of the data is one I want to believe. Too bad the assertion is mushy in the accuracy department’s basement.

Trend 3. The talent pool for Big Data and search is less plentiful than the pool of art history majors. More bad news. The pool is not filling rapidly. As a result, quite a few data swimmers drown. Example: the financial crisis perhaps? The talent shortage suggests some interesting cost overruns and project failures.

Trend 4. A new Big Thing will nose into the Big Data and search content processing space. Will the new Big Thing work? Nah. The reason is that extracting high value knowledge from raw data is a tough problem. Writing new marketing copy is a great deal easier. I am not sure what the buzzword will be. I am pretty sure vendors will need a new one before the end of 2013. Even PSY called it quits with Gangnam style. No such luck in Big Data and search at this time.

Trend 5. The same glassy eyed confusion which analytics and search presentations engender will lead to greater buyer confusion and slow down procurements. Not even the magic of the “cloud” will be able to close certain deals. In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing.

I suppose that is a good thing. Thank goodness I am unemployed, clueless, and living in a rural Kentucky goose pond.

Stephen E Arnold, January 1, 2013

Another Beyond Search analysis for free


3 Responses to “Big Data and Search”

  1. Martin White on January 3rd, 2013 6:40 am

    An excellent analysis which I totally support. Some of the statements being made on vendor web sites are beyond belief.

  2. Dinesh Vadhia on January 6th, 2013 9:56 am

    As a category name, “Big Data” could not be a worse choice. Organizations who have been dealing with data and databases for a long time will hopefully have already recognized the shallowness of the marketing message.

    At its most extreme, Big Data covers two distinct areas: data processing and predictive computing (previously called analytics). The former has been ongoing for decades. Batch-based Hadoop is but another TFL method but not the only one. Predictive computing is the application of machine learning methods (on pre-processed data) and supposes that an application or use has been identified beforehand.

    Only disagree with Trend 2 ie. “The assertion that a system can do both structured and unstructured while uncovering the meaning of the data is one I want to believe” is possible wrt a machine learning method operating on features (from data) that originate from both structured and unstructured data.

  3. Jim Gracely on January 8th, 2013 10:56 am

    As your third reader, I want to thank you for a good chuckle to start 2013. I have spent the last few months trying to get my head around all these topics, and it is nice to see that I am not the only one thinking that companies are spending far more money on great product positioning statements than they are on solutions.
