60 Months, Minimal Search Progress

January 1, 2012

When I was writing the Enterprise Search Report, I was younger, less informed, and slightly more optimistic. In August 2005, I wrote in “Recent Trends in Enterprise Search”:

The truth is that nothing associated with locating information is cheap, easy or fast.

I omitted one item: accurate. About five years after writing this sentence, I have come to my senses. The volume of information flushing through the “tubes” continues to increase. To explain what petabytes mean to the average liberal arts major now working at a services firm, someone coined the phrase “big data.” Simple. Tidy. Inaccurate.

That’s why the notion of accurate information is on my mind. I am tough to motivate in general, and burro-like when I have to admit that something I wrote in one of my addled states is incomplete, stupid, or just plain wrong.

Let me start the New Year correctly. Here are four observations which will probably annoy the “real” experts, the self-appointed search mavens, and the failed middle school teachers now consulting in the fields of ontology, massive parallelization in virtual environments, and “big data.” I don’t plan to alter my rhetorical approach, so too bad about giving some of these rescued Burger King workers some respite. Won’t happen.

First observation: Even a person as wild-and-wonderful as Jason Calacanis, the much admired innovator who makes a retreating Russian army’s scorched earth policy look green, wants to limit Internet content. “Jason Calacanis: Blogging Is Dead & Why Stupid People Shouldn’t Write” captures his take on accuracy. If one assumes stupid people should not write, then one reason may be that stupid people produce inaccurate information. Sounds okay to me, so let’s go with the stupid angle. In the era of “big data”, trimming out the stupid people should result in higher value information. Keep in mind I am addled. I am not sure where to stand on the “stupid” thing.

Image source: http://www.northernsun.com/Boldly-Going-Nowhere-T-Shirt-(8257).html

Second observation: Disinformation is becoming easier for me to spot. For you? I am not so sure. Let me give you a couple of examples. Navigate to the now out of date list of taxonomy systems prepared by Will Power. The page is available from Willpower Information in Middlesex. Now scan the description of the taxonomy system called MTM. Here’s a snippet:

MTM is the software for multilingual thesauri building and maintenance. It has been designed as a configurable system assisting a user in creating concepts, linking them by means of a set of predefined relations, and controlling the validity of the thesaurus structure…

The main features of the software are inter alia:

  • thesaurus maintenance and support system;
  • KWOC and full tree representation and navigation tools available on-line;
  • KWIC, KWOC and full tree printouts (in an alphabetic and systematic order);
  • defining and customization of up to 100 conceptual relationship types;
  • management of facets, codes (top classification), sources, regional variants, historical notes, etc.;
  • support of the various types of authority files;
  • computer assisted merging;
  • thesauri comparison by means of windows;
  • support of the various alphabets;
  • support of linguistic and orthographic variants;
  • sorting facilities consistent with national standards;
  • variable length data handling;
  • flexibility in defining input and output forms;
  • versatility in terms of relative ease of configuring the software for the various sets of languages;
  • flexibility in defining data structures needed for a given application;
  • a possibility to exchange data with other organizations and systems through exporting and importing terms and relations.

From the terminal user standpoint MTM fulfills the following criteria:

  • user-friendliness when entering, updating, deleting, checking data;
  • intelligent prompting of the end user whenever in doubt;
  • powerful validation facilities covering proper structuring of a thesaurus (e.g. maintenance of relationship isomorphism between languages);
  • features for documenting (“keeping track”) the history of the thesaurus evolution;
  • availability of data protection facilities;
  • availability of self-training and demonstration facilities;
  • provision of a thesaurus publishing facilities at the professional level;
  • modularity and openness to the further development.

You can access the developer’s Web site at the Institute for Computer and Information Engineering at http://www.icie.com.pl/.

The point is that the description of this system, which was created in 1990, foreshadows the marketing baloney output in 2011. Progress? I think not. Are these assertions “accurate”? Got me. When it comes to search and content processing, writing about a function is similar to a scriptwriter’s work on the next installment of a Dr. Who episode.

Third observation: Vendors are in flux. I thought there was flip flopping underway in the period when I was writing the 500-plus-page encyclopedia of search and my various monographs about enterprise search for Galatea and Pandia. Was I off base! Navigate to the Overflight service and click through the auto-generated profiles of the vendors on that page. The public Overflight makes it easy to track more than 60 vendors of search and content processing software. About half of the companies on the list have repositioned themselves. Examples range from ISYS Search Software getting into the connector business to Vivisimo’s becoming a vendor of “information optimization.” Oracle has sucked up search systems from Endeca, InQuira, and RightNow, which helps it reach parity with OpenText’s arsenal of solutions from BASIS Technology, Bibliographic Retrieval Systems, Fulcrum, and Nstein. What can you do with flip flop technology? Well, anything. Look at SharePoint. Can you explain what SharePoint does? Microsoft uses descriptions which run the gamut from content management to search, from business intelligence to Web site services. When vendors morph, how can their systems deliver precision and recall across so many functions? They cannot. Ergo: outputs are not accurate. Content is not indexed, is incorrectly tagged, or is displayed in a jumble, so manual methods have to be used. This is not progress in my opinion.

Fourth observation: Getting money is job one. Whether one is using a Web search system or an in-house system, everyone in the chain is in the money game. Licensees don’t want to spend money. As a result, many search failures stem from the licensee’s unwillingness to invest beyond a certain limit. Search is expensive. Big data keeps on getting bigger, which means that the spending for search is open ended. Vendors themselves are partly to blame because marketers assert the system can do “anything.” Well, no system can. So failure is a copilot for most deployments. The users, poor folks, are not stuck with just one flawed system. Users have to cope with dozens of findability systems. These range from command line systems which will be in place long after I bite the dust to whizzy “apps” which deliver a canned output with a tap. Maybe the canned output is not what is needed to make a decision. Could dumbing down be a contributing factor to certain executive decisions? MF Global can’t find $1.2 billion. Will an enterprise search system user be able to find a purchase order?

My revised observation is:

The truth is that nothing associated with locating information is accurate, cheap, easy or fast.

Search vendors and licensees are welcome to prove me wrong. Just keep in mind the four horses galloping across the marketing noise: too much lousy writing, marketing falsehoods, vendors who reinvent themselves in order to win jobs, and a need for cash that produces flawed installations.

Happy New Year to the “real” experts, the inept coders turned marketers, and the MBAs who are fueling America’s economic pilgrimage to a new third world economy. Hopefully the next 60 months will deliver “real” search progress. You know. Precision. Recall. Accuracy.

Stephen E Arnold, January 1, 2012

Sponsored by Pandia.com

