Key Word Search Vendors: Panting Laggards
March 31, 2008
In September 2003, I gave an invited lecture at LANL, an acronym for Los Alamos National Laboratory, for those of you who don’t keep up with some of the US government’s most interesting research nomenclature. I poked around my digital warehouse today after I saw an announcement that a major search-and-retrieval vendor was now officially in the “information access business”. I used to work for Ziff Communications Co., and we owned an outfit called Information Access Co. That was a great company name, but the whole shooting match was sold to the giant Thomson Corporation, and the name Information Access fell into disuse, or so I thought.
I marvel at how certain terminology comes back from the dead. IAC, as Information Access was known for more than 15 years, allowed a person to search for electronic information. The idea was a good one, and IAC had revenues of more than $100 million at the time of the sale. The approach was simple. We used bibliographic records (what today would be called “structured metadata”), the full text of articles (what today would be called content), and proprietary scripts to generate reports (what today would be called business intelligence). The user of our General Business File product in 1990 would pick from a menu of options; for example, look for a job. Then the user would pick one of the major cities whose employment opportunities we indexed (today we would say “tagged”), and the system would display job openings. A mouse click sent the report to the printer, and we had happy users. We sold more than 1,000 of these systems in less than nine months in 1990. Considering that each system cost $20,000 or more, the General Business File would be a success even in today’s Googley world.
The LANL group wanted to know about the future of search and “The Information Implications of Social Software”. In 2003, there was little popular awareness of social software because MySpace.com, Facebook.com, the Web 2.0 “revolution”, and AJAX were dreams or oddities known to a handful of code bangers.
One of the key points in my presentation was that “information access” was an umbrella term for a bundle of activities and functions. These separate entities were now able to interact to form new, often quite surprising products and services. Social software, which I defined as the use of network technology for communication, collaboration, and combination, was a terrible term, but we were stuck with it. (To learn more about my annoyance with information terminology, watch for a feature story in Searcher Magazine that updates my 1999 article and my 2000 article about technology convergence. Sorry. I don’t have a publication date yet, but the editor, Barbara Quint, is working on my lousy prose now.)
Take a look at one diagram from my lecture. Keep in mind that I prepared it five years ago, but I hope it is still useful for our purposes.
Someone complained that I was copyrighting my work on this Web log. Okay, I won’t put the copyright symbol on this graphic. If you want to recycle my work, please send me an email and ask for permission. I get annoyed when certain individuals borrow with neither attribution nor permission. Right, Mr. Hermans?
Let’s take a quick tour of this diagram, and then I will close with some observations about the “panting laggard” that is behind-the-firewall search.
Yellow Spheres
Notice the “yellow spheres”. You may have to click on the small image in order to read the notations on this diagram. The heading is “Enabling”. The idea is that each of the “yellow spheres” represents a category of technology that makes online information more useful. For example, “Converting Creating Content” refers to content authoring and content transformation. Behind-the-firewall systems have to take different file types and homogenize them so the system can manipulate them. If a search or content processing system can’t “read” a file, the system won’t process it. The goal, then, is to get the content, regardless of its form and format, into the search and content processing system. The bottom “yellow ball” is labeled “Spidering, Indexing, and Searching”. You recognize these ideas because 90 percent of a search vendor’s sales pitch talks about this “yellow ball”. In terms of this diagram, it’s easy to see that these three operations (spidering, indexing, and searching) are just one cog in a much larger system. Vendors whose pitch centers on these three features are “panting laggards”. In my opinion, these vendors are nearly out of the race and almost certainly won’t win in the long run.
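To show how small the “yellow ball” really is, here is a minimal sketch of key word indexing and searching, assuming a toy in-memory collection. The documents and the function names (`tokenize`, `build_index`, `search`) are hypothetical, not any vendor’s code, and spidering is stubbed out with a hard-coded dictionary. An inverted index plus a set intersection covers the basic case, which is one reason key word search has become a commodity.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric key words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each key word to the set of document ids containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every key word in the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical documents standing in for "spidered" content.
docs = {
    1: "Enterprise search vendors pitch spidering and indexing.",
    2: "Predictive analytics and data mining for business intelligence.",
    3: "Key word search is a commodity in enterprise systems.",
}
index = build_index(docs)
print(sorted(search(index, "enterprise search")))  # [1, 3]
```

Everything else on the diagram (conversion, filtering, clustering, mining, imaging) sits outside this loop, which is the point of the “yellow spheres”.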
Purple Spheres
The “purple spheres” are identified as “Analysis”. Each of these four spaces is now mainstream. Vendors offer these services because each is easier for a manager to assess in terms of a payoff. Few people in an organization want to see laundry lists of information. Filtering eliminates information according to rules, methods, or user-defined specifications; for example, “I don’t want information about enterprise search. I want information about predictive analytics.” Clustering is a catch-all term. In it reside classification, grouping, categorization, and anything to do with today’s idées du jour: taxonomies and ontologies. The idea is that the system groups similar documents in a meaningful way. If you don’t know what you really want to review, you scan the category labels and browse the results. The third “purple sphere” is data mining. Companies like SPSS and SAS Institute are familiar to you if you took advanced statistics in college. These companies are now in the business of text processing and are offering a burgeoning array of features and functions designed to whip unstructured content into shape. SAS Institute bought Teragram, and its PR team told me that SAS will become an “enterprise search company”. I detest this term, but the move is a good one. SAS wants to chop up text, pull out the juicy bits, count them, crunch them, and generate reports for users. The final “purple sphere” is labeled “static / video imaging”. Most organizations are awash in digital information, but most of that is text. It will not stay that way for long. “Going forward”, I said in 2003, “behind-the-firewall search systems will have to come to grips with the information-charged binary files: chemical structures, engineering drawings, audio recordings, and video.” Now, five years later, only Autonomy has a reasonable solution for video. The other data types remain “outside” the capabilities of behind-the-firewall system vendors.
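The filtering idea above can be sketched as a user-defined specification with phrases to require and phrases to reject. This is a minimal illustration, assuming plain substring matching on lowercased text; the `FilterRule` structure and the sample documents are hypothetical, and commercial systems use far richer rules and methods.

```python
from dataclasses import dataclass, field

@dataclass
class FilterRule:
    """Hypothetical user-defined specification: phrases to require and phrases to reject."""
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)

def passes(rule, text):
    """Keep a document only if it mentions an included phrase and no excluded phrase."""
    lowered = text.lower()
    if any(phrase.lower() in lowered for phrase in rule.exclude):
        return False
    # With no include phrases, anything not excluded passes.
    return not rule.include or any(phrase.lower() in lowered for phrase in rule.include)

def apply_filter(rule, docs):
    """Return the ids of documents that survive the rule, in their original order."""
    return [doc_id for doc_id, text in docs.items() if passes(rule, text)]

# "I don't want information about enterprise search. I want information about predictive analytics."
rule = FilterRule(include=["predictive analytics"], exclude=["enterprise search"])
docs = {
    "a": "A vendor survey of enterprise search and predictive analytics.",
    "b": "Predictive analytics for supply chain forecasting.",
    "c": "Taxonomies and ontologies for content management.",
}
print(apply_filter(rule, docs))  # ['b']
```

Exclusion is checked first, so a document that mentions both topics is dropped; that ordering is a design choice for this sketch, not a rule from the original lecture.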
Gray Bar
The “gray bar” was intended to be a spectrum. My lousy Photoshop skills produced this blah “gray bar”. The idea is that “Enabling” and “Analysis” are two distinct types of pressure on search and content processing opportunities. As the “yellow spheres” get bigger, they will exert pressure on the folks in the “gray bar”. Similarly, as the “purple spheres” exert their influence on users, a catalytic reaction occurs in the “gray bar”. In 2003, I identified three significant changes in the way employees would interact with digital information.
First, instead of a search box, people looking for information want some sort of information finder “landing page”. For want of a better term, I used the word portal for the notion of gaining access to information in a search and content processing system.
Second, I identified the shift from getting laundry lists of “hits” to a type of collaborative work. Vendors often forget that documents are created by people, unless you are lucky enough to live inside some hyper-advanced culture like Google’s. But the GOOG is an anomaly, so think about your company. You want to accomplish a work task. Many work tasks require working with one or more colleagues. So, the world of search and retrieval becomes an enabler of collaborative interaction.
Third, the search system is a means of keeping track of what’s been done and how information has changed. In my new study, Beyond Search, published by the Gilbane Group, I talk about one of Google’s most interesting data management acquisitions in 2006. (A discussion of this company and its technology appears in Beyond Search.) This company was working in this type of hyper-search space, and if Google does more than launch betas, the technology could revolutionize its enterprise applications division. The point is that search is simply one facet of a much more significant set of processes coming about as the “yellow spheres” and the “purple spheres” expand and change the “pressure” for next-generation applications.
Going Nuclear at LANL
To wrap up, I was making explicit that key word search was a dead end. The action was in the “yellow spheres” and the “purple spheres”. As these various functional and technical areas grew more robust and fell in price, the notion of key words would become irrelevant to the real opportunities in the “gray bar”.
In my discussion of the prescient Sagemaker technology on this Web log, I make it clear that flabby key word search had shortcomings that were well known a decade ago. Now many leaders in search and retrieval are repositioning themselves, actually distancing themselves, from key word search. Not only is key word search a commodity, but the financial difficulties of some of the highest-profile vendors also make it clear that generating revenue from it is not easy. You can snag Lucene or Flax (both discussed on this Web log) and save yourself some money.
The LANL folks were not thrilled with my talk. I thought some in the audience would explode. Webmasters and government marketers had just completed a redesign of the LANL Web site. Key word search was offered, but it was slow as molasses. I think it’s been improved now. None of the functions I identified as important in the “gray bar” were available on LANL’s public-facing or employee-only Web sites.
These wizards invited a guy from rural Kentucky, and I did the intellectual equivalent of tracking mud on their white carpet. Competition for clicks among the national labs is fierce. LANL, long the number one research facility, had suffered some security disappointments and the wily wizards at Oak Ridge National Lab had rolled out a niftier Web site. Believe it or not, a high-traffic Web site makes a difference at budget time on Capitol Hill. Here I was making a mess of the new white carpet. I turned in my fancy badge and high-tailed it back to Kentucky.
Most vendors of search and content processing systems have been slow to provide the functionality shown on my amateurish diagram. These vendors are now charging forward with new positioning, new buzzwords, and new ways to explain the benefits of their systems. Like the out-of-shape athlete, some of these folks are coming into our offices looking much the worse for wear. Most are “panting laggards”–not fit for serious information access duty and several years too late.
Stephen Arnold, April 1, 2008