Search Evaluation in the Wild
March 26, 2013
If you are struggling with search, you may be calling your search engine optimization advisor. I responded to a query from an SEO expert who needed information about enterprise search. His clients, as I understood the question, were seeking guidance from a person with expertise in spoofing the indexing and relevance algorithms used by public Web search vendors. (The discussion appeared in the Search-Based Applications (SBA) and Enterprise Search group on LinkedIn. Note that you may need to be a member of LinkedIn to view the archived discussion.)
The whole notion of turning search into marketing has interested me for a number of years. Our modern technology environment creates a need for faux information. The idea, as Jacques Ellul pointed out in Propaganda, is that modern man needs something to fill a void.
How can search deliver easy, comfortable, and good enough results? Easy. Don’t let the user formulate a query. A happy quack to Resistance Quotes.
It, therefore, makes perfect sense that a customer who is buying relevance in a page of free Web results would expect an SEO expert to provide similar functionality for enterprise search. Not surprisingly, the notion of controlling search results through an externality like keyword stuffing or content flooding seems a logical way to approach enterprise search.
Precision, recall, hard metrics about indexing time, and the other impedimenta of the traditional information retrieval expert are secondary to results. Like the metrics about Web traffic, a number is better than no number. Even if the number’s flaws are not understood, the number is better than nothing. In fact, the entire approach to search as marketing is based on results which are good enough. One can see the consequences of this thinking when one runs a query on Bing or on systems which permit users’ comments to influence relevancy. Vivisimo activated this type of value adding years ago, and it remains a good example of trying to make search useful. The system which delivers a laundry list of results and forces the user to work through the document list to determine what is useful is gone. If a document has internal votes of excellence, that document is the “right” one. Instead of precision and recall, modern systems deliver “good enough” results. The user sees one top hit and assumes the system has made the more informed decision.
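To make the contrast concrete, here is a minimal Python sketch of what the traditional metrics measure versus what a vote-driven “top hit” delivers. The document identifiers, vote counts, and relevance judgments are invented for illustration; no particular vendor’s system works exactly this way.

```python
# Minimal sketch: classic precision / recall versus a "good enough"
# vote-boosted top hit. All IDs, votes, and judgments are invented.

retrieved = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # what the engine returned
relevant = {"doc2", "doc4", "doc7"}                   # what a human judged relevant

hits = [d for d in retrieved if d in relevant]
precision = len(hits) / len(retrieved)  # 2 / 5 = 0.40
recall = len(hits) / len(relevant)      # 2 / 3 = 0.67

# The "good enough" alternative: let user votes reorder the list and
# surface a single top hit, regardless of the formal judgments.
votes = {"doc1": 12, "doc2": 3, "doc3": 0, "doc4": 7, "doc5": 1}
top_hit = max(retrieved, key=lambda d: votes.get(d, 0))

print(f"precision={precision:.2f} recall={recall:.2f} top_hit={top_hit}")
```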
There is a downside to the good enough approach to search, which delivers a concrete result that, like Web traffic statistics, looks so solid, so meaningful. That downside is that the user consumes information which may not be accurate, germane, or timely. In the quest for better search, good enough trumps the mentally exhausting methods of the traditional precision and recall crowd.
To get a better feel for the implications of this “good enough” line of thinking, you may find useful the September 2012 “deliverable” from Promise (whose acronym, in my opinion, should be spelled PPromise), “Tutorial on Evaluation in the Wild.” The abstract for the document does not emphasize the “good enough” angle, stating:
The methodology estimates the user perception based on a wide range of criteria that cover four categories, namely indexing, document matching, the quality of the search results and the user interface of the system. The criteria are established best practices in the information retrieval domain as well as advancements for user search experience. For each criterion a test script has been defined that contains step-by-step instructions, a scoring schema and adaptations for the three PROMISE use case domains.
The idea is that by running what strikes me as subjective data collection from users of systems, an organization can gain insight into the search system’s “performance” and “all aspects of his or her behavior.” (The “all” is a bit problematic to me.)
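To see what such a scoring schema might look like when tallied, here is a small Python sketch. The four category names come from the abstract quoted above; the individual criteria, the zero-to-five scores, and the equal weighting are my own placeholders, not the PROMISE test scripts themselves.

```python
# Sketch of a PROMISE-style scoring roll-up. Category names come from the
# quoted abstract; criteria, scores (0-5), and weights are placeholders.

scores = {
    "indexing":          {"completeness": 3, "freshness": 2},
    "document matching": {"query handling": 4, "synonym support": 2},
    "search results":    {"relevance": 3, "presentation": 4},
    "user interface":    {"query entry": 5, "facet navigation": 3},
}

category_averages = {
    category: sum(criteria.values()) / len(criteria)
    for category, criteria in scores.items()
}

for category, avg in category_averages.items():
    print(f"{category:18s} {avg:.1f}")

overall = sum(category_averages.values()) / len(category_averages)
print(f"{'overall':18s} {overall:.1f}")
```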
The methodology considers a number of interesting factors. One example is the notion of “completeness.” If content is not in an index, that content cannot be found. There are some interesting aspects of “completeness”; for instance, if a search system pumps content into an exception file, the index is incomplete. The reason may be technical. If content is not available to the system due to a server time out or some other externality, the index is not complete. However, the user does not know what should be in the index, so the user is not in a position to determine if an index is complete.
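A back-of-the-envelope way to express “completeness” follows. The counts are invented, and the paragraph’s point stands: only someone who knows the total document population can compute the ratio.

```python
# Sketch of index "completeness": documents the crawler or connector saw
# versus documents that actually made it into the index. Numbers invented.

documents_discovered = 100_000        # what the connector found
documents_in_exception_file = 4_200   # rejected: bad formats, huge files, etc.
documents_timed_out = 1_500           # server time outs and other externalities

documents_indexed = (documents_discovered
                     - documents_in_exception_file
                     - documents_timed_out)
completeness = documents_indexed / documents_discovered
print(f"index completeness: {completeness:.1%}")  # 94.3%

# The user never sees documents_discovered, so the user cannot
# compute this ratio or know what is missing.
```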
The idea of metadata is a fascinating one. For many years, consultants have been “teaching” eager listeners about taxonomies, user-provided indexing, and automated systems which identify entities and perform synonym expansion. It is with some confidence that I say, “Most licensees of enterprise search systems have limited indexing resources. As a result, indexing is talked about often and done well infrequently.” For users, indexing takes second place to entering two or three keywords into a search box and looking at what the system outputs. For more advanced systems, the user looks at canned reports, often with visualizations, to make the important items easy to spot. The problem is that talk about metadata does not improve metadata. Asking a user about metadata is interesting but not particularly productive unless the user understands the subject of the question. When it comes to indexing, academics and consultants know their yogurt. The average user may not have the appetite for the food.
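As a concrete example of one of the automated aids mentioned above, here is a toy query-time synonym expansion in Python. The synonym table is hand built for illustration; production systems derive such lists from thesauri, taxonomies, or usage data.

```python
# Toy query-time synonym expansion. The synonym table is invented;
# real systems build such lists from thesauri, taxonomies, or logs.

synonyms = {
    "invoice": ["bill", "statement"],
    "laptop": ["notebook", "portable computer"],
}

def expand(query: str) -> list[str]:
    """Return the query terms plus any listed synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms.get(term, []))
    return expanded

print(expand("laptop invoice"))
# ['laptop', 'notebook', 'portable computer', 'invoice', 'bill', 'statement']
```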
One final point: if a procurement team has the time, money, and expertise, a formal evaluation of search systems is highly desirable. The use of a checklist such as the one Promise developed with considerable academic effort can be useful.
The reality of enterprise search, however, is that information about search is sought from many places, including SEO experts and similar content spoofers. When the results of a formal head to head comparison become available to a procurement team, will these hard data take precedence over the combination of “good enough” and cost? In my experience, no. A system—often any system—which is different and cheaper will get the nod.
Who benefits from a Promise-style checklist or the one which appeared in Successful Enterprise Search Management? I am just not sure. Good enough and SEO inputs trump the rigor of delivering search which actually works, free of litigation, cost spikes, and legions of “we know every system” consultants.
Enterprise search is in a pickle. The size of the pickle can be measured by the length of the checklist. Elitists like me talk about figuring out each system’s strengths and weaknesses. I am now leaning toward the opinion, “Who cares?” Grant chasers, some venture capitalists, and some researchers do. Users? Nope. No interest in complexity. Deliver a tasty user interface via predictive methods. Good enough. Sad.
Stephen E Arnold, March 26, 2013