Search: A Kitchen Sink and the Carcassonne Problem

March 25, 2008

As I worked on my keynote for the upcoming Buying and Selling eContent Conference in April 2008, I flipped through PowerPoint decks in search of examples. I came across a presentation I delivered in the summer of 2006. In that talk, I described behind-the-firewall search as following an interesting trajectory. Humans have a tendency to elaborate, embroider, and complicate.

Let me give you an example. My mother and father recently moved from their home to a condominium-style dwelling. The “space” was a blank canvas. After a year, I noticed that the white space was filled in. Some of the objects were family mementos like the hand-carved ebony elephant that has been in the Arnold family for a century. But other acquisitions were plaques identifying my mother as a “red hat lady”. My father had taped instructions for replacing the cartridge in his printer next to his flat panel monitor. In short, the white space was being filled in.

I noticed a similar “stuffing” when I was in Carcassonne, the walled city in Aude. Every square inch inside the city walls had been put to use. Carcassonne might strike some Texans who are used to wide open spaces as claustrophobic. I thought that it made tangible a human’s instinct to fill blank spaces, whether out of necessity or out of a desire to make her environment denser.

This is an aerial view of Carcassonne.

As I looked through the PowerPoint deck from 2006, the thought struck me that vendors of search and content processing systems have been elaborating their systems. In one sense, increasing complexity fascinates the eye. When I looked at the intricate mosaics in a small mosque in Istanbul, the patterns were hypnotic. My appreciation was superficial because I could not understand the message in the script which formed a portion of the pattern. On the other hand, I noticed that as my eyes struggled to follow the interlocking elements, I found myself getting lost. I had to look for a point of reference, and then I would begin my visual exploration again.

Complexity is both fascinating and somewhat disorienting. Search and content processing, I think, have been unwittingly moving from the simple–the technical equivalent of white space–to the more complex. Behind-the-firewall search (my preferred term for enterprise search, which I think misleading) used to mean:

Basic Search

For many years, an inverted index and Boolean queries defined behind-the-firewall search. When key word search did not work, paper files usually were available, and a manual Easter egg hunt was the fallback.
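For readers who have not looked under the hood, here is a minimal sketch in Python of what an inverted index with Boolean queries amounts to. The function names and the three sample documents are my own inventions for illustration; I am not describing any vendor's implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric key words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: each term maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Return ids of documents that contain every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Return ids of documents that contain at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Return ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


if __name__ == "__main__":
    # Hypothetical sample documents, keyed by document id.
    sample_docs = {
        1: "Quarterly sales report for the Kentucky office",
        2: "Sales forecast and marketing plan",
        3: "Printer cartridge replacement instructions",
    }
    sample_index = build_index(sample_docs)
    print(sorted(search_and(sample_index, "sales", "report")))      # [1]
    print(sorted(search_or(sample_index, "forecast", "printer")))   # [2, 3]
    print(sorted(search_not(sample_index, sample_docs, "sales")))   # [3]
```

The point of the sketch is how little there is to it. A query either matches the exact key words in the index or it returns nothing, which is why the Easter egg hunt was never far away.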

Vendors understood the limitations of key word search even if some did not communicate that insight to their customers. Over time, key word systems began to accumulate new functions and features. Customers had an appetite for systems that offered a way out of what I call the “shackles of the search box”. I discuss this idea in my new study, Beyond Search.

Here’s a simple depiction of the enhancements that have now entered the mainstream search systems. You no longer license a system that can “do” key word search. You get a more robust search system, illustrated below:

New functions

I could stuff more technologies into this “red ball”. The present fascination with social search is one example, as is the buzz surrounding Radar Networks’ Twine service. Another is NLP or natural language processing. There are many definitions for this concept. The idea is that the user can submit to a search system a question such as “What’s the height of the Washington Monument?” The system “understands” this question and generates an answer. Try this yourself by clicking this link. Microsoft’s Live.com search system can understand “natural language”. If you try this query on Google, you’ll see that Google doesn’t do a very good job on this query. A little experimentation reveals that Google does understand airline schedules, however. Two technical giants “sort of” understand natural language. We’re making progress, but I don’t think the nuances are as important as the elaboration of the search systems that’s taking place.

There are some interesting implications to this movement from the simple key word search to the more complex techniques now proliferating within vendors’ offerings. First, basic key word search is reasonably simple, or, at least, as easy as search can be. Stuffing in these additional functions bumps up the complexity of the system. Complex systems cost more to build, configure, tune, and maintain. Cost overruns often are a direct consequence of this “snowball” approach to search.

Second, the more processes, the more computational horsepower a system requires. I hear a great deal of talk about the benefits of massively parallel systems. The fly in the ointment is that programming for multi-core, parallel processing is time-consuming and therefore expensive. Some vendors take shortcuts and fall back on the decades-old way to speed up software: throw more hardware at the problem. With more iterations across a document, more opportunities for performance hot spots arise.

Third, stuffing additional functions into search makes it possible to talk about “value adds”. What’s a search “value add”? Instead of a laundry list of results, the additional functions allow a system user to get “actionable intelligence”. This phrase and any buzzword with “knowledge” as a component connote a more important function. What we now have available from dozens of vendors are what I call “kitchen sink” systems.

Kitchen sink

We are now firmly in the grip of a “kitchen sink” approach to behind-the-firewall search. Let’s do a quick rundown of the upside and downside of this phenomenon.

The upsides are easy–almost too easy–to rattle off:

  1. Users like point-and-click interfaces. Some vendors call this approach “assisted navigation”. No search is needed. The user looks for a likely suspect word or phrase, clicks it, and receives information.
  2. Graphical interfaces make the information access experience easier to enhance. Let’s face it. Laundry lists are tough to make enticing.
  3. Sizzle sells. Vendors with enhanced functions can give killer demonstrations. (Many customers can’t tell the difference between a demo and the “real” system.)

The downsides of these enhanced search systems are also easy to identify. For reasons of prudence or ignorance, the negatives are not often discussed at professional gatherings or in trade publications or Web logs. Making waves is not a good idea if you don’t want your tiny boat swamped by an angry pass of a giant search vendor bearing down in a super-charged speed boat. I will take my chances, since I’m over the hill and working from my redoubt in the wilds of Kentucky:

  1. The complicated functions are often fiendishly difficult to get working the way a licensee wants the functions to work. The “demo” doesn’t necessarily map to the licensee’s information reality.
  2. The add-ons are not well-integrated. Oh, these functions can be integrated, but I have heard reports of vendors who say, in effect, “Well, you can pay us to hook these functions into your system.”
  3. Even when the enhancements work, these systems often reject incoming content. The licensee says, “What’s going on?” The vendor says, “You will need to ‘normalize’ your content.” Often, the licensee assumes the “content” is not an issue. Well, the source content is an issue because organizations have a fruit cake of file types, formats, and document versions. Without normalization, enhanced search systems reject non-conforming content, so the enhanced system is a keyword system with incomplete indexing.

If you and I were in Carcassonne, we would be able to walk around with little danger of getting lost. The city is walled, and if you stumble around long enough, you will find your way back to your starting point. The challenge arises when you try to explain where the print making shop is located.

Several years ago, I was in Venice. In one of the book shops, a number of maps of the city were available for sale. What struck me was the complexity in the small space available. Human ingenuity created incredible complexity in a small space. Search, in many ways, is a small “space” in a large organization. Human ingenuity is again crafting complexity in order to maximize utility. When I tried to walk around Venice, I found myself losing my way. A native can move through the city without confusion. The problem surfaces when two people try to explain to one another how to find the print maker. A local can find the shop. But the local to whom I spoke could not explain how to get from where I was to the print maker’s shop.

As search becomes more complex and more functions squeeze into this “small space”, the seasoned search traveler will enjoy few wrong turns. At this time, most people engaged in behind-the-firewall search are new to the territory.

Here is the announcement that ProQuest had licensed the Fast Search & Transfer search system. ProQuest has a great deal of content by commercial database measurements, but it has a small amount of information compared to Google, for example. I wonder, “Might this be an example of the ‘kitchen sink’ approach to search plus my Carcassonne problem? In a small space, no one can easily explain how to answer a trivial question.”

I find it fascinating that “owners” of aggregated content are embracing systems that offer a wide range of enhanced features. I wonder if it is competitive pressure from newcomers to sci-tech content like Microsoft and Yahoo. Maybe Thomson, owner of Westlaw, is making moves I have not yet discerned? Perhaps it is the implicit challenge of “the Internet” as a killer of gatekeepers?

When I think about my use of professional information services, I am comfortable with key word queries, Boolean logic, and the use of controlled term lists. These are the equivalent of a basic, even primitive, search system. I am not certain that “kitchen sink” functions deliver the type of payback that the senior managers of ProQuest and Fast Search & Transfer have obviously projected.

Information for professionals is very different from information for the everyday Web user like my mom. Professional information is jargonized. The value of an advanced degree or significant industry experience is that a person learns the lingo of the business. ECC means one thing to a computer scientist and another to a nuclear engineer. Outsiders can’t make head nor tail of certain professional, sci-tech information. Disciplines are clubs, and you have to prove that you are worthy to enter the sacred precinct of knowledge.

Many chemists embrace structure searching. Information professionals enjoy crafting Boolean statements. Historians delight in knowing the synonymy of Zeno and Xeno. Maybe the powerful search features that answer questions, generate reports, and provide assisted navigation will help most users? We’ll know when the new ProQuest system becomes available.

Fast Search & Transfer’s system includes numerous features, amply documented in my profile of ESP (the Fast Search Enterprise Search Platform) in the third edition of the Enterprise Search Report. I haven’t seen the Fast Search chapter in the fourth edition, so I can’t speak to what the publisher included in the most recent edition. But in the third edition I documented 10 “products” ranging from InPerspective to ProPublish. I also described eight core functions that included classification, navigation and drill down, and rules, among five others.

To conclude, let me pose some rhetorical questions. I invite your answers as well.

First, how will the functions map to the content domain in a way that avoids escalating complexity?

Second, will the human tendency to “fill in the white spaces” take precedence over a user’s need to get from Point A (an information need) to Point B (an answer)?

Third, will the time and effort costs of the options available inhibit exploration?

Finding one’s way through a beautiful, historic city is fun. Will the “kitchen sink” approach to information retrieval enhance a user’s travels in an information space? Will the journey become so complicated that movement within the content domain becomes a challenge in itself? I look forward to the opportunity to use the enhanced ProQuest system and hear from other users about their journeys.

Let me ask you, “Have you retained simplicity in your search and content processing system?” With user disenchantment with search creeping upwards, giving users what they need may be more important than elaborating the design.

Stephen Arnold, March 25, 2008

Comments

3 Responses to “Search: A Kitchen Sink and the Carcassonne Problem”

  1. sperky undernet on March 26th, 2008 11:57 am

    Two totally different items come to my mind in the simplicity-complexity conundrum:
    Italo Calvino’s “Invisible Cities” which I am looking at as a conceptual search vision long after reading/studying it to prepare for a trip to Venice, and the experience of the original Predicasts PROMT which was based on the assumption that there is more to business information than global PR releases and so the value in abstracts in English of foreign language items and in intensive indexing. The first is based on non-verbal meaning, the second on translation and abstracts just before English, full text and B&I became dominant.

    During a long nighttime canal boat journey in Venice, it was the second and less used tour book with the simplified graphical London underground type-map with all canal stations listed (and not the geographical based one without all the stops identified) that made the difference between lost and found.

    Finally, the chamber of commerce organization at which I worked for over a decade decided to go the “enhanced” graphical route after tiring of the original “primitive” DOS-format solution which after all was/is a solution. After a long transition and without a bigtime budget but with bigtime goals, I am not sure the project ever got off the ground. But in the meantime, attention to actual service product was reinvested in dreamy buzz words which, even when attained, would serve more to manage customers than to fulfill them.

  2. Stephen E. Arnold on March 26th, 2008 5:11 pm

    When a vendor lands a customer, it is important to keep that customer on the reservation. New features might do this. See the Autonomy “pan enterprise” story on the news section of Beyond Search. Is this what you think a control play might look like?
    Stephen Arnold, March 26, 2008

  3. sperky undernet on March 27th, 2008 12:53 pm

    If by “control play” you mean whatever razzle dazzle it takes to entrap the customer, then yes, it could look like that. I would suggest, however, that there are many subjective factors why management decides on a particular vendor, none of which factor in what is necessary for a solution. The wrongful assumption being that, since all systems “work”, why not hire the brother-in-law’s friend, whose solution may indeed work for a totally different type of organization. For small and middle sized enterprises, I believe this happens not infrequently.
