Search: Simplicity and Information Don’t Mix

December 1, 2008

In a conversation with a bright thirty-something, I learned that a speaker had insisted that the Google Search Appliance was “simple and easy”. I asked, “Did the speaker understand that information is inherently difficult, so search is usually not simple?”

The thirty-something did not hesitate. “Google makes the difficult look easy.”

A potential search system customer hears the word “simple” and interprets it based on his or her own experience, knowledge, and context. “Simple”, like “transparent”, is a word that covers a multitude of meanings.

My concern is that search has to deliver information to a user who needs a fact, an opinion, an example, or data. The software, electrical devices, and network systems know none of these notions without considerable technical work. Computers are generally pretty predictable. Smart software improves the gizmo, but the smarter software becomes, the less simple it is.

So, when a system like the Google Search Appliance, or any search system for that matter, is described as simple, I have questions. I don’t think the GSA is simple. The surface interface is simplified. The basic indexing engine is locked up and accessible only via point-and-click interfaces or scripts that conform to the OneBox API. But anyone who has tried to cluster GSAs and integrate the system with proprietary file types knows that the word “simple” is pretty much wrong.

Now what about search becoming “simple and easy”?

Search looks simple because of the browser: you type some words in a search box, or you look at a list of links and click one. But search is not simple. I would go so far as to say that any system that purports to allow a user to access digital information is one of the most complex technical undertakings engineers, programmers, and other specialists have attempted.

That’s why search is generally annoying to most of the people who have to use the systems.

Now let’s consider the notion of a “transparent search system.” I have to tell you that I don’t know why the word “transparency” has become a code word for “not secret”. When someone tells me that a company is transparent, I don’t believe them. A company cannot be transparent. Most outfits have secrets, market with ferocity first and facts second, and wheel and deal to the best of their ability. None of this “information” becomes available unless there’s a legal matter, a security breach, or a very careless executive.

Are search systems transparent? Nope. Consider Autonomy, Google, or any of the high-profile vendors of information access systems. Google does not allow licensees to poke around the guts of the GSA. Autonomy keeps the inner workings of IDOL under wraps. I have heard one Autonomy wizard say, “Sometimes we need to get Mike Lynch to work some of his famous magic to resolve an issue.” I track about 350 companies in the search and content processing space. I make my living trying to figure out how these systems work. Sue Feldman and I wrote a 10-page paper about one small innovation that interests Google. Nothing about that innovation was transparent, nor, I might add, was it “simple”.

What’s Up?

I think that consultants and parvenus need an angle on search, content processing, text mining, and information access. Since search is pretty complicated, who can blame a young person with zero expertise for looking at the shopping list of issues addressed in Successful Enterprise Search Management and deciding to go the “simple” route?

I understand this. I worked at a nuclear consulting firm for a number of years. I always thought I was pretty good in math, physics, and programming (if the type of programming done in 1971 could be considered sophisticated). Was I wrong? I was so wrong that it took me a year to understand that I knew zero about recent work in nuclear physics. By the end of the second year, I had a new appreciation for the role of Monte Carlo calculations in nuclear fuel rod placement. For example, you don’t inspect the fuel rods in an operating reactor. You would have some health problems. So you use math, and you need to be confident that when you move those bundles of nuclear fuel around, the used-up ones end up where they are supposed to go. Forget the modest health problem. The issue would be a tad more severe.

Search shares some complexity with nuclear physics. The essence of search today is a set of hugely complex subsystems, each of which must perform for the overall system to work. Okay, that applies to a nuclear reactor. You can’t really inspect what’s going on because there are too many data points. Yep, that’s similar to the need to know what’s happening in a reactor using math and models. A search system can exhibit issues that are tough to track down because no one human knows where a particular glitch may touch another function and cause a crash. Again, just like a nuclear reactor. Those control rooms you see in the films are complicated beasties for a reason. No one really knows exactly what is causing an issue, locally or remotely, in the system.

Now who wants to say, “Nuclear engineering is simple”? I don’t see too many people stepping forward. In fact, I think that most people know enough not to offer an opinion when it comes to nuclear engineering and the other disciplines required to keep the local power generation plant benign.

I can’t say the same for search. Search is popular, and it has attracted a lot of people who want to make money, who want to be famous like a rock star, or who know that one way to beat the financial downturn is to cook up an interesting spin on a hot topic. I congratulate these people, but I think the likelihood of their creating trouble is quite high.

In my 65 years, I have learned one thing:

What looks simple isn’t.

Try to do what a professional does. You probably won’t be able to do it. Whether the work is physical or intellectual, if you haven’t done the time, you can’t equal the professionals. Period.

At a conference, a speaker mentioned that for a person to become accomplished, the individual has to work at a particular skill or task for 10,000 hours. I know quite a few people who have spent 10,000 or more hours working on search. I wrote a book with one of these people, Martin White. I am a partner with another, Miles Kehoe. I know maybe 50 other people in the same select group. Most of the consultants and experts I meet are not experts in search. These people are expert at being friendly or selling. Those are great competencies, but they are not search-related.

If you have read a few of my previous posts in this Web log, you know that any search or content processing system described as “simple” or “easy” is most definitely neither. Search is complicated. Marketing and sales “professionals” routinely go to meetings and say, “Search is simple. Our system is completely open. Your own technical team can maintain the system.” In most cases, I don’t believe the pitch.

That’s why the majority of users are annoyed with search in an organization, and why most search systems end up in quite a pickle. See the upside-down and backwards engine in the picture below. How did this happen? I haven’t a clue, and that is how I react when I see a crazy search and information access system at an organization.

Let me give you an example. A large not-for-profit, government-subsidized think tank had the following search systems: Microsoft SharePoint, Open Text, multiple Google Search Appliances, and a couple of legacy systems I had not encountered for a decade. Now the outfit wants to provide a single interface to the content processed by this grab bag of systems. What makes this frustrating is that any one of these systems could be used to provide that access. The organization did not know how to do this and wanted to buy a new system to deliver the functionality. Crazy. What the outfit now has is another search system, and the problem is just more complicated. The “real fix” required thinking about the needs of the users and performing the intensive information audit needed to determine the scale of the project. This type of “grunt work” was not desirable. The person describing this situation to me said, “We want a simple solution.”

I am sure they do. I want to be 18 again and this time I want to look like Brad Pitt, not some troll from the catacombs in Paris. Won’t happen.

[Image: an upside-down, backwards engine]

How did we get our search system in this predicament?

Three Types of Simple Search

Let me give you three examples:

  1. Boil the ocean easy. Some vendors pitch a platform. The idea is that a licensee plugs in information connectors, the system processes the content, and the user gets answers. Guano. In fact, double guano. This approach is managerially, technically, and financially complex. Boil-the-ocean solutions are the core reason why such outfits as IBM, Microsoft, Oracle, and SAP give away search. By wrapping complexity inside of complexity, the fees just keep rolling in. The multi-month or multi-year deployment cycles guarantee that the staff responsible for the solution will have moved on. Search in most boil-the-ocean solutions works for only some of the users.
  2. Buy ’em all. Use Web services to hook ’em up easy. Quite a few vendors take this approach. The verbal acrobatics of “federated search” or “metasearch” gloss over very real problems: acquiring disparate content without choking the network, spending a fortune on repository infrastructure, and transforming the content into a consistent representation. These problems are happily ignored or marginalized. Unfortunately, these federated solutions require investment, planning, and building. I wish I had a dollar for every time I have heard a vendor struggling to make significant sales say the words “federated” and “easy” in the same sentence.
  3. Unpack it, plug it in, and just search easy. This argument is now coming from vendors who ship search appliances and from vendors who ship software described as appliances. Hello, earth. Are you sentient? Plugging in an appliance delivers one thing: toast. These gizmos have to be set up. You have to plan for failure, which means two gizmos and maybe clusters of gizmos. In case you haven’t tried to create hot spares and failover search systems, the work is not easy. And you haven’t tackled the problem of acquiring, transforming, and processing the content. You haven’t fiddled with the interface that marketing absolutely has to have, or the MBAs throw a hissy fit. Get real. When a modern appliance breaks, you don’t fix it. You buy another one. You don’t open a black-box iPod or BlackBerry and repair it. You get a new one. The same applies to search. What’s “easy” is the action you take when the system doesn’t work.

To sum up, simple search is the fool’s narcotic. The need to escape the actual complexities of information processing makes a desperate customer receptive to what is little more than marketing baloney. What scares me is that it works. I think it also scares the people who get fired because they created a bigger information access mess than the one they started with.

Wrap Up

I know I didn’t sell a consulting job to the publisher. I can’t take money from people who want to pay me to tell them that search is indeed simple. I can’t help a group that wants to buy a search appliance and wants me to write a memo to rubber-stamp the decision. In short, I can’t tolerate the simple, transparent, and easy approach to making information accessible.

The work can be made tolerable, but it will not for the foreseeable future be easy. Let me close with a list of four reasons why search remains a challenge:

  1. You may go to court, provide a deposition, or get a nodding acquaintance with jail. The problem of knowing what’s in your email and then finding that information can have huge financial and personal consequences. Microsoft’s present challenges with regard to “Vista Capable” machines are anchored in email. Managers cannot say, “Gee, I didn’t know that was in the email.” Today, systems have to make that information available before the attorney for the other side uses the information in your trial.
  2. You can go out of business because you make decisions that harm people. One major drug company had a lousy search system. Now that company is struggling to survive. The search vendor exited the enterprise search business. A good decision, because I think this is one of the first major instances of a search vendor deploying a system that allowed harm to be done because information was not findable.
  3. You can get fired. When the financial screws tighten, financial wizards look for a way to make a list of bluebirds and canaries. Canaries are those who are responsible for software systems that cost lots of money and don’t work. You can find canaries by running queries on job search Web sites and searching for the names of well known search vendors. Why are these people looking for jobs? Call a few up and ask about the search system. I did. There was a correlation between unemployment and lousy search systems among the sample I polled.
  4. You can be ostracized. Here’s how this works. You go to a meeting about the search system and you take indirect hits in the form of comments about the search system. You are not named, but you have to take steps to fix timeliness, relevance, and performance. Some units may just smile, then go buy a separate system and cut you out of the loop. The new system for the department will probably not be very good, but control now lies in the hands of a group that the enterprise search system failed.

Before you swallow the narcotic labeled “simple”, step back. Evaluate. If search were simple, why would there be such turmoil in this software sector? If vendors were transparent, would you know what was real and what was mere marketing rhetoric? If search were easy, why would you have to think about taxonomies, content transformation, and clustering?

Search is and will remain tough. Why? It’s the problem of information and human needs. Check out the discipline of epistemology. That’s what search is “about.” Technology is only a part of the challenge, and a small part at that. What technology a vendor has is going to be kept close to the vest. So any silliness about transparency in search is like the smell of smoke at 3 am. The midnight oil of the marketers threatens to burn your budget. Flee and avoid the pain.

Stephen Arnold, December 1, 2008

Comments

10 Responses to “Search: Simplicity and Information Don’t Mix”

  1. sasha on December 1st, 2008 4:29 am

    The history of the past century shows that using technology doesn’t have to be complex.
    Video, PCs, and mobile phones were once tough to operate.
    Even creating complex queries can be made easy: http://www.cloudtuner.com/websearch.swf
    The same goes for non-textual information:
    http://www.cloudtuner.com/imagesearch.swf

  2. Stephen E. Arnold on December 1st, 2008 5:36 am

    Sasha,

    Thanks for your comment.

    Stephen Arnold, December 1, 2008

  3. Andreas Ringdal on December 1st, 2008 8:28 am

    @sasha The user interface is just the tip of the iceberg. Although the CloudTuner interface has some innovative ways of letting users specify which parts of the search are important, a lot of search attempts fail long before any data reaches the users, even before it reaches the engines, and reaching the engines is barely half the distance to the users.

    Andreas

  4. Stephen E. Arnold on December 1st, 2008 8:31 am

    Andreas Ringdal,

    The interface is what procurement teams see and understand. The interface is important. Often a great interface makes it easy for users to see the shortcomings of plumbing that would otherwise stay hidden.

    Stephen Arnold, December 1, 2008

  5. CJ on December 1st, 2008 11:33 am

    Really lovely article, I enjoyed reading that very much.

    Gerard Salton said that any good system should have an element of magic, hence calling his system “Salton’s Magical Automatic Retriever of Text”.

    I’m finishing my PhD in natural language generation and understanding, and have built conversational systems. I built chatbots to start with, but the complexity of true NLG and NLU is phenomenal. Every time I think I have it sorted and turn a corner… there is a whole load more to learn.

    Also a nice read: Anna Patterson on why building a search engine is hard.

    http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=143

  6. David Eddy on December 1st, 2008 12:00 pm

    Fine article…

    Here’s a comment on the complexity of language (“Ambiguous Words”) by Dr. George Miller, the father of WordNet.

    A “simple” 13-word couplet generates 3.6 trillion combinations. And that’s with simple, short, real words with an average of 10 meanings per word.

    I’ve assembled a dictionary of 2,000 terms with 68,000 meanings… 34 meanings per term.

    There’s also the “guess the word” game… known as “the vocabulary problem”
    http://www.si.umich.edu/~furnas/Papers/vocab.paper.pdf

    At BEST you have a 20% chance of guessing the right word.

  7. CJ on December 2nd, 2008 11:59 am

    And get this…in computing we always have to extend WordNet because it isn’t specific enough to every domain! Nice George Miller quote.

  8. sperky on December 3rd, 2008 7:42 am

    I am reminded of the tautologies (and other limitations) of hardware and software, and that Google has consistently stretched them the furthest. Then comes “Soul of a New Machine” Stephen Wallach with a different spin on field-programmable gate arrays, which may produce hardware that runs different software, or the same software differently: hardware with a different DNA. http://www.nytimes.com/2008/11/17/technology/business-computing/17machine.html?_r=1&oref=slogin
    Where will this put search, I wonder, or for that matter Google, if in fact they are not already using this?

  9. Transparency vs. Simplicity | The Noisy Channel on December 12th, 2008 10:30 pm

    […] I was a bit taken aback by a recent blog post in which Stephen Arnold seemed to attach the notion that an effective search engine could be […]

  10. Stephen E. Arnold on December 13th, 2008 5:49 pm

    Transparency vs Simplicity The Noisy Channel,

    Good. I am an addled goose and certainly less than qualified to comment about simplicity.

    Stephen Arnold, December 13, 2008
