Searchenstein: Pensée d’escalier

May 1, 2008

At the Boston Search Engine Meeting, I spoke with a certified search wizard-ette. As you know, my legal eagle discourages me from proper noun extraction in my Web log essays. This means I can’t name the person, nor can I provide you with the name of her employer. You will have to conjure a faceless wizard-ette from your imagination. But she’s real, very real.

Set up: the wizard-ette wanted to ask me about Lucene as an enterprise search system. But that was a nerd gambit. The real question was, “Will I be able to graft an add-on that performs semantic processing or text mining on top of Lucene and make the hybrid work?”

The answer is, “Yes but”. Most search and content processing systems are monsters. Some are tame; others are fierce. Only a handful of enterprise search systems have been engineered to be homogeneous.

I knew this wizard-ette wasn’t enthralled with a “yes but”. She wanted a definitive, simple answer. I stumbled and fumbled. Off she drifted. This short essay, then, contains my belated pensée d’escalier.

What Is a Searchenstein?

A searchenstein is a content processing or information access system that contains a great many separate pieces. These different systems, functions, and sub systems are held together with scripts; that is, digital glue or what the code jockeys call middleware. The word middleware sounds more patrician than scripts. (In my experience, a big part of the search and retrieval business reduces to word smithing.)

Searchenstein is a search and content processing system cobbled together from different parts. There are several degrees of searchensteinism. There’s a core system built to a strict engineering plan and then swaddled in bastard code. Instead of working to the original engineering plan, the MBAs running the company take the easier, cheaper, and faster path. Systems from the Big Three of enterprise search are made up of different parts, often from sources that have little knowledge of or interest in the system onto which the extras will be bolted. Other vendors have an engineering plan, and the third-party components are more tastefully integrated. This is the difference between a car customization by a cash-strapped teen and the work of Los Angeles aftermarket specialists who build specialized automobiles for the super rich.

[Illustration: searchenstein]

This illustration shows the body parts of a searchenstein. In this type of system, it’s easy to get lost in the finger pointing when a problem occurs. Not only are the dependencies tough to figure out, it’s almost impossible to get one’s hand on the single throat to choke.

Another variant is to use many different components from the moment the company gets in the search and content processing business. The complexities of the system are carefully hidden, often in a “black box” or beneath a zippy interface. You can’t fiddle with the innards of the “black box.” The reason, according to the vendor, may be to protect intellectual property. Another reason is that the “black box” is easily destabilized by tinkering.

A third variant is to have a core stub that doesn’t work too well. The original search engine is little more than a digital appendix. When a search vendor buys another search company, the acquired company’s logo may be on the system and there may be a vestigial component buried somewhere. But the buyer substitutes her own search system and other bits and pieces. The licensee doesn’t know what’s in the system. (I am itching to name one vendor who not only has orphaned code but uses clunky third-party code and then shifts some other functions to a hosted service.) This approach is an example of what I call a semi-virtual searchenstein system.

I want to make clear that it is indeed possible for a vendor to use one component, original code, and third-party software to build a cohesive, well-designed system. The difference between a best-practice hybrid search system and a searchenstein boils down to these key points:

  1. Latency, stability, and performance are engineering issues that have been considered and addressed. More significantly, the legitimate vendor will explain what’s used in the system and what has been done to make the system suitable for prime time.
  2. The middleware is there but it has been engineered to operate within the constraints of a Web service. The approach may include a hack or two, but the idea is that the engineering approach can be explained. An engineer coming fresh to a problem has a fighting chance of figuring out how to fix or customize a function.
  3. Two or more companies may have been merged. There are numerous examples of search and content processing vendors approaching death, being acquired, and then hooked into other systems. I can name three companies formed in this way. As long as the engineers have taken the time to smooth the edges of the integration, these systems can work in a production environment.

In short, there are white hat searchensteins and black hat searchensteins. And there are quite a few of each type available for you to license today.

You Have ‘Em but Don’t Recognize Them … Yet

How can you figure out if your system is a searchenstein and then see if the code is wearing a white hat or a black hat? Here are several points to investigate. (There’s more detail in my new study Beyond Search which is available from the Gilbane Group.)

  1. Ask the vendor for a list of which vendors’ software is under the hood. Ask if the search system, text processing, or visualization components are available as open source. A yes is not a negative. What’s important is how the vendor responds to your request.
  2. Run stress tests and identify bottlenecks. Probe the vendor on why the bottleneck surfaced. In some cases, a bottleneck can result from a script that hooks one part of a search system to another component. Again, the fact that scripts are used to make a system work is routine. The way the vendor handles your question reveals a great deal.
  3. Obtain a list of dependencies; for example, the Microsoft SharePoint search requires specific Microsoft services and frameworks. The more dependencies, the greater the chance for problems. At tuning or troubleshooting time, complexities equal unknown costs.
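The stress test in point 2 can be roughed out in a few lines. What follows is a minimal sketch in Python, not any vendor's tooling. The stage names (parse_query, glue_script, search_index) and their bodies are hypothetical stand-ins I made up for illustration; swap in calls to your own components. The sketch times each stage of a pipeline over a batch of queries and reports which stage eats the most wall-clock time.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes the previous
# stage's output and returns its own.
Stage = Tuple[str, Callable[[object], object]]


def profile_stages(stages: List[Stage], queries: List[str],
                   repeats: int = 3) -> Dict[str, float]:
    """Return the total seconds spent in each named stage over all runs."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        for query in queries:
            payload: object = query
            for name, func in stages:
                start = time.perf_counter()
                payload = func(payload)
                totals[name] += time.perf_counter() - start
    return totals


def slowest_stage(totals: Dict[str, float]) -> str:
    """Return the name of the stage with the largest accumulated time."""
    return max(totals, key=lambda name: totals[name])


# Hypothetical stand-ins for real components (assumptions, not real APIs).
def parse_query(query):
    return str(query).lower().split()


def glue_script(terms):
    time.sleep(0.005)  # simulates a slow script joining two subsystems
    return terms


def search_index(terms):
    return ["doc-for-" + term for term in terms]


if __name__ == "__main__":
    pipeline = [
        ("parse_query", parse_query),
        ("glue_script", glue_script),
        ("search_index", search_index),
    ]
    totals = profile_stages(pipeline, ["Lucene hybrid", "text mining"])
    for name, seconds in totals.items():
        print(f"{name}: {seconds:.4f}s")
    print("slowest:", slowest_stage(totals))
```

Timing the stages separately is the point. When the slow spot turns out to be the glue rather than the engine, you have a specific question to put to the vendor.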

When the Monster Strikes

Let’s imagine a very rare scenario. Your search system doesn’t work. You know you have a searchenstein. What do you do? Unfortunately, there are only a handful of options to exercise. You reinstall and reindex. You rip and replace. Or you license another system yourself and create your very own searchenstein.

The bulletproof fix is to rely on a vendor who operates a hosted service. I know of two vendors who can deliver top-notch search in a matter of a day, maybe less. I want to ID these folks, but I have to toe the legal line. You can send me email, and if I’m sufficiently alert I can give you a couple of hints about whom to email. (I’m seaky2000 at yahoo dot com. Don’t call me. I don’t even want my mom to call me.)

If you can’t get your searchenstein under control, you are likely to face some interesting challenges going forward in your career at your present place of employment. Searchensteins can be most unpleasant. Vendors won’t tell you this. I just have.

Observations

Search is a tough problem even when the vendor has engineered meticulously and evidences the highest business standards. As search morphs into business intelligence or disappears into a customer support utility, the problems remain. When today’s complex systems are deconstructed, the challenges that bedevil so many organizations are easy to understand. Stay ahead of the curve by considering these actions:

  1. Cut through the marketing baloney and ignore the strong-arm tactics some vendors delight in using. Focus on what you require and determine by tests whether a system can deliver.
  2. Understand that search consists of many components. Searchensteins are not necessarily evil; searchensteins can become problematic when you haven’t managed them. A bad dog is a product of a bad owner. A bad search system is the product of a bad licensee–at least in some instances.
  3. Start small, move in incremental steps, and keep expectations of management and users in check. Years ago I wrote about the “cliff phenomena” in online. People expect so much that most end up on the top of a high hill at the edge of a precipice. When expectations can’t be met, the hapless user crashes to the reality below, leaving the expectations behind.

Maybe the wizard-ette will circle back to the old geezer in Kentucky? I’m a heck of a lot more reliable than some of the searchensteins out in the enterprise wilderness. I’m also harmless. Searchensteins aren’t.

Stephen Arnold, May 2, 2008
