Searchenstein: Pensée d’escalier

May 1, 2008

At the Boston Search Engine Meeting, I spoke with a certified search wizard-ette. As you know, my legal eagle discourages me from proper noun extraction in my Web log essays. This means I can’t name the person, nor can I provide you with the name of her employer. You will have to conjure a faceless wizard-ette from your imagination. But she’s real, very real.

Set up: the wizard-ette wanted to ask me about Lucene as an enterprise search system. But that was a nerd gambit. The real question was, “Will I be able to graft an add-on semantic processing or text mining system on top of Lucene and make the hybrid work?”

The answer is, “Yes but”. Most search and content processing systems are monsters. Some are tame; others are fierce. Only a handful of enterprise search systems have been engineered to be homogeneous.

I knew this wizard-ette wasn’t enthralled with a “yes but”. She wanted a definitive, simple answer. I stumbled and fumbled. Off she drifted. This short essay, then, contains my belated pensée d’escalier.

What Is a Searchenstein?

A searchenstein is a content processing or information access system that contains a great many separate pieces. These different systems, functions, and subsystems are held together with scripts; that is, digital glue or what the code jockeys call middleware. The word middleware sounds more patrician than scripts. (In my experience, a big part of the search and retrieval business reduces to wordsmithing.)

A searchenstein is a search and content processing system cobbled together from different parts. There are several degrees of searchensteinism. In one, there’s a core system built to a strict engineering plan and then swaddled in bastard code. Instead of working to the original engineering plan, the MBAs running the company take the easier, cheaper, and faster path. Systems from the Big Three of enterprise search are made up of different parts, often from sources that have little knowledge of or interest in the system onto which the extras will be bolted. Other vendors have an engineering plan, and the third-party components are more tastefully integrated. This is the difference between a car customization by a cash-strapped teen and the work of Los Angeles aftermarket specialists who build specialized automobiles for the super rich.

searchenstein

This illustration shows the body parts of a searchenstein. In this type of system, it’s easy to get lost in the finger pointing when a problem occurs. Not only are the dependencies tough to figure out, but it’s also almost impossible to get one’s hands on the single throat to choke.

Another variant is to use many different components from the moment the company gets into the search and content processing business. The complexities of the system are carefully hidden, often in a “black box” or beneath a zippy interface. You can’t fiddle with the innards of the “black box.” The reason, according to the vendor, may be to protect intellectual property. Another reason is that the “black box” is easily destabilized by tinkering.


Traditional Publishers: Patricians under Siege

April 19, 2008

This is an abbreviated version of Stephen Arnold’s keynote at the Buying and Selling eContent Conference on April 15, 2008. The full text of the remarks is here.

Roman generals like Caesar relied on towers spaced about 3000 feet apart. Torch signals allowed messages to be passed. Routine communications used a Roman version of the “pony express”, based on innovations in Persia centuries before Rome took to the battlefield.

Today, you rely on email and your mobile phones. Teens and tweens use Twitter and “instant” social messaging systems like those in Facebook and Google Mail. Try to imagine how difficult it would be for Caesar to understand the technology behind Twitter. But how many of you think Caesar would have hit upon a tactical use of this “faster than flares” technology?

