Search Vendors Working the Content Food Chain
January 13, 2010
In the last six months, I have noticed that three companies are making an effort to respond to ZyLAB’s success in the end-to-end content processing sector. There has been some uninformed and misleading discussion of search and content processing companies’ shift to vertical market solutions. I think this view distorts what some vendors are doing; namely, when one company finds a way to make sales, the other vendors pile into the Volkswagen. This is not so much “imitation as flattery”. What is happening is that sales are tough to make. When a company finds an angle, the stampede is on. In a short period of time, an underserved sector in search and content processing has more people stomping around than Lady Gaga.
Let’s go back in history, a subject that most of the poobahs, azure chip consultants, and self-appointed experts avoid. The idea that certain actions have surfaced before is no fun. Identifying a “new” trend is easier, particularly when the trend spotter’s “history” extends to his / her last Google query.
The Mobius strip is non-orientable, just like search systems that promise end-to-end solutions. A path on a Mobius strip can be twice as long as the original strip of paper. That’s a good way for me to think about end-to-end search and content processing systems. Costs follow a similar trajectory as well.
In the dim mists of time, one of the first outfits to offer an end-to-end solution for content acquisition, indexing, and search was, believe it or not, Excalibur. The first demonstration I received of the Excalibur RetrievalWare technology included scanning, conversion of the scanned image’s text to ASCII, indexing of the ASCII for an image, and search. The information processed in that demonstration was a competitor’s marketing collateral. There were online search systems, but these were mostly small scale systems due to the brutal costs of indexing large domains of HTML. A number of companies were pushing forward with the idea of integrated scanning systems. Sure, in the 1990s you could buy a high end scanner and software. But in order to build a system that minimized the fiddly human touch, you had to build the missing components yourself. Excalibur hooked up with resellers of high end scanners from companies like Bell+Howell, Fujitsu, and others. The notion of taking a scanned image, performing optical character recognition on the page image via in-memory processing, and then indexing the resulting ASCII was a relatively new method. UMI (a unit of Bell+Howell) had a sophisticated production process to do this work. Big outfits like Thomson were interested in this type of process because lots of information in the early 1990s was still in hard copy form. To make a long story short, the Excalibur engineers were among the first to create a commercial product that mostly worked, well, sort of. The indexing was an issue. Excalibur embarked on a journey that required enhancing the RetrievalWare product and generating ready-to-use controlled vocabularies for specific business sectors like defense and banking. As you may know, Excalibur’s original vision did not work, so the company morphed into a search and content processing company with a focus on business intelligence. The firm renamed itself Convera.
The origins of the company were mostly ignored as the Convera package of services chased government work and commercial accounts like Intel and the National Basketball Association (data center SaaS functions for the former and video searching for the hoopsters). When those changes did not work out too well, Convera refocused to become a for-fee version of the free Google custom search engine. That did not work out too well either, and the company has been semi-dissolved.
Why’s this important?
First, the history shows that end-to-end processing is not new. As with many of the hot search innovations, I find the discoveries of the azure chip crowd a “been there, done that” experience. Processing paper and making it searchable is a basic way to approach certain persistent problems.
Second, the synopsis of the Excalibur trajectory makes clear that senior managers of search and content processing companies scramble, following well worn paths. The constant repositioning and restating of what a technology allegedly does is a characteristic of search and content processing.
Third, the shifts and jolts in the path of the Excalibur / Convera entity are predictable. The template is:
- Start with a problem
- Integrate
- Sell
- Engineer fixes on the fly
- Fail
- Identify a new problem
- Rinse, repeat.
What has popped out of my Overflight intel system is that law firms are now looking for a solution to a persistent information problem; that is, when a legal matter fires up, most search systems work just fine with content in electronic form. The hitch is that a great deal of paper is produced. If something exists in digital form and one law firm must provide that information to another law firm, some law firms convert the digital information to paper, slap on a code, and have FedEx deliver boxes of paper. The law firm receiving this paper no longer has the luxury of paying minions to grind through the paper. The new spin on the problem is that the law firm’s information technology people want to buy a hardware-software combination that allows a box of paper to be put in one end, with the steps between the hard copy and the searchable, electronic instance of the documents magically completed.
Well, that’s the idea. Some of the arabesques that vendors slap on this quite difficult problem include:
- Audit records so a law firm knows who looked at what when and for how long
- A billing method. Law firms want to do invoices, of course
- A single point solution so there is “one throat to choke”.
What the companies want is what Excalibur asserted it had almost 20 years ago.
ZyLAB, under the firm hand of Johann Scholtes (a former Dutch naval officer), has made inroads in this market sector. You can read an interview with him in the Search Wizards Speak series, so I won’t recycle that information in this write up.
Autonomy was quick to move to build out its end-to-end solutions for law firms and other clients with a paper and digital content problem. In fact, Autonomy just received an award for its end-to-end eDiscovery platform.
Brainware offers a similar system. That company, a couple of years ago, told me that it had to add staff to handle the demand for its scanning and search solution. Among the firm’s largest customers were law firms and, not surprisingly, the Federal government. You can read an interview with a Brainware executive (who is an attorney) in the Search Wizards Speak series.
I learned that Recommind has inked a deal with Daeja Image Systems for its various document processing software components. The idea is to be able to provide an end-to-end solution to law firms, government agencies, and other outfits that need a system that provides access to paper based content and digital content.
Let’s step back.
What this addled goose sees in these recent announcements is that the “new” is little more than a rediscovery that law firms have not yet cracked the back of the paper-to-digital job or been able to get a search system that provides access to the source material. Sure, there were solutions 20 years ago, but those solutions don’t meet a continuing need. Notice that this problem has been around for a long time, and I don’t think the present crop of solutions will solve the problem fully.
Here’s why:
- Law firms have to deal with a broader range of content than simple paper, Adobe PDF files, and Word files. The email, the attachments, and the sheer volume of stuff put the law firm on the wrong end of a very big time and cost problem. In short, the volume of data to be processed and searched expands to fill the available capacity. Once that capacity is reached, the cost of doing more becomes an issue. One reaction is to live with the problems of an existing system. Some firms do a rip and replace. Other firms turn to an outsourcing solution. And other law firms buy another system. In effect, they leave a legacy system in place and just go get a “new” solution. This increases the costs to the law firm, which is a problem as well.
- The availability of content from grassroots or social systems is an issue. Tweets, Facebook pages, and other effluvia of online communications cannot be ignored. Searching for key words in a 140-character tweet is often an exercise in frustration, so the indexing method has to do some serious contextualization. Lawyers don’t like surprises, so the pressure is on to handle social content.
- Lawyers are not too good at technology. If a technical decision goes wrong, the costs to the law firm can be significant. Hence, if the lawyers from the flashiest firm in Chicago were teleported to Dickens’ England, most of them could go to work without missing a beat. Change, therefore, is sometimes glacial.
In my view, each of the vendors I have identified can improve information processing and access at any law firm or government agency. But the volume of content and what attorneys want to do with that content keeps changing. At some point, the Excalibur effect kicks in. The cycle begins again.
Is the legal market a new niche? No. Are the vendors delivering new solutions? Maybe. Will these solutions allow the law firm to deal with tomorrow’s content, discovery, and filing challenges? Possibly. What’s the outlook? I think an azure chip consulting firm will write a report profiling the end-to-end content processing vendors, pronounce the best-of-breed vertical search solution, and the world will chug along without much significant change.
Is a vertical solution new in search? Nope. Never will be.
Stephen E Arnold, January 13, 2010
Oyez, oyez, I was not paid to write this essay. I herewith report this unpleasant fact to the Department of Justice, an outfit with knowledge of how law, law firms, and eDiscovery vendors innovate.