Open Source Search: The Me Too Method Is Thriving
November 5, 2012
In the first three editions of The Enterprise Search Report (2003 to 2007), which my team and I wrote, we made it clear that the commercial enterprise search vendors were essentially a bunch of me-too services.
The diagrams for the various systems were almost indistinguishable. Some vendors used fancy names for their systems and others stuck with the same nomenclature used in the SMART system. I pointed out that every enterprise search system has to perform certain basic functions: Content acquisition, indexing, query processing, and administration. But once those building blocks were in place, most of the two dozen vendors I profiled added wrappers which created a “marketing differentiator.” Examples ranged from Autonomy’s emphasis on the neuro linguistic processing to Endeca’s metadata for facets to Vivisimo’s building a single results list from federated content.
The rota fortunae of the medieval software licensee. A happy quack to http://www.artlex.com/ArtLex/Ch.html
The reality was that it was very difficult for the engineers and marketers of these commercial vendors to clearly differentiate their systems from dozens of look-alikes. With the consolidation of the commercial enterprise search sector in the last 36 months, the proprietary vendors have not changed the plumbing. What is new and interesting is that many of them are now “analytics,” “text mining,” or “business intelligence” vendors.
The High Cost of Re-Engineering
The key to this type of pivot is what I call “wrappers” or “add ins.” The idea is that an enterprise search system is similar to the old Ford and GM assembly lines of the 1970s. The cost for changing those systems was too high. The manufacturers operated them “as is”, hoping that chrome and options would give the automobiles a distinctive quality. Under the paint and slightly modified body panels, the cars were essentially the same old vehicle.
Commercial enterprise search solutions are similar today, and none has been overhauled or re-engineered in a significant way. That is okay. When a company licenses an enterprise search solution from Microsoft or Oracle, the customer is getting the brand and the security which comes from an established enterprise search vendor.
Let’s face it. The RECON or SDC Orbit system is usable without too much hassle by a high school student today. The precision and recall are in the 80 to 85 percent range. The US government has sponsored a text retrieval program for many years. The results of the tests are not widely circulated. However, I have heard that the precision and recall scores mostly stick in the 80 to 85 percent range. Once in a while a system will perform better, but search technology has, in my opinion, hit a glass ceiling. The commercial enterprise search sector is like the airline industry. The old business model is not working. The basic workhorse of the airline industry delivers the same performance as a jet from the 1970s. The big difference is that the costs keep on going up and passenger satisfaction is going down.
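For readers who want the arithmetic behind those two scores, here is a minimal sketch of how precision and recall are commonly computed for a single query. The function name and the document IDs are made up for illustration; this is not taken from any vendor's system or from the government test program mentioned above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the search system returned (hypothetical data)
    relevant:  document IDs a human judged relevant (hypothetical data)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the system actually found

    # Precision: share of returned documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative query: five results returned, five documents judged relevant,
# four documents in common.
retrieved_ids = ["d1", "d2", "d3", "d4", "d5"]
relevant_ids = ["d1", "d2", "d3", "d4", "d9"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

In this made-up case both scores land at 80 percent: one of the five results is junk, and one of the five relevant documents never surfaces.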
Open Source: Moving to Center Stage
But I am not interested in commercial enterprise search systems. The big news is the emergence of open source search options. Until recently, open source search was not mainstream. Today, open source search solutions are mainstream. IBM relies on Lucene/Solr for some of its search functions. IBM also owns Web Fountain, STAIRS, iPhrase, Vivisimo, and the SPSS Clementine technology, among others. IBM is interesting because it has used open source search technology to reduce costs and tap into a source of developer talent. Attivio, a company which just raised $42 million in additional venture funding, relies on open source search. You can bet your bippy that the investors want Attivio to turn a profit. I am not sure the financial types dive into the intricacies of open source search technology. Their focus is on the payoff from the money pumped into Attivio. Many other commercial content processing companies rely on open source search as well.
The interesting development is the emergence of pure play search vendors built entirely on the Lucene/Solr code. Anyone can download this “joined at the hip” software from the Apache Software Foundation. We have completed an analysis of a dozen of the most interesting open source search vendors for a big time consulting firm. What struck the ArnoldIT research team was:
- The open source search vendors are following the same path as the commercial enterprise search vendors. The systems are pretty much indistinguishable.
- The marketing “battle” is being fought over technical nuances which are of great interest to developers and, in my opinion, almost irrelevant to the financial person who has to pay the bills.
- The significant differentiators among the dozen companies we analyzed boil down to the companies’ financial stability, full time staff, value-adding proprietary enhancements, customer support, training, and engineering services.
What this means is that the actual functionality of these open source search systems is similar to the enterprise proprietary solutions. In the open source sector, some vendors specialize by providing search for a Big Data environment or for remediating the poor search system in MySQL and its variants. Other companies sell a platform and leave the Lucene/Solr component as a utility service. Others just take the Lucene/Solr and go forward.
The Business View
In a conversation with Paul Doscher, president of LucidWorks, I learned that his organization is working through the Project Management Committee (PMC) Group of the Lucene/Solr project within the Apache Software Foundation to build the next-generation search technology. The effort is to help transform people’s ability to turn data into decision making information.
This next generation search technology is foundational in developing a big data technology stack to enable enterprises to reap the rewards of the latest wave of innovation.
The key point is that figuring out which open source search system does what is now as confusing and time consuming as figuring out the difference between the proprietary enterprise search systems was 10 years ago.
Will there be a fix for me-too’s in enterprise search? I think that some technology will be similar and probably indistinguishable to non-experts. What is now raising the stakes is that search systems are viewed as utilities. Customers want answers, visualizations, and software which predicts what will happen. In my opinion, this is search with fuzzy dice, 20 inch chrome wheels, and a 200 watt sound system.
The key points of differentiation for me will remain the company’s financial stability, its staff quality, its customer service, its training programs, and its ability to provide engineering services to licensees who require additional services. In short, the differentiators may boil down to making systems pay off for licensees, not marketing assertions.
In the rush to cash in on organizations’ need to cut costs, open source search is now the “new” proprietary search solution. Buyer beware? More than ever. The Wheel of Fortune in search is spinning again. Who will be a winner? Who will be a loser? Place your bets. I am betting on open source search vendors with the service and engineering expertise to deliver.
Stephen E Arnold, November 5, 2012