Recommind: Following the Search Imperative

January 10, 2008

I opened my Yahoo alerts this morning, January 10, 2008, and read:

Recommind Predicts 2008 Enterprise Search and eDiscovery Trends: Search Becomes the Information Foundation of the … — Centre Daily Times Wed, 09 Jan 2008 5:32 AM PST

According to the enterprise search and eDiscovery technology experts at Recommind, 2008 will be the year that enterprise search and eDiscovery converge to become top areas of focus for enterprises worldwide, creating substantial growth and evolution in the management of electronic information.

The phrase “foundation of the electronic enterprise” struck me as meaningful and well-turned. Most search experts know Recommind by name only. I profiled the company in the third edition of The Enterprise Search Report, the last one that I wrote. I support the excellent fourth edition, but I did not do any of the updating for that version of the study. I’m confining my efforts to shorter, more specialized analyses.

The company once focused on the legal market. My take on the company’s technology was that it relied on Bayesian algorithms.

The Recommind product can deliver key word search. The company has a patented algorithm that implements “probabilistic latent semantic analysis.” I will discuss latent semantic indexing in “Beyond Search”. For our purpose, Recommind’s system identifies and analyzes the distribution in a document of concept-related words. The approach uses statistical methods to predict an item’s relevance.
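For readers who want the math, here is the textbook form of probabilistic latent semantic analysis. This is the general model from the published literature, offered as a sketch; Recommind’s patented implementation is not described in this article and may differ. The symbols are my shorthand: d is a document, w is a word, and z is a hidden “concept” that links the two.

```latex
P(d, w) = \sum_{z \in Z} P(z)\, P(d \mid z)\, P(w \mid z)
```

The term P(w | z) is the “distribution of concept-related words” mentioned above: each hidden concept favors some words over others. The parameters are usually estimated from the document collection with an expectation-maximization procedure, and a system can then score an item’s relevance by how strongly it is associated with the concepts behind a query.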

The Recommind implementation of these algorithms differentiates the company’s system from Autonomy’s. Autonomy, as you may know, is the high-profile proponent of “automatic” or “automated” text processing. The idea (and I am probably going to annoy the mathematicians who read this article) is that Bayesian algorithms can operate without human fiddling. The phrase “artificial intelligence” is often applied to a Bayesian system when it feeds information about processed content back into the content processing subsystem. The notion is that Bayesian systems can be implemented to adapt to the content flowing through the system. As the system processes content, the system recognizes new entities, concepts, and classifications. The phrase “set it and forget it” may be used to describe a system similar to Autonomy’s or Recommind’s. Keep in mind that each company will quickly refine my generalization. For my purposes, however, I’m not interested in the technology. I’m interested in the market orientation the news story makes clear.

Recommind is no longer a niche player in content processing. Recommind is pushing into the heartland of IBM, Microsoft, and Oracle: big business, the Fortune 1000, the companies that have money and will spend it on systems that enhance the firm’s revenue or control the firm’s costs. Recommind is an “enterprise content solutions vendor.”

Some History

Lawyers are abstemious, far better at billing their clients than spending on information technology. Recommind offered a reasonably priced solution for what’s now called “eDiscovery.”

eDiscovery means collecting a set of documents, typically those obtained through the legal discovery process, and processing them electronically. The processing part can have a number of steps, ranging from scanning, performing optical character recognition, and generating indexable files to performing relatively simple file transformation tasks. A simple transformation task is to take electronic mail, segment the message and save it, then save any attachment such as a PowerPoint presentation. Once a body of content obtained through the legal discovery process is available, that content is indexed.
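The email transformation just described can be sketched in a few lines of Python using only the standard library. This is an illustration, not how Recommind or any other vendor does it; the function name segment_email and the sample message are my own inventions for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def segment_email(raw_bytes):
    """Split one raw email into its body text and a list of attachments.

    Returns (body_text, attachments), where attachments is a list of
    (filename, payload_bytes) tuples. Hypothetical helper for illustration.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body_text = body_part.get_content() if body_part is not None else ""

    # Each attachment is saved separately so it can be indexed on its own.
    attachments = []
    for part in msg.iter_attachments():
        attachments.append((part.get_filename(), part.get_payload(decode=True)))
    return body_text, attachments


if __name__ == "__main__":
    # Build a made-up message with one fake PowerPoint attachment.
    sample = EmailMessage()
    sample["Subject"] = "Quarterly deck"
    sample.set_content("See the attached deck.")
    sample.add_attachment(
        b"fake slide bytes",
        maintype="application",
        subtype="vnd.ms-powerpoint",
        filename="deck.ppt",
    )
    body, files = segment_email(bytes(sample))
    print(body.strip())
    for name, payload in files:
        print(name, len(payload), "bytes")
```

In a real matter, each piece would also carry the agreed upon identification number and the access restrictions that apply to the discovered material.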

Legal discovery means, and I am simplifying in this explanation, that each side in a legal matter must provide information to the opposing side. In complex matters, more than two law firms and usually more than two attorneys will be working on the matter. In the pre-digital age, discovery involved keeping track of each discovered item manually, affixing an agreed upon identification number on the item, and making photocopies. The photocopies were, and still are in many legal proceedings, punched and placed in binders. The binders, even for relatively modest legal actions, can proliferate like gerbils. In a major legal action, the physical information can run to hundreds of thousands of documents.

eDiscovery, therefore, is the umbrella term for converting the text to electronic form, indexing it, and making that index available for those authorized to find and read those documents.

The key point about discovery is that it is not key word search. Discovery means that the system somehow finds out the important information in a document or collection of documents and makes that finding evident to a user. No key word query is needed. The user can read an email alert, click on a hot link that says, “The important information is here”, or view a visual representation of what’s in a mass of content. Remember: discovery means no key word query, no reading of the document to find out what’s in it. Discovery is the most recent Holy Grail in information retrieval despite its long history in specialized applications like military intelligence.

Recommind found success in the eDiscovery market. The product was reasonably priced, particularly when compared to a brand name, high profile system such as those available from Autonomy, Endeca, Fast Search & Transfer, iPhrase (now a unit of IBM), and Stratify. Instead of six figures, think in terms of $30,000 to $50,000. For certain law firms, spending $50,000 to manipulate discovered materials electronically was preferable to spending $330,000.

The problem with the legal market is that litigation and legal matters come and go. For a vendor of eDiscovery tools, marketing costs chew away at margins. Only a small percentage of law firms maintain a capability to process case-related materials in a single system. The pattern is to gear up for a specific legal matter, process the content, and then turn off the system when the matter closes. Legal content related to a specific case is encumbered by necessary controls about access, handling of the information once the matter is resolved, and specific actions that must be taken with regard to the information obtained in eDiscovery; for example, expert witnesses must return or destroy certain information at the close of a matter.

The point is that eDiscovery systems are designed to make it possible for a law firm to comply with the stipulations placed on information obtained in the discovery process.

Approaches to eDiscovery

Stratify, now a unit of Iron Mountain, is one of the leaders in eDiscovery. Once called Purple Yogi and the darling of the American intelligence community, Stratify has narrowed its marketing to eDiscovery. The Stratify system performs automatic processes along with key word indexing of documents gathered via legal discovery. The system has been tuned for legal applications. Licensees receive knowledge bases with legal terms, a taxonomy, and an editorial interface so the licensing firm can add, delete, or modify the knowledge bases. Stratify is priced in a way that is similar to the approach taken by the Big Three (Autonomy, Endeca, and Fast Search & Transfer) in search; that is, fees in the hundreds of thousands of dollars are more common than $50,000 fees. Larger license fees persist for three reasons. First, the marketing costs are high, and the search vendors have to generate enough revenue to avoid plunging into financial shortfalls. Second, the higher fees make sense to large, cash rich organizations. Many companies want to pay more in order to get better service or the “best available” solution. Third, other factors may be operating such as the advice of a consultant or the recommendation of a law firm also working on the matter.

eDiscovery can also be performed using generalized and often lower-cost products. In the forthcoming “Beyond Search: What to Do When Your Search System Doesn’t Work”, I profile a number of companies offering software systems that can make discovered matter searchable. For most of these firms, the legal market is a sideline. Selling software to law firms requires specialized knowledge of legal proceedings, a sales person familiar with how law firms work, and marketing that reaches attorneys in a way that makes them comfortable. The legal market is a niche, and although anyone can buy the names of lawyers from various sources, lawyers are not an easy market to penetrate.

Recommind, therefore, has shifted its marketing from the legal niche to the broader, more general market for Intranet search or what I call “behind the firewall” search. The term “enterprise search” is devalued, and I want to steer clear of giving you the impression that a single search system can serve the many information access needs of a growing organization. More importantly, there’s a belief that “one size fits all” in search. That is a misconception. The reality is that an organization will have a need for many different types of information access systems. At some point in the future, there may be a single point solution, but for the foreseeable future, organizations will need separate, usually compartmentalized systems to avoid personnel, legal, and intellectual property problems. I will write more about this in “Beyond Search” and in this Web log.

Trajectory of Recommind

Recommind’s market trajectory is important. The company’s shift from a niche to a broader market segment illustrates how content processing companies must adapt to the revenue realities in selling search solutions. Recommind has moved into a market sector where a general purpose solution at a competitive price point should be easier to sell. Instead of the specialized sales person for the niche market, a sales person with more generalized experience can be hired. The number of law firms is limited, and that market has become saturated. The broader enterprise market consists of the Fortune 1000 and upwards of 15 million small- and mid-sized businesses. Most of these need and want a “better” search solution. Recommind’s expansion of its marketing into this broader arena makes sense, and it illustrates what many niche vendors often do to increase their revenues.

Here’s the formula, with a diagram to illustrate this marketing shift:

Increase the number of prospects for a search system by moving to a larger market. Example: from lawyers to general business or intelligence community in Washington, DC to business intelligence in companies; or from pharmaceutical text mining to general business text mining.
Simplify the installation, minimizing the need for specialized knowledge bases, tuning, and time-consuming set up. Example: offer a plug-and-play solution, emphasize speedy deployment, provide a default configuration that delivers advanced features without manual set up and time-consuming “training” of the system.
Maintain a competitive price point because the “vendor will make it up on volume”. With more customers and shorter buying cycles, the vendor will have increased chances to land a large account that generates substantial fees when customization or special functionality is required.
Boost the return on investment for research, development, sales, marketing, and customer support. The business school logic is inescapable to many search vendors. Whether these MBA (master of business administration) assumptions prove false is not my concern at this point. Search vendors can’t make their revenue goals in small niches and remain profitable, grow, and fund R&D. The search vendors have to find a way to grow and expand margins quickly. The broader business market is a solution that most content processing companies implement.

Implications of Market Shifts

Based on my research, several implications of moving upmarket, offering general purpose solutions, and expanding service options receive scant attention in the trade and business press. Keep in mind that my data and experience are unique. Your view may be different, and I welcome your viewpoints. Here is what I have learned:

First, smaller, specialized vendors have to move from a niche to a broader market. Examples include the aforementioned Stratify, which moved from the U.S. intelligence niche to the broader business market, only to narrow its focus within that market to handling special document collections. Iron Mountain saw value in this positioning and acquired Stratify. Vivisimo, which originally offered on-the-fly clustering, has repositioned itself as a vendor of “behind the firewall” search. The company’s core technology remains intact, but the firm has added functionality as it moves from a narrow “utility” vendor to a broader, “behind the firewall” vendor. Exegy, a vendor of special purpose, high-throughput processing technology, has moved from intelligence to financial services. This list can be expanded, but the point is clear. Search vendors have to move into broader markets in order to have a chance at making enough sales to generate the return investors demand. Stated another way, content processing vendors must find a way to expand their customer base or die.

Second, larger vendors — for example, the Autonomys, Endecas, and their ilk — must offer more and more services in an effort to penetrate more segments of the broader search market. Autonomy, in a sense, had to become a platform. Autonomy had to acquire Verity to get more upsell opportunities and more customers quickly. And the company had to diversify from search into other, adjacent information access and management services such as email management with its acquisition of Zantaz. The imperative to move into more markets and grow via acquisition is driving some of the industry consolidation now underway.

Third, established enterprise software vendors must move downmarket. IBM, Microsoft, and Oracle have to offer more information management, access, and processing services. A failure to take this step means that the smaller, more innovative companies moving from niches into broader business markets will challenge these firms’ grip on enterprise customers. Microsoft, therefore, had to counter the direct threat posed by Coveo, Exalead, ISYS, and Mondosoft (now SurfRay), among others.

Fourth, specialized vendors of text mining or business intelligence tools will find themselves subject to some gravitational forces. Inxight, the text analysis spin out of Xerox Palo Alto Research Center, was purchased by Business Objects. Business Objects was then acquired by SAP. After years of inattention, companies as diverse as Siderean Software (a semantic systems vendor with assisted navigation and dashboard functionality) and MarkLogic (an XML-on-steroids and data management vendor) will be sucked into new opportunities. Executives at both firms suggested to me that their products and services were of interest to superplatforms, search system vendors, and Fortune 1000 companies. I expect that both these companies will be themselves discovered as organizations look for “beyond search” solutions that work, mesh with existing systems, and eliminate if not significantly reduce the headaches associated with traditional information retrieval solutions.

I am reluctant to speculate on the competitive shifts that these market tectonics will bring in 2008. I am confident that the market for certain content processing companies is very bright indeed.

Back to Recommind

Recommind, therefore, is a good example of how a niche vendor of eDiscovery solutions can and must move into broader markets. Recommind is important, not because it offers a low-cost implementation of the Bayesian algorithms in the Autonomy system. Recommind warrants observation because it is a useful case study that makes certain search sector market imperatives visible. As the diagram depicts, albeit somewhat awkwardly, each segment of the information retrieval market is in movement. Niche players must move upmarket and outwards. Superplatforms must move downmarket and into niches. Business intelligence system vendors must move into mainstream applications.

Exogenous Forces

The diagram omits two important exogenous forces. I will comment on these in another Web log article. For now, let me identify these two “storm systems” and offer several observations about search and content processing.

The first force is Lucene. This is the open source search solution that is poking its nose under a number of tents. IBM, for example, uses Lucene in some of its search offerings. A start up in Hungary called Tesuji offers Lucene plus engineering support services. Large information companies like Reed Elsevier continue to experiment with Lucene in an effort to shake free of burdensome licensing fees and restrictions imposed by established vendors. Lucene is not likely to go away, and with a total cost of ownership at a baseline of zero in licensing fees, some organizations will find the system warranting further investigation. More importantly, Lucene has been one of the factors turbo charging the “free search software” movement. The only way to counter certain chess moves is a symmetric action. Lucene, not Google or other vendors, is the motive force behind the proliferation of “free” search.

The second force is cloud computing. Google is often identified as the prime mover. It’s not. The notion of hosted search is an environmental factor. Granted, cloud based information retrieval solutions remain off the radar for most information technology professionals. Recall, however, that the core of hosted search is the commercial database industry. LexisNexis, Dialog, and Ebscohost are, in fact, hosted solutions for specialized content. Blossom Software, Exalead, Fast Search & Transfer, and other content processing vendors offer off-premises or hosted solutions. The economics of information retrieval translate to steadily increasing interest in cloud based solutions. And when the time is right, Amazon, Google, Microsoft, and others will be offering hosted content processing solutions. In part it will be a response to what Dave Girouard, a Google executive, calls the “crisis in IT”. In part, it will be a response to economics. Few professionals, and I mean very, very few, understand the total cost of information retrieval. When the “number” becomes known, a market shift from on premises to cloud-based solutions will take place, probably with some velocity.

Wrap Up

Several observations are warranted:

First, Recommind is an interesting company to watch. It is a microcosm of broader industry trends. The company’s management has understood the survival imperative and implemented a solution that becomes obvious in today’s market. Expand or stagnate.

Second, tectonic forces are at work that will reshape the information retrieval, content processing, and search market as it exists today. It’s not just consolidation; search and its cousins will become part of a larger data management fabric.

Third, there’s a great deal of money to be made as these forces grind through the more than 200 companies offering content processing solutions. Innovation, therefore, will continue to bubble up from U.S. research computing programs and outside the U.S. Tesuji in Hungary is just one example of dozens of innovative approaches to content processing.

Fourth, the larger battle is not yet underway. Many analysts see hand to hand combat between Google and Microsoft. I don’t. I think that for the next 18 to 24 months, battles will rage within niches, among established search vendors, and among the established enterprise software vendors. Google is a study in “controlled chaos”. With this approach, Google is not likely to mount any single, direct attack on anything until the “controlled chaos” yields the data Google needs before deciding on a specific course of action.

Search is dead. At least the key word variety. Content processing is alive and well. The future is broader: data management and data spaces. As we rush forward, opportunities abound for licensees, programmers, entrepreneurs, and vendors. We are living in a transition from the Dark Ages of key word search to a more robust, more useful approach.

Stephen E. Arnold
10 January 2008

Written by Stephen E. Arnold · Filed Under Enterprise, Search


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
