Coveo: Pushing Beyond Search

April 26, 2008

I’ve been briefed on the Coveo technology. I also labor for cash for the owners of CRM Magazine. Nevertheless, I want to point out that the “Rising Star” Award underscores an interesting shift in search and retrieval. You can read about the award here.

Coveo, one of the companies known for its “snap-in” solution to the woes of Microsoft SharePoint’s built-in search system, has been recognized for its customer relationship management services. CRM is the godfather of the self-service customer support movement. The idea is that customers can help themselves solve problems if they can find the information. Coveo’s system does that well. On the flip side, the people manning the customer support toll-free lines and digging through the email need technology to find answers as well. CRM Magazine’s award underscores Coveo’s ability to deliver on that front, too.

Coveo has been successful in moving “beyond search” with its assisted-search interface. But the company has also won key accounts where vendors such as RightNow, Oracle, and others have long held sway. Coveo, based in frosty Québec City, Québec, continues to innovate despite the long winters and endless hockey season.

Stephen Arnold, April 25, 2008

Microsoft Chomps and Swallows Fast

April 26, 2008

It’s official. On April 24, 2008, Fast Search & Transfer became part of the Microsoft operation. You can read the details at Digital Trends here, the InfoWorld version here, or Examiner.com’s take here.

John Lervik, the Fast Search CEO, will become a corporate vice president at Microsoft. He will report to Jeff Teper, the corporate vice president for the Office Business Platform at Microsoft. The idea–based on my understanding of the set up–is that Dr. Lervik will develop a comprehensive group of search products and services. The offerings will involve Microsoft Search Server 2008 Express, search for the Microsoft Office SharePoint Server 2007, and the Fast Enterprise Search Platform. Despite my age, I think the idea is to create a single enterprise search platform. Lucky licensees of Fast Search’s technology prior to the buyout will not be orphaned. Good news indeed, assuming the transition verbiage sets like hydrated lime, pozzolana, and aggregate. Some Roman concrete has been solid for two thousand years.


This is an example of Roman concrete. The idea of “set in stone” means that change is difficult. Microsoft has some management procedures that resist change.

A Big Job

The job is going to be a complicated one for Microsoft’s and Fast Search’s wizards.

First, Microsoft has encouraged partners to develop search solutions for its operating system, servers, and applications. The effort has been wildly successful. For example, if you are one of the more than 80 million SharePoint users, you can use search solutions from specialists like Interse in Denmark to add zip to the metadata functions of SharePoint, dtSearch to deliver lightning-fast performance with a natural language processing option, or Coveo for clustering and seamless integration. You can dial into SurfRay’s snap-in replacement for the native SharePoint search. You can turn to the ISYS Search System, which delivers fast performance, entity extraction, and other “beyond search” features. In short, there are dozens of companies that have developed solutions to address some of the native search weaknesses in SharePoint. So, one job will be handling the increased competition as the Fast Search team digs in while keeping “certified gold partners” reasonably happy.


This is a ceramic rendering of two of the “10,000 Immortals”. The idea is that when one Immortal is killed, another one takes his place. Microsoft’s certified gold partners–if shut out of the lucrative SharePoint aftermarket for search–may fight to keep their customers like the “10,000 Immortals”. The competitors will just keep coming until Microsoft emerges victorious.


Search Turbocharging: A Boost for Search Company Valuations?

January 13, 2008

PCWorld’s January 12, 2008, story “Microsoft’s FAST Bid Signals a Shift in Search” is important because it puts “behind the firewall” search in the high beams.

A Bit of History

Fast Search & Transfer got out of the online Web search and advertising business in early 2003. CNet covered the story thoroughly. Shortly after the deal, one of Fast Search’s senior executives, either John Lervik or Bjorn Laukli, told me, “Fast Search will become the principal provider of enterprise search.” In 2003, there was little reason to doubt this assertion. Fast Search was making progress with lucrative U.S. government contracts via its partner AT&T. Google’s behind-the-firewall search efforts were modest. Autonomy and Endeca each had specific functionality that generally allowed the companies to compete in a gentlemanly way, often selling the same Fortune 1000 company their search systems. Autonomy was automatic and able to process large volumes of unstructured content; Endeca at that time was more adept at handling structured information and work flow applications. Fast Search was betting that it could attack the enterprise market and win big.

Now, slightly more than four years later, the bold bet on the enterprise market has created an interesting story. The decision to get out of Web search and advertising may prove to be one of the most interesting decisions in search and retrieval. Most of the coverage of the Microsoft offer to buy Fast Search focuses on the here and now, not the history. Fast Search suffered some financial setbacks in 2006 and 2007, but the real setback, from my point of view, is in the broader enterprise market.

Some Rough Numbers for Context

Specifically, after four years of playing out its enterprise strategy, Fast Search has fallen behind Autonomy. That company’s revenues are likely to be about 50 percent higher than Fast Search’s on an annualized basis, roughly $300 million to $200 million over the last 12 months. (I’m rounding gross revenues for illustrative purposes.) Endeca is likely to hit the $90 to $100 million target in 2008, so these three companies collectively generate gross revenues of about $600 million. Now here’s the kicker. Google’s often maligned Google Search Appliance has more than 8,000 licensees. I estimate that the gross revenue from the GSA is about $350 million per year. Even if I am off in my estimates (Google refuses to answer my questions or acknowledge my existence), my research suggests that as of December 31, 2007, Google was the largest vendor of “behind the firewall” search. This estimate excludes the bundled search in the 65 million SharePoint installations and the inclusion of search in other enterprise applications.

One more data point, and again I am going to round off the numbers to make a larger point. Google’s GSA revenue is a fraction of Google’s projected $14 billion gross revenue in calendar 2007. Recall that at the time Fast Search got out of Web search and advertising, Google was generating somewhere in the range of $50 to $100 million per year and Fast Search was reporting revenue of about $40 million. Since 2003, Google has caught up with Fast Search and bypassed it in revenue generated from the enterprise search market sector.
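
For readers who want to check the back-of-the-envelope arithmetic above, here is a minimal sketch in Python that simply restates my rounded estimates; the figures are my guesses for illustration, not audited vendor numbers.

    # Rounded revenue estimates cited above (my guesses, for illustration only).
    autonomy = 300_000_000        # approx. annualized revenue
    fast_search = 200_000_000
    endeca = 100_000_000          # top of the $90 to $100 million range
    gsa = 350_000_000             # my estimate for Google Search Appliance revenue

    three_vendor_total = autonomy + fast_search + endeca
    print(f"Autonomy + Fast Search + Endeca: about ${three_vendor_total / 1e6:,.0f} million")

    google_2007 = 14_000_000_000  # projected calendar 2007 gross revenue
    print(f"GSA as a share of Google's total: about {gsa / google_2007:.1%}")

Run as written, the sketch prints a three-vendor total of roughly $600 million and shows the GSA at about 2.5 percent of Google’s projected 2007 revenue, which is why I call GSA revenue a “fraction” of the total.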

The Fast Search bet did bring the high-octane Microsoft bid. However, revenue issues, employee rationalization, and deteriorating days sales outstanding figures suggest that the Fast Search vehicle has some mechanical problems. Perhaps technology is the issue? Maybe management lacked the MBA skills to keep the pit crew working at its peak? Could the market itself have changed in a fundamental way, looking for something that was simpler and required less tinkering? I certainly don’t know.

What’s Important in a Search Acquisition?

Now back to the PCWorld story by IDG’s Chris Kanaracus. We learn that Microsoft got a deal at $1.2 billion and solid technology. Furthermore, various pundits and industry executives focus on the “importance” of search. One type of “importance” is financial because $1.2 billion for a company with $200 million in revenue translates to six times annual revenue. Another type of importance is environmental because the underperforming “behind the firewall” search sector got some much-needed publicity.

What we learn from this article is that “behind the firewall” search is still a highly uncertain business. There’s nothing in the Microsoft offer that clarifies the specifics of Microsoft’s use of the Fast Search technology. The larger market remains equally murky. Search is not one thing. Search is key word indexing, text mining, classifying, and metatagging. Each of these components is complicated and tricky to set up and maintain. Furthermore, the vendors in the “behind the firewall” space can change their positioning as easily as an F1 team switches the decals on its race car.

Another factor is that no one outside of Google knows what Google, arguably the largest vendor of “behind the firewall” search, will or will not do. Ignoring Google in the enterprise market is easy and convenient. A large number of “behind the firewall” search vendors skirt Google or dismiss the company’s technology by commenting about it in an unflattering manner.

I think that’s a mistake. Before the pundits and the vendors start calculating their big paydays from Microsoft’s interest in Fast Search & Transfer, they should remember that Google cannot be ignored; otherwise, the dip in Microsoft shares cited in the PCWorld article might look like a flashing engine warning light. Shifting into high gear is useless if the engine blows up.
Stephen E. Arnold
January 14, 2008

Computerworld’s Take on Enterprise Search

January 12, 2008

Several years ago I received a call. I’m not at liberty to reveal the names of the two callers, but I can say that both callers were employed by the owner of Computerworld, a highly regarded trade publication. Unlike its weaker sister, InfoWorld, Computerworld remains both a print and online publication. The subject of the call was “enterprise search” or what I now prefer to label “behind-the-firewall search.”

The callers wanted my opinion about a particular vendor of search systems. I provided a few observations and said, “This particular company’s system may not be the optimal choice for your organization.” I was told, “Thanks. Goodbye.” IDG promptly licensed the system against which I cautioned. In December 2007 at the International Online meeting in London, England, an acquaintance of mine who works at another IDG company complained about the IDG “enterprise search” system. When I found myself this morning (January 12, 2008) mentioned in an article authored by a professional working at an IDG unit, I invested a few moments with the article, an “FAQ” organized as questions and answers.

In general, the FAQ snugly fitted what I believe are Computerworld’s criteria for excellence. But a few of the comments in the FAQ nibbled at me. I had to work on my new study Beyond Search: What to Do When Your Search System Doesn’t Work, and I had this FAQ chewing at my attention. A Web log can be a useful way to test certain ideas before “official” publication. Even more interesting is that I know that IDG’s incumbent search system, ah, disappoints some users. Now, before the playoff games begin, I have an IDG professional cutting to the heart of search and content processing. The article “FAQ: Why Is Enterprise Search Harder Than Google Web Search?” references me. The author appears to be Eric Lai, and I don’t know him, nor do I have any interaction with Computerworld or its immediate parent, IDC, or the International Data Group, the conglomerate assembled by Patrick McGovern (blue suit, red tie, all the time, anywhere, regardless of the occasion).

On the article’s three Web pages (pages I want to add that are chock full of sidebars, advertisements, and complex choices such as Recommendations and White Papers) Mr. Lai’s Socratic dialog unfurls. The subtitle is good too: “Where Format Complications Meet Inflated User Expectations”. I cannot do justice to the writing of a trained, IDC-vetted journalist backed by the crack IDG editorial resources, of course. I’m a lousy writer, backed by my boxer dog Tyson and a moonshine-swilling neighbor next hollow down in Harrods Creek, Kentucky.

Let me hit the key points of the FAQ’s Socratic approach to the thorny issues of “enterprise search”, which is, remember, “behind-the-firewall search” or Intranet search. After thumbnailing each of Mr. Lai’s points, I will offer comments. I invite feedback from IDC, IDG, or anyone who has blundered into my Beyond Search Web log.

Point 1: Function of Enterprise Search

Mr. Lai’s view is that enterprise search makes information “stored in their [users’] corporate network” available. Structured and unstructured data must be manipulated, and Mr. Lai, on the authority of Dr. Yves Schabes, Harvard professor and Teragram founder, reports that a dedicated search system executes queries more rapidly, “though it can’t manipulate or numerically analyze the data.”

Beyond Search wants to add that Teragram is an interesting content processing system. In Mr. Lai’s discussion of this first FAQ point, he has created a fruit salad mixed in with his ones and zeros. The phrase “enterprise search” is used as a shorthand way to refer to the information on an organization’s computers. Although a minor point, there is no “enterprise” in “enterprise search” because indexing behind-the-firewall information means deciding what not to index, or at least what content is available to whom under what circumstances. One of the gotchas in behind-the-firewall search, therefore, is making sure that the system doesn’t find and make available personal information, health and salary information, certain sensitive information such as what division is up for sale, and the like. A second comment I want to make is that Teragram is what I classify as a “content processing system provider”. Teragram’s technology, which has been used at the New York Times and America Online, can be an enhancement to other vendors’ technology. Finally, the “war of words” that rages between various vendors about the performance of database systems is quite interesting. My view is that behind-the-firewall search and the new systems on offer from Teragram and others in the content processing sector are responding to a larger data management problem. Content processing is a first step toward breaking free of the limitations of the Codd database. We’re at an inflection point, and the swizzling of technologies presages a far larger change coming. Think dataspaces, not databases, for example. I discuss dataspaces in my new study out in April 2008, and I hope my discussion will put the mélange of ideas in Mr. Lai’s first Socratic question in a different context. The change from databases to dataspaces is more than a matter of two consonants.

Point 2: Google as the Model for Learning Search

Mr. Lai’s view is that a user of Google won’t necessarily be able to “easily learn” [sic] an “enterprise search” system.

I generally agree with the sentiment of the statement. In Beyond Search I take this idea and expand it to about 250 pages of information, including profiles of 24 companies offering a spectrum of systems, interfaces, and approaches to information access. Most of the vendors’ systems that I profile offer interfaces that allow the user to point-and-click their way to needed information. Some of the systems absolve the user of having to search for anything because work flow tools and stored queries operate in the background. Just-in-time information delivery makes the modern systems easier to use because the hapless employee doesn’t have to play the “search box guessing game.” Mr. Lai, I believe, does not find query formulation daunting. My research reveals the opposite. Formulating a query is difficult for many users of enterprise information access systems. When a deadline looms, employees are uncomfortable trying to guess the key word combination that unlocks the secret to the needed information.
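
To make the “stored queries operating in the background” idea concrete, here is a minimal sketch of a saved-search agent; the run_query interface, the notify callback, and the one-hour interval are hypothetical stand-ins I made up for illustration, not the API of any vendor profiled in Beyond Search.

    import time

    def run_query(index, query, since):
        """Hypothetical stand-in for a search system's query call: return
        documents matching `query` that were added after the `since` timestamp."""
        return [doc for doc in index if query in doc["text"] and doc["added"] > since]

    def stored_query_agent(index, query, notify, interval_seconds=3600):
        """Re-run a saved query on a schedule and push only the new hits, so the
        employee never has to guess key words at deadline time. Runs until stopped."""
        last_checked = 0.0
        while True:
            hits = run_query(index, query, since=last_checked)
            last_checked = time.time()
            if hits:
                notify(hits)          # e.g. an email, a portal widget, an RSS item
            time.sleep(interval_seconds)

In practice the notify step is where the “just-in-time” part happens: results arrive without the user formulating a query at all.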

Point 3: Hard Information Types

I think Mr. Lai reveals more about his understanding of search in this FAQ segment. Citing our intrepid Luxembourgian, Dr. Schabes, we learn about eDiscovery, rich media, and the challenge of duplicate documents routinely spat out by content management systems.

The problem is the large amount of unstructured data in an organization. Let’s rein in this line of argument. There are multiple challenges in behind-the-firewall search. What makes information “hard” (I interpret the word “hard” as meaning “complex”) involves several little-understood factors colliding in interesting ways:

  • [a] In an organization there may be many versions of documents, many copies of various versions, and different forms of those documents; for example, a sales person may have the Word version of a contract on his departmental server, but there may be an Adobe Portable Document Format version attached to the email telling the client to sign it and fax the PDF back. You may have had to sift through these variants in your own work.
  • [b] There are file types that are in wide use. Many of these may be renegades; that is, the organization’s over-worked technical staff may be able to deal with only some of them. Other file types such as iPod files, digital videos of a sales pitch captured on a PR person’s digital video recorder, or someone’s version of a document exported using Word 2007’s XML format are troublesome. Systems that process content for search and retrieval have filters to handle most common file types. The odd ducks require some special care and feeding. Translation: coding filters, manual work, and figuring out what to do with the file types for easy access. (A sketch of this kind of triage appears after this list.)
  • [c] Results in the form of a laundry list are useful for some types of queries but not for others. The more types of content processed by the system, the less likely a laundry list will be useful. Not surprisingly, advanced content processing systems produce reports, graphic displays, suggestions, and interactive maps. When videos and audio programs are added to the mix, the system must be able to render that information. Most organizations’ networks are not set up to shove 200 megabyte video files to and fro with abandon or alacrity.

You can imagine the research, planning, and thought that must go into figuring out what to do with these types of digital content. None is “hard”. What’s difficult is the problem solving needed to make these data and information useful to an employee so work gets done quickly and in an informed manner. Not surprisingly, Mr. Lai’s Socratic approach leaves a few nuances in the tiny spaces of his recitation of what he thinks he heard Mr. Schabes suggest. Note that I know Mr. Schabes, and he’s an expert on rule-based content processing and Teragram’s original rule nesting technique, a professor at Harvard, and a respected computer scientist. So “hard” may not be Teragram’s preferred word. It’s not mine.
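
Here is a minimal sketch, assuming a plain directory crawl, of the kind of triage mentioned in item [b]: collapse exact duplicate copies with a content hash and separate the file types a standard filter set can handle from the odd ducks. The extension list and the helper names are my own illustrative choices, not any vendor’s filter inventory, and hashing only catches byte-for-byte copies, not edited versions.

    import hashlib
    from pathlib import Path

    # Extensions a typical filter set handles out of the box versus "renegade"
    # types that need special care and feeding. Purely illustrative lists.
    COMMON_TYPES = {".doc", ".docx", ".pdf", ".ppt", ".xls", ".html", ".txt"}

    def fingerprint(path: Path) -> str:
        """Hash file contents so exact duplicate copies collapse to one entry."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def triage(root: Path):
        seen, to_index, needs_attention = set(), [], []
        for path in root.rglob("*"):
            if not path.is_file():
                continue
            digest = fingerprint(path)
            if digest in seen:                # another byte-for-byte copy; skip it
                continue
            seen.add(digest)
            if path.suffix.lower() in COMMON_TYPES:
                to_index.append(path)         # standard filters can process these
            else:
                needs_attention.append(path)  # audio, video, odd XML exports, etc.
        return to_index, needs_attention

Near-duplicate versions of the same contract defeat a simple hash, of course; that is exactly the manual work and filter coding referred to above.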

Point 4: Enterprise Search Is No More Difficult than Web Search

Mr. Lai’s question burrows to the root of much consternation in search and retrieval. “Enterprise search” is difficult.

My view is that any type of search ranks as one of the hardest problems in computer science. There are different types of problems with each variety of search–Web, behind-the-firewall, video, question answering, discovery, etc. The reason is that information itself is a very, very complicated aspect of human behavior.

Dissatisfaction with “behind-the-firewall” search is due to many factors. Some are technical. In my work, when I see yellow sticky notes on monitors or observe piles of paper next to a desk, I know there’s an information access problem. These signs signal the system doesn’t “work”. For some employees, the system is too slow. For others, the system is too complex. A new hire may not know how to finagle the system to output what’s needed. Another employee may be too frazzled to remember what to do due to a larger problem which needs immediate attention.

Web content is no walk in the park either. But free Web indexing systems have a quick fix for problem content. Google, Microsoft, and Yahoo can ignore the problem content. With billions of pages in the index, missing a couple hundred million with each indexing pass is irrelevant. In an organization, nothing angers a system user more quickly than knowing a document has been processed or should have been processed by the search system. When the document cannot be located, the employee either performs a manual search (expensive, slow, and stress inducing) or goes ballistic (cheap, fast, and stress releasing). In either scenario, or one in the middle, resentment builds toward the information access system, the IT department, the hapless colleague at the next desk, or maybe the person’s dog at home.

To reiterate an earlier point: search, regardless of type, is extremely challenging. Within each type of search, specific combinations of complexities exist. A different mix of complexities becomes evident within each search implementation. Few have internalized these fundamental truths about finding information via software. Humans often prefer to ask another human for information. I know I do. I have more information access tools than a nerd should possess. Each has its benefits. Each has its limitations. The trick is knowing what tool is needed for a specific information job. Once that is accomplished, one must know how to deal with the security, format, freshness, and other complications of information.

Point 5: Classification and Social Functions

Mr. Lai, like most search users and observers, has a nose that twitches when a “new” solution appears. Automatic classification of documents and support of social content are two of the zippiest content trends today.

Software that can suck in a Word file and automatically determine that the content is “about” the Smith contract, belongs to someone in accounting, and uses the correct flavor of warranty terminology is useful. It’s also like watching Star Trek and hoping your BlackBerry Pearl works like Captain Kirk’s communicator. Today’s systems, including Teragram’s, can index at 75 to 85 percent accuracy in most cases. This percentage can be improved with tuning. When properly set up, modern content processing systems can hit 90 percent. Human indexers, if they are really good, hit in the 85 to 95 percent range. Keep in mind that humans sometimes learn intuitively how to take short cuts. Software learns via fancy algorithms and doesn’t take short cuts. Both humans and machine processing, therefore, have their particular strengths and weaknesses. The best performing systems with which I am familiar rely on humans at certain points in system set up, configuration, and maintenance. Without the proper use of expensive and scarce human wizards, modern systems can veer into the ditch. The phrase “a manager will look at things differently than a salesperson” is spot on. The trick is to recognize this perceptual variance and accommodate it insofar as possible. A failure to deal with the intensely personal nature of some types of search issues is apparent when you visit a company where there are multiple search systems or a company where there’s one system–such as the one in use at IDC–and discover that it does not work too well. (I am tempted to name the vendor, but my desire to avoid a phone call from hostile 20-year-olds is very intense today. I want to watch some of the playoff games on my couch potato television.)
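
For readers who wonder what the accuracy percentages above actually measure, here is a minimal sketch of the usual yardstick: compare the system’s category assignments against a human-indexed sample and report the fraction that match. The toy labels are hypothetical, and real evaluations are more elaborate (precision, recall, per-category breakdowns).

    def indexing_accuracy(machine_labels, human_labels):
        """Fraction of documents where the system's category matches the human
        indexer's category; the simple 'percent correct' behind figures such as
        '75 to 85 percent accuracy'."""
        assert len(machine_labels) == len(human_labels)
        matches = sum(m == h for m, h in zip(machine_labels, human_labels))
        return matches / len(human_labels)

    # Hypothetical spot check against a small human-indexed sample.
    machine = ["contracts", "accounting", "contracts", "hr", "contracts"]
    human   = ["contracts", "accounting", "warranty",  "hr", "contracts"]
    print(f"{indexing_accuracy(machine, human):.0%}")   # prints 80% on this toy sample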

Point 6: Fast’s Search Better than Google’s Search

Mr. Lai raises a question that plays to America’s fascination with identifying the winner in any situation.

We’re back to a life-or-death, winner-take-all knife fight between Google and Microsoft. No search technology is necessarily better or worse than another. There are very few approaches that are radically different under the hood. Even the highly innovative approaches of companies such as Brainware and its “associative memory” approach or Exegy with its juiced up hardware and terabytes of on board RAM appliance share some fundamentals with other vendors’ systems. If you slogged through my jejune and hopelessly inadequate monographs, The Google Legacy (Infonortics, 2005) and Google Version 2.0 (Infonortics, 2007), and the three editions I wrote of The Enterprise Search Report (CMSWatch.com, 2004, 2005, 2006), you will know that subtle technical distinctions have major search system implications. Search is one of those areas where a minor tweak can yield two quite distinct systems even though both share similar algorithms. A good example is the difference between Autonomy and Recommind. Both use Bayesian mathematics, but the differences are significant. Which is better? The answer is, “It depends.” For some situations, Autonomy is very solid. For others, Recommind is the system of choice. The same may be said of Coveo, Exalead, ISYS Search Software, Siderean, or Vivisimo, among others.

Microsoft will have some work to do to understand what it has purchased. Once that learning is completed, Microsoft will have to make some decisions about how to implement those features into its various products. Google, on the other hand, has a track record of making the behind-the-firewall search in its Google Search Appliance better with each point upgrade. The company has also rolled out the useful OneBox API to make integration and function tweaking easier.

The problem with trying to get Google and Microsoft to square off is that each company is playing its own game. Socratic Computerworld professionals want both companies to play one game, on a fight-to-the-death basis, now. My reading of the data I have is that a Thermopylae-style clash is not in the interests of either Google or Microsoft, now or in the near future. The companies have different agendas, different business models, and different top-of-mind problems to resolve. The future of search is that it will be invisible when it works. I don’t think that technology is available from either Google or Microsoft at this time.
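
To give a flavor of what “Bayesian mathematics” means in the Autonomy and Recommind comment above, here is a minimal textbook naive Bayes sketch that scores a snippet of text against categories learned from a few hypothetical training examples. It illustrates the general probabilistic idea only; it is not either vendor’s implementation, and the training snippets are invented.

    import math
    from collections import Counter

    # Hypothetical training snippets for two categories.
    TRAINING = {
        "legal":   ["contract warranty terms signature", "warranty clause liability"],
        "finance": ["invoice payment accounts receivable", "quarterly revenue forecast"],
    }

    def train(training):
        n_docs = sum(len(docs) for docs in training.values())
        vocab = {w for docs in training.values() for d in docs for w in d.split()}
        priors, counts, totals = {}, {}, {}
        for label, docs in training.items():
            priors[label] = len(docs) / n_docs
            counts[label] = Counter(w for d in docs for w in d.split())
            totals[label] = sum(counts[label].values())
        return priors, counts, totals, vocab

    def classify(text, priors, counts, totals, vocab):
        scores = {}
        for label in priors:
            logp = math.log(priors[label])
            for w in text.split():
                # Laplace smoothing keeps unseen words from zeroing out a category.
                logp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
            scores[label] = logp
        return max(scores, key=scores.get)

    model = train(TRAINING)
    print(classify("warranty terms in the contract", *model))   # prints "legal"

Even in this toy form, one can see how choices about priors, smoothing, and training data shape the results, which hints at why two systems built on similar Bayesian mathematics can behave quite differently.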

Point 7: Consolidation

Mr. Lai wants to rev the uncertainty engine, I think. We learn from the FAQ that search is still a small, largely unknown market sector. We learn that big companies may buy smaller companies.

My view is that consolidation is a feature of our market economy. Mergers and acquisitions are part of the blood and bones of business, not a characteristic of the present search or content processing sector. The key point that is not addressed is the difficulty of generating a sustainable business selling a fuzzy solution to a tough problem. Philosophers have been trying to figure out information for a long time and have done a pretty miserable job as far as I can tell. Software that ventures into information is going to face some challenges. There’s user satisfaction, return on investment, appropriate performance, and the other factors referenced in this essay. The forces that will ripple through behind-the-firewall search are:

  • Business failure. There are too many vendors and too few buyers willing to pay enough to keep the more than 350 companies in the sector sustainable
  • Mergers. A company with customers and so-so technology is probably more valuable than a company with great technology and few customers. I have read that Microsoft was buying customers, not Fast Search & Transfer’s technology. Maybe? Maybe not.
  • Divestitures and spin outs. Keep in mind that Inxight Software, an early leader in content processing, was pushed out of Xerox’s Palo Alto Research Center. The fact that it was reported as an acquisition by Business Objects emphasized the end game. The start was, “Okay, it’s time to leave the nest.”

The other factor is not consolidation; it is absorption. Information is too important to leave in a stand-alone application. That’s why Microsoft’s Mr. Raikes seems eager to point out that Fast Search would become part of SharePoint.

Net-Net

The future, therefore, is that there will be less and less enthusiasm for expensive, stand-alone “behind-the-firewall” search. Search is becoming part of larger, higher-value information access solutions.

Stephen E. Arnold
January 13, 2008

