Featured

Google and Findability without the Complexity

Shortly after writing the first draft of Google: The Digital Gutenberg, “Enterprise Findability without the Complexity” became available on the Google Web site. You can find this eight-page polemic at http://bit.ly/1rKwyhd or you can search for the title on, what else, Google.com.

Six years have passed since the document became available, and the points Google’s anonymous marketer/writer raised about enterprise search remain interesting. The document appeared just as the enterprise search sector was undergoing another major transformation. Fast Search & Transfer struggled to deliver robust revenues, and a few months before the Google document became available, Microsoft paid $1.2 billion for what was another enterprise search flame out. As you may recall, in 2008, Convera was essentially non-operational as an enterprise search vendor. In 2005, Autonomy bought the once high-flying Verity and was exerting its considerable management talent to become the first enterprise search vendor to top $500 million in revenues. Endeca was flush with Intel and SAP cash, passing on other types of financial instruments due to the economic downturn. Endeca lagged behind Autonomy in revenues, and there was little hope that Endeca could close the gap.

Secondary enterprise search companies were struggling to generate robust top-line revenues. Enterprise search was not a popular term. Companies from Coveo to Sphinx sought to describe their information retrieval systems in terms of functions like customer support or database access to content stored in MySQL. Vivisimo donned a variety of descriptions, culminating in its “reinvention” as a Big Data tool, not a metasearch system with a nifty on-the-fly clustering algorithm. IBM was becoming more infatuated with open source search as a way to shift development and bug fixes to a “community” working for the benefit of other like-minded developers.


Google’s depiction of the complexity of traditional enterprise search solutions. The GSA is, of course, less complex—at least on the surface exposed to an administrator.

Google’s Findability document identified a number of important problems associated with traditional enterprise search solutions. To Google’s credit, the company did not point out that the majority of enterprise search vendors (regardless of the verbal plumage used to describe information retrieval) were either losing money or engaged in a somewhat frantic quest for financing and sales.

Here are the issues Google highlighted:

  • Users of search systems are frustrated
  • Enterprise search is complex. Google used the word “daunting”, which was and still is accurate
  • Few systems handle file shares, Intranets, databases, content management systems, and real time business applications with aplomb. Of course, the Google enterprise search solution does deliver on these points, asserted Google.

Furthermore, Google provides integrated search results. The idea is that structured and unstructured information from different sources is presented in a form that Google called “integrated search results.”

Google also emphasized a personalized experience. Due to the marketing nature of the Findability document, Google did not point out that personalization was already a feature of information retrieval systems lashed to an alert and workflow component. Fulcrum Technologies offered a clumsy option for personalization. iPhrase improved on the approach. Even Endeca supported roles, important for the company’s work at Fidelity Investments in the UK. But for Google, most enterprise search systems were not personalizing with Google aplomb.

Google then trotted out the old chestnuts gleaned from a lunch discussion with other Googlers and sifting competitors’ assertions, consultants’ pronouncements, and beliefs about search that seemed to be self-evident truths; for example:

  • Improving customer service
  • Speeding innovation
  • Reducing information technology costs
  • Accelerating adoption of search by employees who don’t get with the program.

Google concluded the Findability document with what has become a touchstone for the value of the Google Search Appliance. Kimberly-Clark, “a global health and hygiene company,” reduced administrative costs for indexing 22 million documents. The costs of the Google Search Appliance, the consultant fees, and the extras like GSA failover provisions were not mentioned. Hard numbers, even for Google, are not part of the important stuff about enterprise search.

One interesting semantic feature caught my attention. Google does not use the word knowledge in this 2008 document.

Several questions:

  1. Was Google unaware of the fusion of information retrieval and knowledge?
  2. Does the Google Search Appliance deliver a laundry list of results, not knowledge? (A GSA user has to scan the results, click on links, and figure out what’s important to the matter at hand, so the word “knowledge” is inappropriate.)
  3. Why did Google sidestep providing concrete information about costs, productivity, and the value of indexing more content that is allegedly germane to a “personalized” search experience? Are there data to support the implicit assertion “more is better”? Returning more results may mean that the poor user has to do more digging to find useful information. What about a few, on-point results? Well, that’s not what today’s technology delivers. It is a fiction about which vendors and customers seem to suspend disbelief.

With a few minor edits, for example a genuflection to “knowledge,” this 2008 Findability essay is as fresh today as it was when Google output its PDF version.

Several observations:

First, the freshness of the Findability paper underscores the staleness and stasis of enterprise search in the past six years. If you scan the free search vendor profiles at www.xenky.com/vendor-profiles, explanations of the benefits and functions of search from the 1980s are also applicable today. Search, the enterprise variety, seems to be like a Grecian urn which “time cannot wither.”

Second, the assertions about the strengths and weaknesses of search were and still are presented without supporting facts. Everyone in the enterprise search business recycles the same cant. The approach reminds me of my experience questioning a member of a sect. The answer “It just is…” is simply not good enough.

Third, the Google Search Appliance has become a solution that costs as much as, if not more than, other big dollar systems. Just run a query for the Google Search Appliance on www.gsaadvantage.gov and check out the options and pricing. Little wonder that low cost solutions, whether they are better or worse than expensive systems, are in vogue. Elasticsearch and Searchdaimon can be downloaded without charge. A hosted version of Elasticsearch is available from Qbox.com and is relatively free of headaches and seven-figure charges.

Net net: Enterprise search is going to have to come up with some compelling arguments to gain momentum in a world of Big Data, open source, and once-burned, twice-shy buyers. I wonder why venture / investment firms continue to pump money into what is the same old search packaged with decades-old lingo.

I suppose the idea that a venture funded operation like Attivio, BA Insight, Coveo, or any other company pitching information access will become the next Google is powerful. The problem is that Google does not seem capable of making its own enterprise search solution into another Google.

This is indeed interesting.

Stephen E Arnold, July 28, 2014

Interviews

Elasticsearch: A Platform for Third Party Revenue

Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of QBox, is using the darling of the open source search world to craft a robust findability business.

I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and QBox are on to what diamond miners call a “pipe.”

In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:

We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.

Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well-known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.

Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:

When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.

When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from Triple Hop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.

Here’s Mr. Brandon’s moment of insight:

During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?

You can read the full interview at http://bit.ly/1mADZ29. Information about QBox is at www.qbox.com.

Stephen E Arnold, July 2, 2014
