Ground Hog Day: Smart Enterprise Search

January 7, 2025

I am a dinobaby. I also wrote the Enterprise Search Report, 1st, 2nd, and 3rd editions. I wrote The New Landscape of Search. I wrote some other books. The publishers are long gone, and I am mostly forgotten in the world of information retrieval. Read this post, and you will learn why. Oh, no AI helped me out unless I come up with an art idea. I used Stable Diffusion for the rat, er, sorry, ground hog day creature.

I think it was 2002 when the owner of a publishing company asked me if I thought there was an interest in profiles of companies offering “enterprise search solutions.” I vaguely remember the person, and I will leave it up to you to locate a copy of the 400 page books I wrote about enterprise search.

The set up for the book was simple. I identified the companies which seemed to bid on government contracts for search, companies providing search and retrieval to organizations, and outfits which had contacted me to pitch their enterprise search systems before they exited stealth mode. By the time the first edition appeared in 2004, the companies in the ESR were flogging their products.


The ground hog effect is a version of the Yogi Berra “Déjà vu all over again” thing. Enterprise search is just out of reach now and maybe forever.

The enterprise search market imploded. It was there and then it wasn’t. Can you describe the features and functions of these enterprise search systems from the “golden age” of information retrieval:

  • Innerprise
  • InQuira
  • iPhrase
  • Lextek Onix
  • MondoSearch
  • Speed of Mind
  • Stratify (formerly Purple Yogi)

The end of enterprise search coincided with large commercial enterprises figuring out that “search” in a complex organization was not one thing. The problem remains today. Lawyers in a Fortune 1000 company want one type of search. Marketers want another “flavor” of search. The accountants want a search that retrieves structured and unstructured data plus images of invoices. Chemists want chemical structure search. Senior managers want absolutely zero search of their personal and privileged data unless it is lawyers dealing with litigation. In short, each unit wants a highly particularized search and each user wants access to his or her data. Access controls are essential, and they were a hassle at a time when the notion of an access control list was like learning to bake bread following a recipe in Egyptian hieroglyphics.

These problems exist today and are complicated by podcasts, video, specialized file types for 3D printing, email, encrypted messaging, unencrypted messaging, and social media. No one has cracked the problem of a senior sales person who changes a PowerPoint deck to close a deal. Where is that particular PowerPoint? Few know and the sales person may have deleted the file changed minutes before the face to face pitch. This means that baloney like “all” the information in an organization is searchable is not just stupid; it is impossible.

The key events were the legal and financial hassles over Fast Search & Transfer. Microsoft bought the company in 2008 and that was the end of a reasonably capable technology platform and — believe it or not — a genuine alternative to Google Web search. A number of enterprise search companies sold out because the cost of keeping the technology current and actually running a high-grade sales and marketing program spelled financial doom. Examples include Exalead and Vivisimo, among others. Others just went out of business: Delphes (remember that one?). The kiss of death for the type of enterprise search emphasized in the ESR was the acquisition of Autonomy by Hewlett Packard. There was a roll up play underway by OpenText which has redefined itself as a smart software company with Fulcrum and BRS Search under its wing.

What replaced enterprise search when the dust settled in 2011? From my point of view it was Shay Banon’s Elastic search and retrieval system. One might argue that Lucid Works (né Lucid Imagination) was a player. That’s okay. I am, however, going to go with Elastic because it offered a version as open source and a commercial version with options for on-going engineering support. For the commercial alternatives, I would say that Microsoft became the default provider. I don’t think SharePoint search “worked” very well, but it was available. Google’s Search Appliance appeared and disappeared. There was zero upside for the Google with a product that was “inefficient” at making a big profit for the firm. So, Microsoft it was. For some government agencies, there was Oracle.

Oracle acquired Endeca and focused on that computationally wild system’s ability to power eCommerce sites. Oracle paid about $1 billion for a system which used to be an enterprise search with consulting baked in. One could buy enterprise search from Oracle and get structured query language search, what Oracle called “secure enterprise search,” and maybe a dollop of Triple Hop and some other search systems the company absorbed before the end of the enterprise search era. IBM talked about search, but the last time I drove by IBM Government systems in Gaithersburg, Maryland, it, like IBM search, had moved on. Yo, Watson.

Why did I make this dalliance on memory lane the boring introduction to a blog post? The answer is that I read “Are LLMs At Risk Of Going The Way Of Search? Expect A Duopoly.” This is a paywalled article, so you will have to pony up cash or go to a library. Here’s an abstract of the write up:

  1. The evolution of LLMs (Large Language Models) will lead users to prefer one or two dominant models, similar to Google’s dominance in search.

  2. Companies like Google and Meta are well-positioned to dominate generative AI due to their financial resources, massive user bases, and extensive data for training.

  3. Enterprise use cases present a significant opportunity for specialized models.

Therefore, consumer search will become a monopoly or duopoly.

Let’s assume the Forbes analysis is accurate. Here’s what I think will happen:

First, the smart software train will slow and a number of repackagers will use what’s good enough; that is, cheap enough and keeps the client happy. Thus, a “golden age” of smart search will appear with outfits like Google, Meta, Microsoft, and a handful of others operating as utilities. The US government may standardize on Microsoft, but it will be partners who make the system meet the quite particular needs of a government entity.

Second, the trajectory of the “golden age” will end as it did for enterprise search. The costs and shortcomings become known. Years will pass, probably a decade, maybe less, until a “new” approach becomes feasible. The news will diffuse and then a seismic event will occur. For AI, it was the 2023 announcement that Microsoft and OpenAI would change how people used Microsoft products and services. This created the Google catch up and PR push. We are in the midst of this at the start of 2025.

Third, some of the problems associated with enterprise information and an employee’s finding exactly what he or she needs will be solved. However, not “all” of the problems will be solved. Why? The nature of information is that it is a bit like pushing mercury around. The task requires fresh thinking.

To sum up, the problem of search is an excellent illustration of the old Hegelian chestnut of thesis, antithesis, and synthesis. This means the problem of search is unlikely to be “solved.” Humans want answers. Some humans want to verify answers, which means that the data on the sales person’s laptop must be included. When the detail oriented human learns that the sales person’s data are missing, the end of the “search solution” has begun.

The question is, “Will one big company dominate?” The answer is, in my opinion, maybe in some use cases. Monopolies seem to be the natural state of social media, online advertising, and certain cloud services. For finding information, I don’t think the smart software will be able to deliver. Examples of where it will fall short are likely to include [a] use cases in China and similar countries, [b] big multi-national organizations with information silos, [c] entities involved in two or more classified activities for a government, [d] high risk legal cases, and [e] activities related to innovation, trade secrets, and patents, among others.

The point is that search and retrieval remains an extraordinarily difficult problem to solve in many situations. LLMs contribute some useful functional options, but by themselves, these approaches are unlikely to avoid the reefs which sank the good ships Autonomy and Fast Search & Transfer, and dozens of others competing in the search space.

Maybe Yogi Berra did not say “Déjà vu all over again.” That’s okay. I will say it. Enterprise search is “Déjà vu all over again.”

Stephen E Arnold, January 7, 2025
