Algolia and Its View of the History of Search: Everyone Has an Opinion

August 11, 2021

Search is similar to love, patriotism, and ethical behavior. Everyone has a different view of the nuances of meaning of a specific utterance. Agree? Let’s assume you cannot define one of these words in a way that satisfies a professor from a mid-tier university teaching a class to 20 college sophomores who signed up for something to do with Western philosophy: Post Existentialism. Imagine your definition. I took such a class, and I truly did not care. I wrote down the craziness the brown-clad PhD provided, got my A, and never gave that stuff a thought. And you, gentle reader, are you prepared to figure out what an icon in an ibabyrainbow chat stream “means”? We captured a stream for one of my lectures to law enforcement in which she says, “I love you.” Yeah, right.

Now we come to “Evolution of Search Engines Architecture – Algolia New Search Architecture Part 1.” The write up explains finding information and its methods through the lens of Algolia, a publicly traded firm. Search, which is not defined, characterizes the level of discourse about findability. The write up explains an early method which permitted a user to query by key words. This worked like a champ as long as the person doing the search knew what words to use, like “nuclear effects modeling.”

The big leap was faster computers and clever post-Verity methods of getting a distributed index to mostly work. I want to mention that Exalead (which may have had an informing role to play in Algolia’s technical trajectory) was a benchmark system. But, alas, key words are not enough. The Endeca facets were needed. Because humans had to do the facet identification, the race was on to get smart software to do a “good enough” job so old school commercial database methods could be consigned to a small room in the back of a real search engine outfit.
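To make the key word versus facet distinction concrete, here is a minimal Python sketch. It is an illustration only, not Algolia’s, Endeca’s, or Exalead’s method. The three-document collection, the field names (`text`, `type`, `year`), and the function names are all hypothetical. The key word part is a plain inverted index with Boolean AND; the facet part simply counts a metadata field across the hits.

```python
from collections import Counter, defaultdict

# Hypothetical toy collection; documents and field names are assumptions.
DOCS = [
    {"id": 1, "text": "nuclear effects modeling report", "type": "report", "year": 2019},
    {"id": 2, "text": "nuclear plant safety slides", "type": "slides", "year": 2020},
    {"id": 3, "text": "effects of pricing changes", "type": "report", "year": 2020},
]


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc["text"].lower().split():
            index[word].add(doc["id"])
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def facet_counts(docs, hit_ids, field):
    """Count the values of one metadata field across the matching documents."""
    return Counter(doc[field] for doc in docs if doc["id"] in hit_ids)


index = build_index(DOCS)
hits = keyword_search(index, "nuclear")
print(sorted(hits))
print(facet_counts(DOCS, hits, "type"))
```

Note the limitation the sketch inherits from the key word era: a query for “atomic” returns nothing, because the searcher has to know the words in the documents. The facet values here were typed in by hand, which is the human labor the paragraph above describes.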

Algolia includes a diagram of the post Alta Vista, post Google world. The next big leap was scaling the post Google world. What’s interesting is that in my experience, most search problems result in processing smaller collections of information containing disparate content types. What’s this mean? Have you ever been able to use a free Web search system or an enterprise search system like Elastic or Yext to retrieve text, audio, video, engineering drawings and their associated parts data, metadata from surveilled employee E2EE messages, and TikTok video résumés or the wildly entertaining puff stuff on LinkedIn? The answer is and probably will be for the foreseeable future, “No.” And what about real time data, the content on a sales person’s laptop with the changed product features and customer specific pricing? Oh, right. Some people forget about that. Remember. I am talking about a “small” content set, not the wild and crazy Internet indexes. Where are those changed files on the Department of Energy Web site? Hmmm.

The fourth part of the “evolution” leaps to keeping cloud centric, third party hosted systems chugging along. Have you noticed the latency when using the OpenText cloud system? What about the display of thumbnails on YouTube? What about retrieving a document from a content management system before lunch, only to find that the system reports, “Document not found”? Yeah, but. Okay, yeah but nothing.

The final section of the write up struck me as a knee slapper. Algolia addresses the “current challenges of search.” Okay, and what are these from the Algolia point of view? The main points have to do with using a cloud system to keep the system up and running without trashing response time. That’s okay, but without a definition of search, fixes like separating search and indexing may not be the architectural solution. One example is processing streams of heterogeneous data in real time. This is a big thing in some circles, and highly specialized systems are needed to “make sense” of what’s rushing into a system. Now means now, not a latency centric method which has remained largely unchanged for, what, maybe 50 years.

What is my view of “search”? (If you are a believer that today’s search systems work, stop reading.) Here you go:

  1. One must define search; for example, chemical structure search, code search, HTML content search, video search, and so on. Without a definition, explanations are without context and chock full of generalizations.
  2. Search works when the content domain is “small” and clearly defined. A one size fits all approach to content is pretty much craziness, regardless of how much money an IPO’ed or SPAC’ed outfit generates.
  3. The characteristic of the search engines my team and I have tested over the last, what is it now, 40 or 45 years is that whatever system one uses is “good enough.” The academic calculations mean zero when an employee cannot locate the specific item of information needed to deal with a business issue or a student wants to locate a source for a statement about voter fraud. Good enough is state of the art.
  4. The technology of search is like a 1962 Corvette. It is nice to look at but terrible to drive.
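For readers who have not met the “academic calculations” mentioned in point 3, precision and recall are the usual pair. What follows is a minimal Python sketch with made-up document ids; the function name and the sample numbers are assumptions for illustration, not results from any system named in this post.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns documents 1-4; documents 2, 4, and 5
# are the ones a human judge marked as relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case the numbers look respectable, yet document 5 never surfaces. If document 5 is the one item the employee needs, the scores are no comfort.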

Net net: Everyone is a search expert now. Yeah, right. Remember: The name of the game is sustainable revenue, not precision and recall, high value results, or the wild and crazy promise that Google made for “universal search”. Ho ho ho.

Stephen E Arnold, August 11, 2021

Comments

One Response to “Algolia and Its View of the History of Search: Everyone Has an Opinion”

  1. Martin White on August 19th, 2021 6:03 am

    Excellent analysis. Thank you. But just to note that Algolia is not a publicly traded company https://techcrunch.com/2021/07/28/search-api-startup-algolia-raises-150-million-at-2-25-billion-valuation/
