Zero Search Results = Useful Information
August 26, 2020
I saw a notice for a conference called “Activate.” Zippy title. What caught my attention was the title of a talk; specifically, “Implementing a Deep Learning Search Engine.” The technology appears to be the open source Solr search system. As you know, dig into Solr and what do you find? Lucene. The heyday of enterprise search has passed. Perhaps another harvest will come? But after the implosion of the promises made by Fulcrum, Verity, Autonomy, Fast, Convera, and Entopia, I am not sure search has credibility.
Don’t get me wrong. Search is a major part of companies; for example, Salesforce bought Diffeo, which was an interesting search system. Elastic is, of course, the commercial firm selling support for the open source Elasticsearch system. There are unusual systems as well; for example, the quirky Qwant, which has some Pertimm inside.
But consider this description of the talk for the Activate conference delivered by two wizards (well, maybe apprentice wizards) from the Lucidworks outfit:
Recent advances in Deep Learning brings us the possibility to get improvements in almost any domain. Search Engines aren’t an exception. Semantic search, visual search, “zero results” queries, recommendations, chatbots etc. – this is just a shortlist of topics that can benefit from Deep Learning based algorithms. But more powerful methods are also more expensive, so they require addressing the variety of scalability challenges. In this talk, we will go through details of how we implement Deep Learning Search Engine at Lucidworks: what kind of techniques we use to train robust and efficient models as well as how we tackle scalability difficulties to get the best query time performance. We will also demo several use-cases of how we leverage semantic search capabilities to tackle such challenges as visual search and “zero results” queries in eCommerce.
Four points:
- Deep learning is one of those buzzwords that recyclers of open source technology slap on a utility function like search. What search vendor does not include smart software, semantics, and more Gartner-infused techno babble? Not many.
- Shortcuts for training smart software are indeed important. However, the approach which strikes me as interesting is the one taken by the ever-pragmatic AWS system pushed along by the Bezos bulldozer. AWS wants to make training a matter of buying commodity data solutions off the shelf. Presumably the approach works like one of those consumer soap tablets I have seen in our local grocery store. Buy, rip, and wash. Bingo! Clean ML. Grubbing in data is time consuming, expensive, and oh-so-easy to get wrong.
- The goal of “zero results” in eCommerce or any other domain is not exactly a challenge. Zero results deliver data. I know that an objective system displays only the objects matching my query. Not any longer. Synonym expansion, predictive analytics, clustering, and other numerical processes are going to show me something. Too bad that the “something” is usually not what I want.
- For special cases like eCommerce, instead of a list of crazy options, why not ask the user, “Do you want to see what products other people purchased when searching for X?” Choice is sometimes helpful.
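For the code-minded, here is a minimal sketch of the last two points: an objective matcher that returns only what matches, and, when nothing matches, asks the user instead of dumping guesses on the display. Everything in it (the `strict_search` function, the `SearchResponse` type, the toy catalog) is hypothetical and invented for illustration; it is not how Lucidworks, Solr, or any other system works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResponse:
    query: str
    hits: List[str]
    # Question offered to the user when nothing matched; None otherwise.
    prompt: Optional[str] = None


def strict_search(query: str, catalog: List[str]) -> SearchResponse:
    """Return only titles containing every query term; never pad the list."""
    terms = query.lower().split()
    hits = [
        title
        for title in catalog
        if all(term in title.lower().split() for term in terms)
    ]
    if hits:
        return SearchResponse(query, hits)
    # Zero results is itself information. Report it, then offer a choice
    # rather than silently substituting "something".
    question = (
        f'No products match "{query}". Do you want to see what products '
        f'other people purchased when searching for "{query}"?'
    )
    return SearchResponse(query, [], prompt=question)


# Hypothetical toy catalog, for illustration only.
catalog = ["red wool socks", "blue cotton shirt"]
found = strict_search("wool socks", catalog)
missing = strict_search("purple kilt", catalog)
```

In this sketch `found.hits` holds the one matching title, while `missing.hits` is empty and `missing.prompt` carries the question. The user decides what happens next.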
Is this important? To me, yes. To most others, no.
The problem with making information easy is everywhere today, from individuals who disbelieve verifiable information, like the fact that the earth is a spheroid, to those who tout the wisdom of demanding no law enforcement. Yeah, that will work.
Some quick facts to put this Lucidworks assertion in perspective. The company has ingested more than $209 million since 2007. I did some advice giving to the first president of Lucidworks, then called Lucid Imagination. I did some advice giving for another semi-lucid president. None of that advice resonated because recycling jargon does not generate sustainable revenues.
The point is that jazzy words and crazy ideas, like the notion that “zero results” are bad, are part of the problem search vendors face. Today’s search systems have drifted from displaying results which match a user’s query to dumping baloney on the display.
It is easier to yip yap with buzzwords than deal with some of the painful realities of information retrieval. Deep learning? Yeah, that will help the person locate that PowerPoint… not.
Stephen E Arnold, August 26, 2020