Semantic Sci-Fi: Search Is Great
March 23, 2020
I read “Keyword Search is DEAD; Semantic Search Is Smart.” I assume the folks at Medium consider each article, weigh its value, and then release only the highest-value content.
Semantic search is better than any other type of search in the galaxy.
Let’s assume that the write up is correct and keyword search is dead. Further, we shall ignore the syntax of SQL queries, the dependence of policeware and intelware systems on users’ looking for named entities, and overlook the interaction of people using an automobile’s navigation service by saying, “Home.” These are examples of keyword search, and I decided to give a few examples, skipping how keyword search functions in desktop search, chemical structure systems, medical research, and good old, bandwidth trimming YouTube.
Okay, what does the write up say beyond “keyword search is dead”?
Here are some points I extracted as I worked my way through the write up. I required more than three minutes (the Medium estimate) because my blood pressure was spiking, and I was hyperventilating.
Factoid 1 from the write up:
If you do semantic search, you can get all information as per your intent.
What’s with this “all”? Content domains, no matter what the clueless believe, are incomplete. There is no “all” when it comes to online information which is indexed.
Factoid 2 from the write up:
semantic search seeks to understand natural language the way a human would.
Yep, natural language queries are possible within certain types of content domains. However, the systems I have worked with and have had an opportunity to use in controlled situations exhibit a number of persistent problems. These begin with computational constraints: One system could support four simultaneous users on a corpus of fewer than 100,000 text documents. Others simply output “good enough” results. Not surprisingly, when a physician needs an antitoxin to save a child’s life, keywords work better than “good enough” in my experience. NLP has been getting better, but building systems which can integrate widely different data which may be incomplete, incorrect, or stale and still return a useful output remains a big hurdle. So far no one has gotten over it on a consistent, affordable basis. Short cuts to reduce index look ups can be packaged as semantics and NLP, but mostly these are clever ways to improve “efficiency.” Understanding? Sometimes. Precision and recall? Not yet.
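For readers who have not bumped into precision and recall, here is a minimal sketch of how the two measures are usually computed for a single query. The document identifiers and both result lists are made up for illustration; they do not come from any system mentioned in this post.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents actually answer the question.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A strict keyword match returns little, but everything it returns is on point.
keyword_results = ["d1", "d2"]

# A looser "good enough" system returns more, including off-topic documents.
good_enough_results = ["d1", "d2", "d3", "d7", "d8", "d9"]

print(precision_recall(keyword_results, relevant_docs))      # (1.0, 0.5)
print(precision_recall(good_enough_results, relevant_docs))  # (0.5, 0.75)
```

In this toy case the looser system finds one more relevant document, but half of what it shows the user is noise. That trade-off is the one a physician hunting for an antitoxin cannot afford.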
Factoid 3 from the write up:
The purpose of semantic search is to go beyond the static dictionary meaning of a word or phrase to understand the intent of a searcher’s query within a specific context.
Have you seen a Google ad which does not have any relevance to your query? That’s semantic search. The relaxation of strict keyword matching allows the Google to pull related words from a cluster. The advertiser is told, “Your ads appear on relevant results.” Yet advertisers complain when Google displays inappropriate ads next to big spender ads. Why? The relaxation is a way to burn through ad inventory. Precision does not meet revenue objectives. If I tell my in-car navigation system “Home,” it works. Tell it to “Find the Moser Warehouse,” and the system does not work. Ask Alexa, “Does Amazon work with the CIA?” Like the answer? The “context” to make today’s systems work involves filters, look up tables, rules, and dozens of numerical recipes. Is the search “good enough?” Sure, but it is not particularly useful for certain types of queries; for example, “What did Banjo do between 2016 and 2019?” I know the answer, but no system to which I have access has any idea about how to answer this NLP query.
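The relaxation described above can be illustrated with a toy sketch. This is not how Google or any ad system actually works; the related-term cluster, the ad texts, and the function names are all hypothetical. The point is only to show what happens when a query is expanded with related words before matching.

```python
# Hypothetical cluster of "related" terms; real systems derive these statistically.
RELATED = {"warehouse": {"storage", "depot", "store"}}

# Hypothetical ad inventory, keyed by ad id.
ADS = {
    "ad1": "moser warehouse forklift rental",
    "ad2": "self storage units near you",
    "ad3": "grocery store coupons",
    "ad4": "luxury watch sale",
}


def strict_match(query, ads):
    """Return ads that contain every query term."""
    terms = set(query.lower().split())
    return sorted(ad for ad, text in ads.items() if terms <= set(text.split()))


def relaxed_match(query, ads, related):
    """Return ads that contain any query term or any related term."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= related.get(term, set())
    return sorted(ad for ad, text in ads.items() if expanded & set(text.split()))


print(strict_match("moser warehouse", ADS))            # ['ad1']
print(relaxed_match("moser warehouse", ADS, RELATED))  # ['ad1', 'ad2', 'ad3']
```

Strict matching shows one ad, and it is on point. The relaxed version shows three, and a searcher looking for the Moser Warehouse now gets grocery coupons. More inventory is displayed; relevance is not improved.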
Factoid 4 from the write up:
Deep learning improves this process by allowing us to automatically generate additional features that more comprehensively capture the intent of the query and the characteristics of a webpage.
This is Google “signals” territory. Signals can be parsed to yield useful information. Google uses methods from CLEVER, invented at IBM Almaden, and more than 120 other items of data to return relevant results. How well does this work? Run a query on Google mobile for information about Voyager Labs’ sales to the US government. Or ask your smartphone, “Who funds Voyager Labs?” Get any information? Sure. Is it accurate? Relevant? Precise? Timely? To save you time, the answer to these questions is, “Nope.”
Let’s stop.
This write up is an example of search and retrieval science fiction. The image of a system which can return excellent, on point information has been in the minds of search developers for more than 50 years. The dream caused the implosion of Fast Search & Transfer. The dream caused the failure of Convera’s NBA video confection. The dream caused Delphis to crash and burn. The dream caused Entopia to become a touchstone for marketing craziness. There are other examples; for instance, the pain at HP when it realized the dream of Autonomy’s black box was not what HP thought it to be, and the failure of OpenText to enhance the numerous search systems the company has purchased. Fulcrum, the Tim Bray SGML search, the Matt Kohl PBS system, and others are dead or stuck in time because there is not enough money to deliver on the “dreams” of these systems.
Here’s the reality of search:
- Effective search and retrieval requires the use of multiple content processing techniques.
- Systems must offer users different ways to retrieve information from the repositories of processed content.
- Reliable, affordable, precise, and understandable systems are not yet available. Period.
Search and retrieval is among the most complex problems in computer and information science. Throwing buzzwords around gives some users a false sense of confidence.
Today it is more difficult than ever to find, access, and make sense of information and data needed to answer a question.
By the way, have you figured out what Banjo was doing between 2016 and 2019? Try a query on Bing, Google, or Yandex, or a metasearch query on DuckDuckGo or Startpage.
There are millions of questions which cannot be answered because search is not very good. Marketers are, on the other hand, outstanding when it comes to painting backdrops for science fiction stories; for example, “Amazing Tales of Semantic Search.”
Stephen E Arnold, March 23, 2020