More Search Explaining: Will It Help an Employee Locate an Errant PowerPoint?
May 13, 2021
“Semantics, Ambiguity, and the role of Probability in NLU” is a search-and-retrieval explainer. After half a century of search explaining, one would think that the technology required to enter a keyword and get a list of documents in which the keyword appears would be nailed down. Wrong.
“Search” in 2021 embraces many subdisciplines. These range from explicit index terms like the date of a document to more elusive tags like “sentiment” and “aboutness.” Boolean has been kicked to the curb. Users want to talk to search, at least to Alexa and smartphones. Users want smart software to deliver results without the user having to enter a query. When I worked at Booz, Allen & Hamilton, one of my colleagues (I think his name was Harvey Poppel, the smart person who coined the phrase “paperless office”) suggested that someday a smart system would know when a manager walked into his or her office. The smart software would display what the person needed to know for that day. The idea, I think, was that whilst drinking herbal tea, the smart person would read the smart outputs and be more smart when meeting with a client. That was in the late 1970s, and where are we? On Zooms and looking at smartphones. Search is an exercise in frustration, and I think that is why venture firms continue to pour money into ideas, methods, concepts, and demos which have been recycled many times.
I once reproduced a chunk of Autonomy’s marketing collateral in a slide in one of my presentations. I asked those in the audience to guess which company wrote the text snippet. There were many suggestions, but none was Autonomy. I doubt that today’s search experts are familiar with the lingo of search vendors like Endeca, Verity, InQuire, et al. That’s too bad because the prose used to describe those systems could be recycled with little or no editing for today’s search system prospects.
The write up in question is serious. The author penned the report late last year, but Medium emailed me a link to it a day ago along with a “begging for dollars” plea. Ah, modern online blogs. Works of art indeed.
The article covers these topics as part of the “search” explainer:
- Ambiguity
- Understanding
- Probability
Ambiguity is interesting. One example is a search for the word “terminal.” Does the person submitting the query want information about a computer terminal, a bus terminal, or some other type of terminal; for instance, the post terminal on the transformer to my model train set circa 1951? Smart software struggles with this type of ambiguity. I want to point out that a subject matter expert can assign a “field code” to the term and eliminate the ambiguity, but SMEs are expensive and they lose their index precision capability as the work day progresses.
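For readers who want to see the field code idea in miniature, here is a rough sketch, not any vendor's actual method. The documents, the field code labels ("computing," "transport," "electrical"), and the function names are all made up for illustration. The assumption is that an indexer has already tagged each document with one field code.

```python
from collections import defaultdict

# Hypothetical documents, each tagged by an indexer with a field code
# naming the sense of "terminal" the document is about.
DOCS = [
    {"id": 1, "text": "VT100 terminal emulator settings", "field": "computing"},
    {"id": 2, "text": "Bus terminal departure schedule", "field": "transport"},
    {"id": 3, "text": "Transformer post terminal wiring for model trains", "field": "electrical"},
]


def build_index(docs):
    """Map (field code, term) pairs to the set of document ids containing the term."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[(doc["field"], term)].add(doc["id"])
    return index


def search(index, term, field=None):
    """Return sorted ids matching the term, optionally restricted to one field code."""
    term = term.lower()
    hits = set()
    for (code, indexed_term), ids in index.items():
        if indexed_term == term and (field is None or code == field):
            hits |= ids
    return sorted(hits)


INDEX = build_index(DOCS)
ambiguous = search(INDEX, "terminal")                  # every sense comes back
scoped = search(INDEX, "terminal", field="transport")  # only the bus terminal
```

The bare query returns all three documents; the query scoped to the "transport" field code returns one. The hard part, of course, is the tagging, which the sketch simply assumes has been done.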
To deal with the “terminal” example, the modern system has to understand [a] what the user wants and [b] what the content objects are about. Yep, aboutness. Today’s smart software does an okay job with technical text because jargon like Octanitrocubane allows relatively on point identification of a document relevant to a chemist in Columbus, Ohio. Toss in a chemical structure diagram, and the precision of the aboutness ticks up a notch. However, if you search for a word replete with social justice meaning, smart software often has a difficult time figuring out the aboutness. One example is a reference to Skokie, Illinois. Is that a radical right wing code word or a town loved for Potawatomi linguistic heritage?
Probability is a bit more specific, usually. The idea in search is that numbers can illuminate some of the dark corners of text’s meaning. Examples are plentiful. Curious about Miley Cyrus on SNL and then at the after party? The search engine will display the most probable content based on whatever data is sluiced through the query matcher and stored in a cache. If others looked at specific articles, then, by golly, a query about Miley is likely or highly probable to be just what the searcher wanted. The difference between ambiguity, understanding, and probability is, in my opinion, part of the problem search vendors face. No one can explain why, after 50 years of SMART, Personal Library Software, STAIRS, et al., finding on point information remains frustrating, expensive, and ineffective.
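The "others looked at it, so you probably want it" logic can be sketched in a few lines. This is a toy under stated assumptions, not how any real engine ranks results: the click log, the document names, and the function `rank_by_clicks` are invented for illustration, and the "probability" is nothing more than a document's share of past clicks for the same query.

```python
from collections import Counter

# Hypothetical click log: (query, clicked document) pairs.
CLICK_LOG = [
    ("miley cyrus snl", "snl-recap"),
    ("miley cyrus snl", "snl-recap"),
    ("miley cyrus snl", "after-party-photos"),
    ("miley cyrus snl", "snl-recap"),
    ("terminal", "bus-schedule"),
]


def rank_by_clicks(log, query):
    """Return (document, share of past clicks) pairs, most clicked first."""
    clicks = Counter(doc for q, doc in log if q == query)
    total = sum(clicks.values())
    if total == 0:
        return []  # no history for this query, so nothing to estimate
    return [(doc, count / total) for doc, count in clicks.most_common()]


RANKED = rank_by_clicks(CLICK_LOG, "miley cyrus snl")
```

With this log, the recap gets three of the four clicks and lands on top. Note what the sketch does not do: it knows nothing about what the searcher means or what the documents are about. A query nobody has run before gets an empty list.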
The write up states:
ambiguity was not invented to create uncertainty — it was invented as a genius compression technique for effective communication. And it works like magic, because on the receiving end of the message, there is a genius decoding and decompression technique/algorithm to uncover all that was not said to get at the intended thought behind the message. Now we know very well how we compress our thoughts into a message using a genius encoding scheme, let us now concentrate on finding that genius decoding scheme — a task that we all call now ‘natural language understanding’.
Sounds great. Now try this test. You have a recollection of viewing a PowerPoint a couple of weeks ago at an offsite. You know who the speaker was, and you want the slide with the number of instant messages sent per day on WhatsApp. How do you find that data?
[a] Run a query on your Fabasoft, SearchUnify, or Yext system?
[b] Run a query on Google in the hopes that the GOOG will point you to Statista, a company you believe will have the data?
[c] Send an email to the speaker?
[d] All of the above.
I would just send the speaker a text message and hope for an answer. If today’s search systems were smart, wouldn’t the single PowerPoint slide be in my email anyway? Sure, someday.
Stephen E Arnold, May 13, 2021
Comments
One Response to “More Search Explaining: Will It Help an Employee Locate an Errant PowerPoint?”
I often wish there were still an old-school literal search engine like Alta Vista or Fast. Where I could remember a phrase and search for the exact phrase and not have the engine try to overthink what I mean and send me the meaning with the most expensive ads on it. Google’s getting useless.