Semantic Search Explained
February 11, 2010
A happy quack to the reader who sent me “Breakthrough Analysis: Two + Nine Types of Semantic Search”. Martin White (Intranet Focus) and I tried to explain semantic search in the text and the glossary for our Successful Enterprise Search Management. Our approach was to point out that the word “semantic” is used in many different ways. Our purpose was to put procurement teams on alert when buzzwords were used to explain the features of an enterprise search system. We focused on matching a specific requirement to a particular function. An example would be displaying results in categories. The search vendor had to have a system that performed this type of value-added processing. The particular adjectives and marketing nomenclature were secondary to the function. The practicality of our approach was underscored for me when I read the Intelligent Enterprise article about the nine types of semantic search.
Source: http://writewellcommunications.com/wp-content/uploads/2009/06/homonyms1.jpg
I don’t feel comfortable listing the Intelligent Enterprise list, but I urge you to take a close look at the write up. Ask yourself these questions:
- Do you understand the difference between related searches/queries, concept search, and faceted search?
- When you look for information, are you mindful of “semantic/syntactic annotations” operating under the covers or in plain view?
- Do you type queries of about three words, or do you type queries with a dozen words or more organized in a question?
Your answers underscore one of the most fragile aspects of search and content processing. A focus on the numerical recipes that different vendors use to deliver specific functions often makes little or no sense, even to engineers with deep experience in search and content processing.
A quick example.
If you run a query on the Exstream system (the enterprise publishing system acquired by Hewlett Packard), you can get a list of content elements. The system is designed for a person in charge of placing a message in a medical invoice or an auto payment invoice, and for other types of content assembly operations. The system is not particularly clever, but it works reasonably well. The notion of search in this enterprise environment is in my opinion quite 1980s, despite some nice features like saved projects along the lines of Quark’s palette of frequently used objects.
Now run a query on a Mark Logic-based system at a major manufacturing company. The result looks a bit like a combination of a results list and a report, but if you move to another department, the output may have a different look and feel. This is a result of the underlying plumbing of the Mark Logic system. I think that describing Mark Logic as a search system and attributing more “meaningful” functions to it is possible, but the real difference between the two systems is the architecture.
A person describing either the Exstream or the Mark Logic system could apply one or more of the “two + nine” terms to the system. I don’t think those terms are particularly helpful either to the users or to the engineers at Exstream or Mark Logic. Here’s why:
- Systems have to solve a problem for a customer. Describing what the outputs look like is merely descriptive and may not reflect what is going on under the hood. Are the saved projects the equivalent of a stored XQuery for Mark Logic?
- Users need to have interfaces that allow them to get their work done. Arguably both Exstream and Mark Logic deliver for their customers. The underlying numerical recipes are essentially irrelevant if these two systems deliver for their customers.
- The terminology in use at each company comes from different domains, and it is entirely possible that professionals at Exstream and Mark Logic use exactly the same term with very different connotations.
The discourse about search, content processing, and information retrieval is fraught with words that are rarely defined across different slices of the information industry. In retrospect, Martin and I followed a useful, if pragmatic, path. We focused on requirements. Leave the crazy lingo to the marketers, the pundits, and the poobahs. I just want systems to work for their users. Words don’t do this, obviously, which is why coining lingo is easier than implementing systems that let users locate needed information.
Stephen E Arnold, February 11, 2010
No one paid me to put in this shameless plug for Martin White’s and my monograph, Successful Enterprise Search Management. This is a marketing write up, and I have dutifully reported this fact to you.