Indexing: The Big Wheel Keeps on Turning

January 23, 2017

Yep, indexing is back. The cacaphone “ontology” is the next big thing yet again. Folks, an ontology is a form of metadata. There are key words, categories, and classifications. Whipping these puppies into shape has been the thankless task of specialists for hundreds if not thousands of years. “What Is an Ontology and Why Do I Want One?” tries to make indexing more alluring. When an enterprise search system delivers results which miss the user’s information need or are just plain wrong, it is time for indexing. The problem is that machine-based indexing requires some well-informed humans to keep the system on point. Consider Palantir Gotham. Content finds its way into the system when a human performs certain tasks. Some of these tasks involve riding herd on the indexing of the content object. IBM Analyst’s Notebook and many other next-generation information access systems work hand in glove with expensive humans. Why? Smart software is still only sort of smart.

The write up dances around the need for spending money on indexing. The write up prefers to confuse a person who just wants to locate the answer to a business-related question without pointing, clicking, and doing high school research paper dog work. I noted this passage:

Think of an ontology as another way to classify content (like a taxonomy) that allows you to identify what the content is about and how it relates to other types of content.

Okay, but enterprise search generally falls short of the mark for 55 to 70 percent of a search system’s users. This is a downer. What makes enterprise search better? An ontology. But without the cost and time metrics, the yap about better indexing ends up with “smart content” companies looking confused when their licenses are not renewed.

What I found amusing about the write up is the claim that use of an ontology improves search engine optimization. How about some hard data? Generalities are presented instead of numbers one can examine and attempt to verify.

SEO means getting found when a user runs a query. That does not work too well for general purpose Web search systems like Google. SEO is struggling to deal with declining traffic to many Web sites and the problem mobile search presents.

But in an organization, SEO is not what the user wants. The user needs the purchase order for a client and easy access to related data. Will an ontology deliver an actionable output? To be fair, different types of metadata are needed. An ontology is one such type, but there are others. Some of these can be extracted without too high an error rate when the content is processed; for example, telephone numbers. Other types of data require different processes which can require knitting together different systems.
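The telephone number case can be sketched in a few lines of Python. This is a minimal illustration, not anything taken from the write up: the names `PHONE_RE` and `extract_phone_metadata` are hypothetical, and the pattern assumes North American style numbers written with separators. Real content will need more patterns and a human to check the misses.

```python
import re

# Assumption: North American style numbers only, e.g. (555) 123-4567,
# 555.123.4567, or +1 555-123-4567. Runs of digits without separators
# (purchase order numbers, dates) are deliberately not matched.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)\s?|\d{3}[\s.-])\d{3}[\s.-]\d{4}(?!\d)"
)


def extract_phone_metadata(text):
    """Return normalized 10-digit phone numbers found in text.

    Numbers are returned in order of first appearance, without duplicates.
    """
    found = []
    for match in PHONE_RE.finditer(text):
        # Strip everything except digits so variant spellings compare equal.
        digits = re.sub(r"\D", "", match.group())
        # Drop a leading country code of 1 if present.
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        if digits not in found:
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "Call (555) 123-4567 or 555.123.4567 about PO 20170123."
    print(extract_phone_metadata(sample))
```

The two spellings in the sample collapse to a single index entry, and the purchase order number is left alone. That is the easy kind of metadata; assigning the document to the right client or concept is where the expensive humans come in.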

To build a bubble gum card, one needs to parse a range of data, including images and content from a range of sources. In most organizations, silos of data persist and will continue to persist. Money is tight. Few commercial enterprises can afford to do the computationally intensive content processing under the watchful eye and informed mind of an indexing professional.

Cacaphones like “ontology” exacerbate the confusion about indexing and delivering useful outputs to users who don’t know a Boolean operator from a SQL expression.

Indexing is a useful term. Why not use it?

Stephen E Arnold, January 23, 2017

