Why Metadata? The Answer: Easy and Good Enough

April 30, 2021

I read “We Were Promised Strong AI, But Instead We Got Metadata Analysis.” The essay is thoughtful and provides a good summary of indexing’s virtues. The angle of attack is that artificial intelligence has not delivered the zip a couple of bottles of Red Bull provides. Instead, metadata is more like four ounces of Sunny D tangy original.

The write up states:

The phenomenon of metadata replacing AI isn’t just limited to web search. Manually attached metadata trumps machine learning in many fields once they mature – especially in fields where progress is faster than it is in internet search engines. When your elected government snoops on you, they famously prefer the metadata of who you emailed, phoned or chatted to the content of the messages themselves. It seems to be much more tractable to flag people of interest to the security services based on who their friends are and what websites they visit than to do clever AI on the messages they send. Once they’re flagged, a human can always read their email anyway.

This is an accurate statement.

The write up does not address a question I think is important in the AI versus metadata discussion. That question is, “Why?”

Here are some of the reasons I have documented in my books and writings over the years:

  1. Processing metadata is cheaper than paying to get smart software to work reliably.
  2. Metadata is good enough; that is, key insights can be derived with maths taught in most undergraduate mathematics programs. (I lectured about the 10 algorithms which everyone uses. Why? These are good enough.)
  3. Machines can do pretty good indexing; that is, key word and bound phrase extraction and mapping, clustering, graphs of wide paths among nodes, people, etc.
  4. Humans have been induced to add their own (often wonky) index terms, or hash tags, as the thumbtypers characterize their tags.
  5. Index analysis (Gene Garfield’s citation analysis) provides reasonably useful indications of what’s important even if one knows zero about a topic, entity, etc.
  6. Packaging indexing – sorry, metadata – as smart software and its ilk converts VCs from skeptics into fantasists. Money flows even though Google’s DeepMind technology is not delivering dump trucks of money to the Alphabet front door. Maybe soon? Who knows?
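To make reason 5 concrete, here is a minimal sketch of citation counting in Python. It is not Garfield's method in full, just the simplest version of the idea: rank items by how often others cite them, without reading a word of the content. The document names and citation pairs are made up for illustration.

```python
from collections import Counter

# Hypothetical citation records: (citing document, cited document).
CITATIONS = [
    ("paper_b", "paper_a"),
    ("paper_c", "paper_a"),
    ("paper_d", "paper_a"),
    ("paper_c", "paper_b"),
    ("paper_d", "paper_c"),
]


def rank_by_citations(citations):
    """Return (document, inbound citation count) pairs, most cited first."""
    # Only the metadata (who cites whom) is used; content is never inspected.
    counts = Counter(cited for _citing, cited in citations)
    # Sort by count descending, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


RANKING = rank_by_citations(CITATIONS)

if __name__ == "__main__":
    for document, count in RANKING:
        print(document, count)
```

The whole thing is a tally and a sort, which is the point of reasons 1 and 2: cheap to run and good enough to flag what deserves a human's attention.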

Net net: The strongest supporters of artificial intelligence have specific needs: Money, vindication of an idea gestated among classmates at a bar, or a desire to become famous.

Who agrees with me? Probably not too many people. As the professionals who founded commercial database products in the late 1970s and early 1980s die off, any chance of getting the straight scoop on the importance of indexing decreases. For AI professionals, that’s probably good news. For those individuals who understand indexing in today’s context, good luck with your mission.

Stephen E Arnold, April 30, 2021


