IBM Watson: Predicting the Future

July 12, 2017

I enjoy IBM’s visions of the future. One exception: the company’s revenue estimates for the Watson product line. I read “IBM Declares AI the Key to Making Unstructured Data Useful.” For me, the “facts” in the write up are a bit like a Payday candy bar: some nuts squished into a squishy core of questionable nutritional value.

I noted this factoid:

80 percent of company data is unstructured, including free-form documents, images, and voice recordings.

I have been interested in the application of the 80-20 rule to certain types of estimates. The problem is that the “principle of factor sparsity” gets disconnected from the underlying data. Generalizations are just so darned fun and easy. The mathematical rigor necessary to validate the generalization is just too darned much work. The “hey, I’ve got a meeting” or the more common “I need to check my mobile” gets in the way of figuring out if the 80-20 statement makes sense.

My admittedly inept encounters with data suggest that the volume of unstructured data is high, higher than the 80 percent in the rule. The problem is that today’s systems struggle to:

  • Make sense of massive streams of unstructured data from outfits like YouTube, clear text and encrypted text messages, and the information blasted about on social media
  • Identify the important items of content directly germane to a particular matter
  • Figure out how to convert content processing into useful elements like named entities and relate those entities to code words and synonyms
  • Perform cost effective indexing of content streams in near real time.

At this time, systems designed to extract actionable information from relatively small chunks of content are improving. But these systems typically break down when the volume exceeds the budget and computing resources available to those trying to “make sense” of the data in a finite amount of time. This type of problem is difficult due to constraints on the systems. These constraints are financial as in “who has the money available right now to process these streams?” These constraints are problematic when someone asks “what do we do with the data in this dialect from northern Afghanistan?” And there are other questions.

My problem with the IBM approach is that the realities of volume, of interrelating structured and semi structured data, and of multi lingual content are bumps in the information super highway that Watson seems to speed along, and those bumps are absorbed by marketing fluffiness.

I loved this passage:

Chatterjee highlighted Macy’s as an example of an IBM customer that’s using the company’s tools to better personalize customers’ shopping experiences using AI. The Macy’s On Call feature lets customers get information about what’s in stock and other key details about the contents of a retail store, without a human sales associate present. It uses Watson’s natural language understanding capabilities to process user queries and provide answers. Right now, that feature is available as part of a pilot in 10 Macy’s stores.

Yep, I bet that Macy’s is going to hit a home run against the fast ball pitching of Jeff Bezos’ Amazon Prime team. Let’s ask Watson. On the other hand, let’s ask Alexa.

Stephen E Arnold, July 12, 2017
