An Important, Easily Pooh-Poohed Insight

December 24, 2023

This essay is the work of a dumb dinobaby. No smart software required.

Dinobaby here. I am on the regular highway, not the information highway. Nevertheless, I want to highlight what I call an “easily pooh-poohed factoid.” The source of the item this morning is an interview titled “Google Cloud Exec: Enterprise AI Is Game-Changing, But Companies Need to Prepare Their Data.”

I am going to skip the PR baloney, the truisms about Google fumbling the AI ball, and the rah-rah about AI changing everything. Let me go straight to the factoid which snagged my attention:

… at the other side of these projects, what we’re seeing is that organizations did not have their data house in order. For one, they had not appropriately connected all the disparate data sources that make up the most effective outputs in a model. Two, so many organizations had not cleansed their data, making certain that their data is as appropriate and high value as possible. And so we’ve heard this forever — garbage in, garbage out. You can have this great AI project that has all the tenets of success and everybody’s really excited. Then, it turns out that the data pipeline isn’t great and that the data isn’t streamlined — all of a sudden your predictions are not as accurate as they could or should have been.

Why are points about data significant?

First, investors, senior executives, developers, and the person standing on line with you at Starbucks dismiss data normalization as a solved problem. Sorry, getting the data boat to float is a work in progress. Few want to come to grips with the issue.
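To see why normalization is not a solved problem, consider a minimal sketch. The records, field names, and formats below are hypothetical, but they illustrate the everyday mess: the same customer and the same date spelled three different ways, which naive systems count as three distinct rows.

```python
import re
from datetime import datetime

# Hypothetical raw records: mixed date formats, stray whitespace, and
# case/punctuation variants of the same entity name.
raw = [
    {"customer": "Acme Corp",  "signup": "2023-01-05"},
    {"customer": " acme corp", "signup": "01/05/2023"},
    {"customer": "ACME CORP.", "signup": "Jan 5, 2023"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y")

def parse_date(s):
    # Try each known format; anything else is non-conforming data.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s.strip(), fmt).date()
        except ValueError:
            continue
    return None  # needs human review

def normalize_name(s):
    # Lowercase, trim, drop trailing punctuation -- still only a heuristic.
    return re.sub(r"[.\s]+$", "", s.strip().lower())

cleaned = {(normalize_name(r["customer"]), parse_date(r["signup"])) for r in raw}
# The three "different" rows collapse to a single (name, date) pair.
```

Even this toy version needs an explicit list of accepted date formats and an escape hatch for records that match none of them, which is exactly the work few want to pay for.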

Second, fixing up data is expensive. Did you ever wonder why the Stanford president made up data, forcing his resignation? The answer is that the cost of fixing up data is too high. If the president of Stanford can’t do it, is the run-of-the-mill fast-talking AI guru different? Answer: Nope.

Third, knowledge of exception folders and non-conforming data is confined to a small number of people. Most will explain what is needed to make a content intake system work. However, many give up because the cloud of unknowing is unlikely to disperse.
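The exception-folder idea can be sketched in a few lines. This is an illustrative toy, not any vendor's actual intake code; the field names and validation rule are assumptions.

```python
# Hypothetical intake step: records that fail validation are routed to an
# "exception folder" instead of silently entering the pipeline -- the part
# of a content intake system that only a handful of people ever look at.
def intake(records, required_fields=("id", "text")):
    accepted, exceptions = [], []
    for rec in records:
        if all(rec.get(f) not in (None, "") for f in required_fields):
            accepted.append(rec)
        else:
            exceptions.append(rec)  # held for human review, often forgotten
    return accepted, exceptions

batch = [
    {"id": 1, "text": "ok"},
    {"id": 2, "text": ""},      # non-conforming: empty body
    {"text": "orphan record"},  # non-conforming: no id
]
accepted, exceptions = intake(batch)
```

The exceptions list is where the cloud of unknowing lives: if nobody empties it, the model trains and predicts on whatever happened to pass.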

The bottom line is that many data sets are not what senior executives, marketers, or those who use the data believe they are. The Google comment — despite Google’s sketchy track record in plain honest talk — is mostly correct.

So what?

  1. Outputs are often less useful than many anticipated. But if the user is uninformed or the downstream system uses whatever is pushed to it, no big deal.
  2. The thresholds and tweaks needed to make something semi-useful are not shared, discussed, or explained. Keep the mushrooms in the dark and feed them manure. What do you get? Mushrooms.
  3. The graphic outputs are eye candy and distracting. Look here, not over there. Sizzle sells and selling is important.

Net net: Data are a problem. Data have been a problem due to time and cost issues. Data will remain a problem because one can sidestep a problem few recognize, and those who do recognize the pit find a shortcut. What’s this mean for AI? Those smart systems will be super. What’s in your AI stocking this year?

Stephen E Arnold, December 24, 2023

