Will More Big Data Make AI Deliver Results?

April 6, 2020

Many companies have issued news releases about their coronavirus research support. Personally I find the majority of these “real news” announcements low ball marketing at its finest. The coronavirus problem is indeed serious, and researchers, art history majors, and MBA executives who hop on the “We are helping” bandwagon are amusing.

I read a 3,500 word ZDNet article titled:

AI Runs Smack Up Against a Big Data Problem in COVID-19 Diagnosis. Researchers around the world have quickly pulled together combinations of neural networks that show real promise in diagnosing COVID-19 from chest X-rays and CT scans. But a lack of data is hampering the ability of many efforts to move forward. Some kind of global data sharing may be the answer.

Now that’s an SEO inspired title, but the write up makes one amazing assertion: More data will allow medical AI systems to output actionable information.

If I ran through the litany of medical AI revolutions, my fingers would get tired clicking and mousing. The IBM Watson silliness is a good example, and it encapsulates the problem of using collections of numerical recipes to help physicians deal with cancer. Google has not made much, if any, progress on solving death. Remember that “hard problem”? Pushing deeper into the past, there was NuTech Solutions’ ability to identify individuals likely to get diabetes based on sparse data and ant algorithms.

How did these companies’ efforts work out?

Failures from my point of view.

The write up runs down a number of research efforts. Companies like DarwinAI are mentioned. There are quotes which provide guidance to organizations challenged to find the snack room; for example:

“I think it would help if the WHO made a central database with de-identifying mechanisms, and some really good encryption,” said Dr. Luccioni. “That way, local health authorities would be reassured and motivated to share their data with each other.”

The problem is that smart software is mostly an implementation of methods known, in some cases, for hundreds of years. These smart systems use recursion, feedback loops, and statistical procedures to output statistically valid (probable) information.

How are these systems working? There are data, but they are conflicting, disorganized, and inconsistent. News flash. That’s how information is. There is zero evidence that more data can be verified, normalized, and processed in near real time to allow smart software to demonstrate it can do more than generate marketing collateral.

The companies pitching their artificial intelligence should articulate the reality of the outputs their workflows of algorithms can actually generate.

That might help more than the craziness of wanting data to be better, having some magic wand to normalize the messy real world of information, and converting what are mostly graduate school projects into something useful beyond speeding up some lab tests and getting a “real” job.

Will this happen? Not for a long time. Data are not the problem. Humans are the problem because the idea of creating a consistent, verified repository of on-point data has not been achieved even for small domains of content. Forget global data.

Don’t believe me? Check out any online system. Run some queries. Is “everything” in that system or federated system? What about a small collection of data; for example, the data on your mobile? What’s there that you can access? “What?” you ask. Yeah, the high value data are sucked away, and those data are not shared with “everyone,” including you, who created the data in the first place.

Smart software performs some useful functions. Will data make Bayesian methods or patented techniques like those from Qure.ai “solve” Covid?

Hard in reality. Easy in ZDNet articles. Even easier for marketers. And the patients suffering? What? Who? Where?


Stephen E Arnold, April 5, 2020

