A Theory: No Room for Shortcuts in Healthcare Datasets
July 1, 2021
The value of any machine learning algorithm depends on the data it was trained on, we are reminded in the article “Machine Learning Deserves Better Than This” at AAAS’ Science Mag. Writer Derek Lowe makes some good points that are, nevertheless, likely to make him unpopular among the rah-rah AI crowd. He is specifically concerned with the ways machine learning is currently being applied in healthcare. As an example, Lowe examines a paper that reviewed studies of coronavirus pathology as revealed in lung X-ray data. He writes:
“Every single one of the studies falls into clear methodological errors that invalidate their conclusions. These range from failures to reveal key details about the training and experimental data sets, to not performing robustness or sensitivity analyses of their models, not performing any external validation work, not showing any confidence intervals around the final results (or not revealing the statistical methods used to compute any such), and many more. A very common problem was the (unacknowledged) risk of bias right up front. Many of these papers relied on public collections of radiological data, but these have not been checked to see if the scans marked as COVID-19 positive patients really were (or if the ones marked negative were as well). It also needs to be noted that many of these collections are very light on actual COVID scans compared to the whole database, which is not a good foundation to work from, either, even if everything actually is labeled correctly by some miracle. Some papers used the entire dataset in such cases, while others excluded images using criteria that were not revealed, which is naturally a further source of unexamined bias.”
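Some of the omissions Lowe lists are cheap to fix. A confidence interval around a reported accuracy figure, for example, takes only a few lines with a percentile bootstrap. The sketch below is our own illustration, not anything from the papers Lowe reviewed; the function name and the plain accuracy metric are assumptions chosen for simplicity.

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for classification accuracy.

    Resamples the test cases with replacement and recomputes accuracy on
    each resample, then reports the middle (1 - alpha) of the distribution.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    accs = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # draw n test cases with replacement
        accs[i] = np.mean(y_true[idx] == y_pred[idx])
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return np.mean(y_true == y_pred), (lo, hi)
```

Ten thousand resamples run in seconds and turn a single impressive-sounding number into an honest range, which is precisely the kind of detail Lowe found missing from the final results.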
As our regular readers are aware, any AI is only as good as the data it is trained upon. However, data scientists can be so eager to develop tools (or, to be less charitable, to get published) that they take shortcuts. Some, for example, accept all data from public databases without any verification. Others misapply data, like the collection of lung X-rays from patients under the age of five that was treated as an all-ages pneumonia dataset; a model trained on it may simply learn to tell children from adults rather than healthy lungs from diseased ones. Then there are the datasets and algorithms that simply do not have enough documentation to be trusted. How was the imaging data pre-processed? How was the model trained? How was it selected and validated? Crickets.
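A basic audit of a public dataset before training would catch several of these problems. Here is a minimal sketch; the record layout (dicts with “label” and “age” fields) is hypothetical, but the two checks it runs, class imbalance and demographic skew between classes, are exactly the red flags described above.

```python
from collections import Counter

def audit_dataset(records):
    """Pre-training sanity checks on a list of dicts with hypothetical
    'label' and 'age' keys: flag rare classes and print median age per
    class, since a large age gap between classes suggests a model could
    learn demographics instead of pathology."""
    labels = Counter(r["label"] for r in records)
    total = sum(labels.values())
    for label, count in labels.items():
        share = count / total
        if share < 0.10:
            print(f"WARNING: class '{label}' is only {share:.1%} of the data")
    ages_by_class = {}
    for r in records:
        ages_by_class.setdefault(r["label"], []).append(r["age"])
    medians = {k: sorted(v)[len(v) // 2] for k, v in ages_by_class.items()}
    print("Median age by class:", medians)
```

Run against the pneumonia example above, the median-age line alone would have exposed that one class was made up almost entirely of small children.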
We understand why people are excited about the potential of machine learning in healthcare, a high-stakes field where solutions can be frustratingly elusive. However, it benefits no one to rely on conclusions drawn from flawed data. In fact, doing so can be downright dangerous. Let us take the time to get machine learning right first.
Cynthia Murrell, July 1, 2021