A Theory: No Room for Shortcuts in Healthcare Datasets
July 1, 2021
The value of any machine learning algorithm depends on the data it was trained on, we are reminded by the article “Machine Learning Deserves Better Than This” at AAAS’ Science Mag. Writer Derek Lowe makes some good points that are, nevertheless, likely to make him unpopular among the rah-rah AI crowd. He is specifically concerned with the ways machine learning is currently being applied in healthcare. As an example, Lowe examines a paper reviewing studies that mined lung X-ray data for coronavirus pathology. He writes:
“Every single one of the studies falls into clear methodological errors that invalidate their conclusions. These range from failures to reveal key details about the training and experimental data sets, to not performing robustness or sensitivity analyses of their models, not performing any external validation work, not showing any confidence intervals around the final results (or not revealing the statistical methods used to compute any such), and many more. A very common problem was the (unacknowledged) risk of bias right up front. Many of these papers relied on public collections of radiological data, but these have not been checked to see if the scans marked as COVID-19 positive patients really were (or if the ones marked negative were as well). It also needs to be noted that many of these collections are very light on actual COVID scans compared to the whole database, which is not a good foundation to work from, either, even if everything actually is labeled correctly by some miracle. Some papers used the entire dataset in such cases, while others excluded images using criteria that were not revealed, which is naturally a further source of unexamined bias.”
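Take just one item from that list, the missing confidence intervals. It is not an exotic fix: a percentile bootstrap over a held-out test set puts an interval around a reported accuracy figure in a few lines. The sketch below is illustrative only; the labels and predictions are made-up placeholders, not drawn from any of the studies Lowe discusses.

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for test-set accuracy."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    scores = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample test cases with replacement
        scores[i] = np.mean(y_true[idx] == y_pred[idx])
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(y_true == y_pred)), (float(lo), float(hi))

# Hypothetical predictions on a small, imbalanced test set (1 = marked COVID-positive).
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
point, (low, high) = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

On a test set this small and this lopsided, the interval is wide, which is exactly the uncertainty many of these papers never disclosed.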
As our regular readers are aware, any AI is only as good as the data it is trained upon. However, data scientists can be so eager to develop tools (or, to be less charitable, to get published) that they take shortcuts. Some, for example, accept all data from public databases without any verification. Others misapply data, like the collection of lung X-rays from patients under the age of five that was included in the all-ages pneumonia dataset. Then there are the datasets and algorithms that simply do not have enough documentation to be trusted. How was the imaging data pre-processed? How was the model trained? How was it selected and validated? Crickets.
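Much of that shortcut-taking is cheap to guard against. Here is a minimal sketch of the kind of screening that should precede any training run and be disclosed alongside it, assuming a hypothetical metadata table with patient_age, label, and label_source columns; the column names and thresholds are illustrative, not taken from any real collection.

```python
import pandas as pd

def screen_dataset(meta: pd.DataFrame, min_age: int = 18) -> pd.DataFrame:
    """Basic pre-training sanity checks on an imaging metadata table."""
    # 1. Provenance: drop records whose labels were never independently verified.
    verified = meta[meta["label_source"] == "confirmed_pcr"]
    print(f"dropped {len(meta) - len(verified)} records with unverified labels")

    # 2. Population: keep only the cohort the model is actually intended for.
    adults = verified[verified["patient_age"] >= min_age]
    print(f"dropped {len(verified) - len(adults)} records outside the target age range")

    # 3. Class balance: report it so readers can judge the risk of a trivial classifier.
    print("class counts:")
    print(adults["label"].value_counts())
    return adults

# Hypothetical metadata; a trustworthy collection would document all of this up front.
meta = pd.DataFrame({
    "patient_age": [4, 34, 61, 2, 47, 55],
    "label": ["covid", "covid", "normal", "normal", "normal", "normal"],
    "label_source": ["unknown", "confirmed_pcr", "confirmed_pcr",
                     "unknown", "confirmed_pcr", "confirmed_pcr"],
})
clean = screen_dataset(meta)
```

The point is not these particular checks but that the checks, and the exclusion criteria behind them, be documented and disclosed.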
We understand why people are excited about the potential of machine learning in healthcare, a high-stakes field where solutions can be frustratingly elusive. However, it benefits no one to rely on conclusions drawn from flawed data. In fact, doing so can be downright dangerous. Let us take the time to get machine learning right first.
Cynthia Murrell, July 1, 2021
Real Silicon Valley News Predicts the Future
July 1, 2021
I read “Why Some Biologists and Ecologists Think Social Media Is a Risk to Humanity.” I thought this was an amusing essay because the company publishing it is very much a social media thing. Clicks equal fame, money, and influence. These are potent motivators, and the essay is cheerfully oblivious to the irony of the Apocalypse foretold in the write up.
I learned:
One of the real challenges that we’re facing is that we don’t have a lot of information
But who is “we”? I can name several entities which have quite comprehensive information. Obviously these entities are not part of the royal “we”. I have plenty of information and some of it is proprietary. There are areas about which I would like to know more, but overall, I think I have what I need to critique thumbtyper-infused portents of doom.
Here’s another passage:
Seventeen researchers who specialize in widely different fields, from climate science to philosophy, make the case that academics should treat the study of technology’s large-scale impact on society as a “crisis discipline.” A crisis discipline is a field in which scientists across different fields work quickly to address an urgent societal problem — like how conservation biology tries to protect endangered species or climate science research aims to stop global warming. The paper argues that our lack of understanding about the collective behavioral effects of new technology is a danger to democracy and scientific progress.
I assume the Silicon Valley “real” news outfit and the experts cited in the write up are familiar with the work of J. Ellul? If not, some time invested in reading his work might be helpful. As a side note, Google Books thinks that Ellul’s prescient and insightful analysis of technology is about “religion.” Because Google, of course.
The write up adds:
Most major social media companies work with academics who research their platforms’ effects on society, but the companies restrict and control how much information researchers can use.
Remarkable insight. Why, pray tell?
Several observations:
- Technology is not well understood
- Flows of information are destructive in many situations
- Access to information spawns false conclusions
- Bias distorts logic even among the informed
Well, this is a pickle barrel and “we” are in it. What is making my sides ache from laughter is that advocates of social media in particular and technology in general are now asking, “Now what?”
Few like China’s approach or that of other authoritarian entities that want to preserve the way it was.
Cue Barbra Streisand’s “The Way We Were.” Oh, right. Blocked by YouTube. Do ecologists and others understand cancer?
Stephen E Arnold, July 1, 2021