Machine Learning and Data Quality
April 23, 2019
We’re updating our data quality files in the run-up to my lecture at the TechnoSecurity & Digital Forensics Conference. A paper by Sanau.co, “Dear AI Startups: Your ML Models Are Dying Quietly,” is worth reading if you are thinking about how to address accuracy problems in the outputs of some machine learning systems. The slow deterioration of certain Bayesian methods is a subject I have addressed for years. The Sanau write-up called my attention to another source of data deterioration, or data rot: seemingly logical changes made to field names and the insidious downstream consequences of those changes. The article provides useful explanations and a concrete example drawn from ecommerce, but its application is much broader. Worth reading.
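To make the field-name point concrete, here is a minimal sketch of how a rename can degrade a model's inputs without raising an error. It is my own illustration, not code from the Sanau article; the field names (`price`, `unit_price`, `category`) and both helper functions are hypothetical.

```python
# Hypothetical example: field names and helpers are assumptions for illustration,
# not taken from the Sanau article.
EXPECTED_FIELDS = {"price", "category"}

def extract_features(record):
    # .get() with a default hides a missing field: the model still gets a number,
    # so nothing fails loudly when an upstream team renames "price".
    return [float(record.get("price", 0.0)), len(record.get("category", ""))]

def missing_fields(record, expected=EXPECTED_FIELDS):
    # Report the expected fields that the incoming record no longer carries.
    return sorted(expected - set(record))

old_record = {"price": 19.99, "category": "shoes"}
new_record = {"unit_price": 19.99, "category": "shoes"}  # upstream rename of "price"

print(extract_features(old_record))  # price feature is populated
print(extract_features(new_record))  # price feature silently falls back to 0.0
print(missing_fields(new_record))    # an explicit check surfaces the rename
```

A check along the lines of `missing_fields`, run before records reach the model, is one plausible way to turn a quiet accuracy decline into a visible failure.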
Stephen E Arnold, April 23, 2019