Machine Learning and Data Quality

April 23, 2019

We’re updating our data quality files as part of the run-up to my lecture at the TechnoSecurity & Digital Forensics Conference. One paper, “Dear AI Startups: Your ML Models Are Dying Quietly,” is worth reading if you are thinking about how to fix accuracy problems in the outputs of some machine learning systems. The slow deterioration of certain Bayesian methods is a subject I have addressed for years. The Sanau write-up called my attention to another source of data deterioration, or data rot: seemingly logical changes made to field names and the insidious downstream consequences of those changes. The article provides useful explanations and a concrete example drawn from ecommerce, but its lessons apply much more broadly. Worth reading.
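
To make the field name problem concrete, here is a minimal sketch in Python of one way to catch a renamed field before it quietly degrades a model. It is not taken from the article. The field names (`user_id`, `product_id`, `price`) and the function names are hypothetical, invented only for illustration.

```python
# Hypothetical set of field names a model was trained on (an assumption,
# not drawn from the article).
EXPECTED_FIELDS = {"user_id", "product_id", "price"}


def check_field_names(record, expected=EXPECTED_FIELDS):
    """Compare a record's field names against the expected set.

    Returns a report listing expected fields that are missing and
    fields that appear but were not expected.
    """
    actual = set(record)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


def has_drift(report):
    """True when the report shows any missing or unexpected field."""
    return bool(report["missing"] or report["unexpected"])


# Someone upstream renamed "product_id" to "item_id". The rename looks
# harmless, but the model would no longer see the field it expects.
report = check_field_names({"user_id": 1, "item_id": 2, "price": 3.0})
if has_drift(report):
    print("Field name drift detected:", report)
```

A check like this only flags names that change. It would not notice a field that keeps its name while its meaning or units shift, so it covers just one slice of the data rot problem.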

Stephen E Arnold, April 23, 2019

