Data Mining: A Bad Report Card
October 9, 2008
Two readers sent me links to reports about the National Research Council’s study of data mining. Declan McCullagh’s “Government Report: Data Mining Doesn’t Work Well” for CNet is here. BoingBoing’s more colorful write-up of the report is here. The headline is certainly catchy: “Data Mining Sucks: Official Report.” The only problem with the study’s findings is that I don’t believe them. I had a stake in a firm responsible for a crazy “red, yellow, green” flagging system for a Federal agency. The data mining system worked like a champ. What did not work was the government agency responsible for the program and the data stuffed into the system. Algorithms are numerical recipes. Some work better than others, but in most cases, the math in data mining is pretty standard. Sure, there are some fancy tricks, but these are not the deep, dark secrets locked in Descartes’ secret notebooks. The math is taught in classes that dance majors and social science students never, ever consider taking. Cut through the math nerd fog, and the principles can be explained.
I also suspect that nothing reassures a gullible reader more than a statement that something is broken. I don’t think I am going to bite that worm nestled on a barbed hook. Given clean data, off-the-shelf algorithms, reasonably competent management, and appropriate resources, data mining works. Period. Fumble the data, the management, or the resources, and data mining outputs garbage. To get a glimpse of data mining that works, click here. Crazy stuff won’t work. Pragmatic stuff works really well. Keep that in mind after reading the NRC report.
Stephen Arnold, October 9, 2008