Fusion Problems
May 29, 2014
Brett Slatkin at One Big Fluke makes a provocative point in his blog post “Data Fusion Has No Error Bounds”: data analysis can be full of calculation errors. Slatkin relates how he has come across many data fusion issues in his career. Data fusion problems occur when people want to merge two or more data sets that share no common source. Some companies have tried to rectify data fusion problems, but no matter how they advertise their software, code, or gimmick, Slatkin shows that there is always going to be some margin of error. How does he do it? Math.
Slatkin illustrates data fusion with three data sets that have little to no relation to one another. He outlines all the possible outcomes for each data set and concludes that there is a portion that cannot be measured. Despite all of the careful planning, mapping out the possible outcomes still yields a phantom zone. His response to this outcome is:
“There are two outcomes in data fusion: you measure so you can calculate the error bars, or you make a wild guess.”
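To see why the unmeasured portion matters, here is a minimal sketch in Python. It is not Slatkin's own math; it is the standard overlap bound for two groups drawn from the same population, and the function name overlap_bounds and the numbers are made up for illustration. If one data set says 60 of 100 people did one thing and a separate data set says 50 of 100 did another, the number who did both is not pinned down unless someone measures it directly.

```python
def overlap_bounds(population: int, count_a: int, count_b: int) -> tuple[int, int]:
    """Return the (smallest, largest) possible number of people in both groups.

    Only the two separate totals are known; nothing links the data sets.
    """
    if not (0 <= count_a <= population and 0 <= count_b <= population):
        raise ValueError("counts must lie between 0 and the population size")
    # The groups must overlap at least by however much they overflow the population.
    lowest = max(0, count_a + count_b - population)
    # The overlap can never exceed the smaller of the two groups.
    highest = min(count_a, count_b)
    return lowest, highest


# Hypothetical numbers: 100 people, 60 in data set A, 50 in data set B.
low, high = overlap_bounds(population=100, count_a=60, count_b=50)
print(f"Overlap could be anywhere from {low} to {high} people")
```

With these made-up figures the overlap could be as low as 10 people or as high as 50. Any single number a fusion tool reports inside that range is a guess unless the overlap itself was measured.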
What have we learned from this? Despite every attempt to eliminate errors, data analysis is still error-prone. Big data vendors will not like that.
Whitney Grace, May 29, 2014
Sponsored by ArnoldIT.com, developer of Augmentext