Fusion Problems

May 29, 2014

Brett Slatkin at One Big Fluke makes a thought-provoking point in his blog post “Data Fusion Has No Error Bounds”: data analysis can be full of calculation errors. Slatkin relates that he has come across many data fusion issues in his career. Data fusion problems occur when people want to merge two or more data sets that come from unrelated sources. Companies have tried to rectify data fusion problems, but no matter how they advertise their software, code, or gimmick, Slatkin shows that there will always be some margin of error. How does he do it? Math.

Slatkin illustrates data fusion with three data sets that have little to no relation. He outlines the possible outcomes for each data set and concludes that a portion remains that cannot be measured. Even with careful planning, mapping out the possible outcomes yields a phantom zone. His response to this outcome is:

“There are two outcomes in data fusion: you measure so you can calculate the error bars, or you make a wild guess.”
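
To make the phantom zone concrete, here is a minimal sketch. It is not Slatkin's own example: it simplifies to two data sets instead of three and uses made-up shares. It assumes each data set reports only what fraction of the same population has some trait and that no record-level link exists between the two. The function name overlap_bounds is hypothetical. Under those assumptions, the share of people who appear in both sets can only be bracketed (the classic Fréchet bounds), not calculated.

```python
def overlap_bounds(share_a, share_b):
    """Bracket the share of a population that falls in both A and B.

    Assumes only the two marginal shares are known (hypothetical inputs),
    with no record-level link between the data sets.
    """
    if not (0.0 <= share_a <= 1.0 and 0.0 <= share_b <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    # The overlap cannot be smaller than what is forced by the two
    # shares adding up to more than the whole population.
    lower = max(0.0, share_a + share_b - 1.0)
    # The overlap cannot be larger than the smaller of the two shares.
    upper = min(share_a, share_b)
    return lower, upper


# Made-up numbers: 50% of people are in data set A, 75% are in data set B.
low, high = overlap_bounds(0.5, 0.75)
print(f"Overlap is somewhere between {low:.0%} and {high:.0%}")
```

Any figure inside that range is consistent with the two data sets, so picking one without measuring the overlap directly is the wild guess the quote describes.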

What have we learned from this? Despite all attempts to eliminate errors, data analysis is still error-prone. Big data vendors will not like that.

Whitney Grace, May 29, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under Data, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.