Big Data Basics: Garbage In, Garbage Out Still a Problem

July 20, 2015

The person writing “Data Integrity: A Sequence of Words Lost in the World of Big Data” appears to be older than 18. I don’t hear too many young wizards nattering about data integrity. The operative concept is that with enough data, the data work out the bumps in the Big Data tapestry. The cloth may have leaves and twigs in it. But when you make the woven object big enough and hang it on a wall in a poorly illuminated chateau, who can tell. Few visitors demand a ladder and a lanthorn to inspect the handiwork.

According to the write up:

The purpose of this post is to highlight the necessity to keep data clean and orderly so that the results of the analysis are reliable and trustworthy – if data integrity is intact, information derived from this data will be trustworthy resulting in actionable information.

Why tackle this topic in a blog for Big Data professionals?

Answer: No one pays much attention. The author saddles up and does the Don Quixote gallop at the Big Data hyperbole windmill.

The article includes a partial list of questions to ask and, keep this in mind, gentle reader, to answer. One example: “Are values outside of acceptable domain values?”

I found this article refreshing. Take a gander.

Stephen E Arnold, July 20, 2015

Comments

Comments are closed.

  • Archives

  • Recent Posts

  • Meta