Data Lake Alert: Tepid Water, High Concentration of Agricultural Runoff
August 13, 2015
Call me skeptical. Okay, call me a person who is fed up with silly jargon. You know what a database is, right? You know what a data warehouse is, well, sort of, maybe? Do you know what a data lake is? I don’t.
A lake, according to the search engine du jour Giburu:
An area prototypically filled with water, also of variable size.
A data lake, therefore, is an area filled with zeros and ones, also of variable size. How does a data lake differ from a database or a data warehouse?
According to the write up “Sink or Swim – Why your Organization Needs a Data Lake”:
A Data Lake is a storage repository that holds a vast amount of raw data in its native format for processing later by the business.
The magic in this unnecessary jargon is, in my opinion, a quest (perhaps Quixotic?) for sales leads. The write up points out that a data lake is available. A data lake is accessible. A data lake is, wait for it, Hadoop.
What happens if the water is neither clear nor pristine? One cannot unleash the hounds of the EPA to resolve the problem of data which may not be very good until validated, normalized, and subjected to the ho-hum tests which some folks want to have me believe may be irrelevant steps in the land of a marketer’s data lakes.
My admonition, “Don’t drink the water until you know it won’t make life uncomfortable—or worse. Think fatal.”
Stephen E Arnold, August 13, 2015