Big Data Lake: Are the Data Safe to Consume?
August 2, 2015
I read “The Analytics Journey Leading to the Business Data Lake.” Data lake is one of the terms floating around (pun definitely intended!) to stimulate sales. If one has a great deal of water, one needs a place to put it. Even though water is dammed, piped, used, recycled, and dumped, storage is the key.
Enter EMC, a company which is in the business of helping those with water store it and make use of that substance.
The write up reflects effort. I assume there was a PowerPoint slide deck in the mix. There are some snazzy graphics. Here’s one that caught my eye:
Instead of enterprise search being the go-to enterprise software solution, EMC has slugged in the following umbrella terms:
- Information ecosystem
- Business intelligence (perhaps an oxymoron in light of this article)
- Advanced analytics (obviously because regular analytics just aren’t zippy enough)
- Knowledge layer (I remain puzzled about knowledge because I have a tough time defining it. In fact, I resigned from my for-fee knowledge management column because I just don’t know what the heck “knowledge” means.)
- The unfathomable data lake (yep, pun intended). What’s wrong with the word “storage” or “database” by the way?
- Master data, which is also baffling. Is there servant data too?
- Machine data. Again I have no clue what this means.
The chart scatters undefined and fuzzy buzzwords like a crazed Jethro Tull, a water-soluble blend of Jethro Tull (inventor of the seed drill) and Jethro Tull (the commercially successful and eccentric rock band).
The write up is important because EMC has sucked in the jargon and assertions once associated with enterprise search and applied them to the dark and mysterious data lake.
I highlighted:
Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLA). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels powered by Greenplum and in-memory caching databases will serve mission-critical and real-time analytics and application solutions. With more robust data governance and data quality management, we can ensure authoritative, high-quality data driving all of EMC business insights and analytics driven applications using data services from the lake.
Ah, the Mariana Trench of enterprise information: Governance. Like “knowledge” and “advanced analytics”, governance has euphony. I think of the water lapping against the shore of Lake Paseco.
So what? Several observations:
- This type of “suggest lots” marketing ended poorly for a number of companies that used similar rhetoric to market search.
- The folks who swallow this bait are likely to find themselves in a most uncomfortable spot.
- The problems associated with making use of information to improve decision making by reducing risk are not going to be solved by crazy diagrams and unsupported assertions.
EMC has been able to deliver revenue growth. But the company’s profit margin has flatlined.
I am not sure that increasing the buzzword density in marketing write ups will help angle the red lines to low earth orbit. With better margins, it is much easier to check out the topographic view and see where lakes meet land.
Stephen E Arnold, August 2, 2015