The Data Lake Is a Hub: For Wheel I Tell You

August 25, 2015

When I read “Why Do I Need a Data Lake,” I thought about Mel Blanc. Mr. Blanc was a voice actor who enlivened the Jack Benny Show and Warner Bros. cartoons. For Mr. Benny, Mr. Blanc was the “sound” of the Maxwell automobile and the participant in the famous “Sí…Sy…sew…Sue” routine.

So what? I imagined Mr. Blanc reading the write-up aloud to me as Daffy Duck.

Here’s a passage I highlighted and enjoyed:

The data lake has the potential to transform the business by providing a singular repository of all the organization’s data (structured AND unstructured data; internal AND external data) that enables your business analysts and data science team to mine all of organizational data that today is scattered across a multitude of operational systems, data warehouses, data marts and “spreadmarts”. [Emphasis in the original]

Note that the lake has “potential to transform”. I also like the categorical imperative of “all the organization’s data.” I find the “all” notion quite humorous because there are digital data which are not likely to be pooled and processed. One example is data governed by government contracts for which rules of secrecy apply. Another is digital information germane to a legal matter and in the control of the firm’s legal eagles. There are other examples as well. So the “all” is a bobbing buoy. But what the heck is a spreadmart?

But the chortle-inducing passage is the conversion of a data lake into a “hub and spoke service architecture.” That is quite a metaphorical shift.

Here’s another passage I highlighted:

the head of EMC Global Services Big Data Delivery team, termed this a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools and analytic best practices across the different parts of the organization.

I worked through the requisite list of dot points and then came upon a list of confusions for which I was prepared by the lake-wheel juxtaposition. One confusion warrants some of my attention: “Create multiple data lakes.”

The idea is that an organization needs just “ONE” [emphasis in the original] data lake:

a singular repository where all of the organizations data – whether the organization knows what to do with that data or not – can be made available.  Organizations such as EMC are leveraging technologies such as virtualization to ensure that a single data lake repository can scale out and meet the growing analytic needs of the different business units – all from a single data lake.

I can hear Daffy as vivified by Mr. Blanc saying, “Do me a big data favor and scold anyone who starts talking about data lakes (plural) instead of a data lake.”

Okay, scold.

EMC, as I understand the firm’s strategy, is contemplating this action: The company has considered selling itself to one of its subsidiaries.

There you go. An example of a hub and spoke, data lake type analysis applied to storage. Why do I need a data lake?

That’s all folks.

Stephen E Arnold, August 25, 2015
