A Data Lake: Batch Job Dipping Only

February 11, 2016

I love the Hadoop data lake concept. I live in a mostly real time world. The “batch” approach reminds me of my first exposure to computing in 1962. Real time? Give me a break. Hadoop reminded me of those early days. Fun. Standing on line. Waiting and waiting.

I read “Data Lake: Save Me More Money vs. Make Me More Money.” The article strikes me as a conference presentation illustrated with a deck of PowerPoint goodies.

One of the visuals was a modern big data analytics environment. I have seen a number of representations of today’s big data yadda yadda set ups. Here’s the EMC take on the modernity:

Straight away, I note the “all” word. Yep, just put the categorical affirmative into a Hadoop data lake. Don’t forget the video, the wonky stuff in the graphics department, the engineering drawings, and the most recent version of the merger documents requested by a team of government investigators, attorneys, and a pesky solicitor from some small European Community committee. “All” means all, right?

Then there are two “environments”. Okay, a data lake can have ecosystems, so the word environment is okay for flora and fauna. I think the notion is to build two separate analytic subsystems. Interesting approach, but there are platforms which offer applications to handle most of the data slap about work. Why not license one of those; for example, Palantir, Recorded Future?

And that’s it?

Well, no. The write up states that the approach will “save me more money.” In fact, one does not need much more:

The savings from these “Save me more money” activities can be nice with a Return on Investment (ROI) typically in the 10% to 20% range. But if organizations stop there, then they are leaving the 5x to 10x ROI projects on the table. Do I have your attention now?

My answer, “No, no, you do not.”

Stephen E Arnold, February

Written by Stephen E. Arnold · Filed Under Analytics, Database, News

Comments

Comments are closed.

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.