Open Source Data Management: It Is Now Easy to Understand

January 10, 2016

I read “16 for 16: What You Must Know about Hadoop and Spark Right Now.” I like the “right now.” Urgency. I am not sure I feel too much urgency at the moment. I will leave that wonderful feeling to the executives who have sucked in venture money and have to find a way to generate revenue in the next 11 months.

The article runs down the basic generalizations associated with each of these open source data management components:

  • Spark
  • Hive
  • Kerberos
  • Ranger/Sentry
  • HBase/Phoenix
  • Impala
  • Hadoop Distributed File System (HDFS)
  • Kafka
  • Storm/Apex
  • Ambari/Cloudera Manager
  • Pig
  • Yarn/Mesos
  • Nifi/Kettle
  • Knox
  • Scala/Python
  • Zeppelin/Databricks

What the list tells me is two things. First, the proliferation of open source data tools is thriving. Second, there will have to be quite a few committed developers to keep these projects afloat.

The write up is not content with this shopping list. The intrepid reader will have an opportunity to learn a bit about:

  • Kylin
  • Atlas/Navigator

As the write up swoops to its end point, I learned about some open source projects which are a bit of a disappointment; for example, Oozie and Tez.

The key point of the article is that Google’s MapReduce which is now pretty long in the tooth is now effectively marginalized.

The Balkanization of data management is evident. The challenge will be to use one or more of these technologies to make some substantial revenue flow.

What happens if a company jumps on the wrong bandwagon as it leaves the parade ground? I would suggest that it may be more like a Pig than an Atlas. The investors will change from Rangers looking for profits to Pythons ready to strike. A Spark can set fire to some hopes and dreams in the Hive. Poorly constructed walls of Databricks can come falling down. That will be an Oozie.

Dear old Oracle, DB2, and SQLServer will just watch.

Stephen E Arnold, January 10, 2016

Comments

One Response to “Open Source Data Management: It Is Now Easy to Understand”

  1. problema ereccion on January 11th, 2016 2:56 am

    problema ereccion

    Open Source Data Management: It Is Now Easy to Understand : Stephen E. Arnold @ Beyond Search

  • Archives

  • Recent Posts

  • Meta