Confused about Hadoop, Spark, and MapReduce? Not Necessary Now
March 24, 2016
I read “MapReduce vs. Apache Spark vs. SQL: Your questions answered here and at #StrataHadoop.” The article strikes at the heart of the Big Data boomlet. The options one has are rich, varied, and infused with consequences.
According to the write up:
Forrester is predicting total market saturation for Hadoop in two years, and a growing number of users are leveraging Spark for its superior performance when compared to MapReduce.
Yikes! A mid-tier consulting firm is predicting the future again. I almost stopped reading, but I was intrigued. Exactly what are the differences among these three systems, which appear to be really different? MapReduce is a bit of a golden oldie, and there is the pesky thought in my mind that Hadoop is a close relative of MapReduce. The Spark thing is an open-source effort to create a system that runs quickly enough to make performance mesh with the idea that engineers have weekends.
The write up states:
As I mentioned in my previous post, we’re using this blog series to introduce some of the key technologies SAS will be highlighting at Strata Hadoop World. Each Q&A features the thought leaders you’ll be able to meet when you stop by the SAS booth #1022. Next up is Brian Kinnebrew who explains how new enhancements to SAS Data Loader for Hadoop can support Spark.
Yikes, yikes. The write up is a plea for booth traffic. In the booth a visitor can learn about the Hadoop, Spark, and MapReduce options.
The most interesting thing about the article is that it presents a series of questions and some SAS-skewed answers. The point is that SAS, the statistics company every graduate student in psychology learns to love, has a Data Loader Version 2.4, which is going to make life wonderful for the Big Data crowd.
I wondered, “Is this extract, transform, and load all over again?”
The answer is not to get tangled up in the substantive differences among Hadoop, Spark, and MapReduce, as the title of the article implied. The point is that one can use NoSQL and regular SQL.
So what did I learn about the differences among Hadoop, Spark, and MapReduce?
Nothing. Just content marketing without much content, in my view.
SAS, let me know if you want me to explain the differences to someone in your organization.
Stephen E Arnold, March 24, 2016