Generalizations about Big Data: Hail, the Mighty Hadoop
May 26, 2015
I read “A Big Data Cheat Sheet: What Executives Want to Know.” The hidden agenda in the write up is revealed by the juxtaposition of the source, Social Media Today, and the technology, Hadoop.
Big Data is one of those buzzwords which now grates on me. When I hear it, I wonder what the outfit is pitching and how something as nebulous as Big Data is going to save someone’s bacon or, if one is a vegetarian, tofu.
This write up beats the Hadoop drum. Isn’t Hadoop one method for performing certain types of data management tasks and extracting results from those tasks? Hadoop is a tool, and like a router in the home workshop, a pretty feisty gizmo in the hands of a novice.
The article suggests that Hadoop is a federation system. Hadoop can be a federation system, but it can also handle data from a single source; for example, log files. Federation is not magic; it requires work. In fact, federation may render the benefits of Hadoop secondary to the cost of the resources required to use Hadoop effectively.
There are other assertions as well; for example:
- Hadoop can archive “all data.” Hmmm. “All.” Does this sound a bit overblown?
- Hadoop is enterprise ready? Sure, if the enterprise has the resources to make appropriate use of Hadoop.
- Are data lakes and data warehouses the same? According to the write up, the data warehouse uses structured data and the data lake is just a big pool of disparate data. Queries across this type of “pool” can be exciting and expensive.
- The upsides and downsides of the data lake pivot on data management. Okay, that is definitely true. What is not explored is the cost of managing large volumes of data, their updates, and their manipulation. Queries can be expensive.
My point is that sweeping generalizations about a technology which is useful are not helpful. Firing buzzwords into the mushy brain of a person involved in social media can have some interesting consequences.
Hadoop is not magic. Hadoop requires specialized knowledge. Hadoop does not deliver the way the tooth fairy leaves a quarter under one’s pillow. If Hadoop were the answer to Big Data problems, why are so many Hadoop projects vulnerable to very common problems in configuration, memory handling, lousy performance, and problematic hives?
Social media experts are not likely to appreciate these challenges as they work to deal with large volumes of data, updates, and queries. Oh, are the outputs valid? Frankly, some Hadoop projects never face that problem.
Stephen E Arnold, May 26, 2015