Database Tussles: MapReduce and Parallel DBMSs
March 10, 2011
We don’t want to take sides in this fight. The points raised in “MapReduce and Parallel DBMSs: Friends or Foes?” will draw plenty of experts eager to champion their favorite. The authors of this paper are well known among the database elite, and, not surprisingly, the article does a very good job of reviewing the strengths of each approach. The discussion of each approach is quite useful, but for me, the most interesting segment of the write-up was the section on “Architectural Differences”. I made sure I kept a copy of that section because clear explanations of complex data management architectures are tough to locate.
I found the conclusion somewhat obvious but reassuring to both sides in this battle, which still occupies some in the field:
Most of the architectural differences discussed here are the result of the different focuses of the two classes of system. Parallel DBMSs excel at efficient querying of large data sets; MR-style systems excel at complex analytics and ETL tasks. Neither is good at what the other does well. Hence, the two technologies are complementary, and we expect MR-style systems performing ETL to live directly upstream from DBMSs. Many complex analytical problems require the capabilities provided by both systems. This requirement motivates the need for interfaces between MR systems and DBMSs that allow each system to do what it is good at. The result is a much more efficient overall system than if one tries to do the entire application in either system. That is, “smart software” is always a good idea.
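To make the division of labor concrete, here is a toy sketch in Python. It is not taken from the paper. A map/reduce-style step cleans and aggregates raw log lines (the ETL work), and its output is loaded into a SQL database for querying. The log format, function names, and table are hypothetical; the standard-library sqlite3 module stands in for a parallel DBMS, and a single-process loop stands in for a distributed MR framework.

```python
import sqlite3
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw input: "day,username,bytes" lines, one of them malformed.
RAW_LOGS = [
    "2011-03-10,alice,120",
    "2011-03-10,bob,80",
    "garbage line",
    "2011-03-11,alice,40",
    "2011-03-11,carol,200",
]


def map_phase(line: str) -> Iterator[tuple[tuple[str, str], int]]:
    """Parse one raw line and emit ((day, username), bytes)."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return  # ETL-style cleansing: silently drop malformed records
    day, username, nbytes = parts
    yield (day, username), int(nbytes)


def reduce_phase(key: tuple[str, str], values: list[int]) -> tuple[str, str, int]:
    """Sum all byte counts that share a (day, username) key."""
    day, username = key
    return day, username, sum(values)


def run_etl(lines: Iterable[str]) -> list[tuple[str, str, int]]:
    """Map every line, group values by key (the 'shuffle'), then reduce."""
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return [reduce_phase(key, values) for key, values in sorted(groups.items())]


def load_and_query(rows: list[tuple[str, str, int]]) -> list[tuple[str, int]]:
    """Load the cleaned rows into a SQL table and run a declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_usage (day TEXT, username TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO daily_usage VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT username, SUM(bytes) FROM daily_usage "
        "GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    cleaned = run_etl(RAW_LOGS)
    print(load_and_query(cleaned))  # prints [('alice', 160), ('bob', 80), ('carol', 200)]
```

The MR side handles the messy parsing and per-key aggregation; the database side answers the ad hoc question with a short SQL query. That mirrors, in miniature, the "ETL upstream of the DBMS" arrangement the authors describe.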
Very useful write-up.
Stephen E Arnold, March 10, 2011
Freebie