MapReduce: Google’s Database Probe Launched

August 26, 2008

Update 2, August 29, 2008, 1:50 pm Eastern

There’s an interesting and possibly relevant story on CNet here. Matt Asay wrote “Google’s Weird Ways with Open Source Licenses,” which became available on August 29, 2008. The core of the story is in the title. Open source licenses appear to be handled in a Googley way; that is, Google’s way. I sure don’t want to dispute the assertions that MapReduce as used by Aster Data and Greenplum is in any way affected by these “weird ways”. I do want to point you to this article and quote one sentence that was of interest to me:

As for the MPL, while DiBona doesn’t state it outright, I suspect that Google’s decision to re-up its commitment to Mozilla for three more years probably involved some strained discussions about Google’s weird decision to dump the MPL, one of the industry’s most popular open-source licenses. Regardless, all is well that ends well. Google came to the right decision, however odd the logic.

You can read the Steve Shankland article, which touches upon the great MapReduce technology here. For something as simple as making code available as open source, there’s a lot of huffing and puffing. I’m watching for signs of smoke now. Wizards, pundits, and Googley types are welcome to add links, correct either of these authors, or opine with limited data via the comments on this addled goose’s Web log. What’s next for open source? The programmable search engine technology. That would be useful here in the hills of Kentucky.

Update 1, August 29, 2008, around 11 am Eastern

My comment about MapReduce triggered some keyboarding by various wizards. Thanks for the inputs. The point of the flurry is that MapReduce doesn’t have anything to do with Google. MapReduce is “in the wild” and anyone can make use of it. Nevertheless, I remain keenly interested in this technology for several reasons:

  1. MapReduce was the subject of a lecture given at the University of Washington several years ago by Jeffrey Dean and then written up as a paper.  You can snag a copy here.
  2. Google has been careful about the scope of its enterprise ambitions with regard to data management, database, and data analysis. The company has been sufficiently circumspect as to make the key players in the database and data management market confident that Google’s enterprise ambitions are focused on search, maps, and lightweight cloud applications. Forget the dashboard I wrote about. It’s lightweight too.
  3. Aster Data is a company that came on my radar because of its “Googley nature”. I have picked up some suggestive comments about the robustness of the Aster Data technology and I learned from Aster Data that it is not interested in search. I believe that statement but I watch this space for interesting developments.

From my point of view, MapReduce–open source or any other variety–intrigues me. Based on my observation of things Google from my remote hideaway in Harrod’s Creek, Kentucky, my hunch is that Google has a tiny bit of interest in how Aster Data and Greenplum use MapReduce, how their customers respond, and what interest the technology generates. In my lingo, Google learns from its environment. That’s why I subtitled my Google Version 2.0 study “the calculating predator”. Watching, learning, waiting–could this be part of the MapReduce or broader Google goodness? I will let you know what I snag in my crawler.

Original Post Below

I wrote about Aster Data several weeks ago. If you are not familiar with the company, you may want to look at my article or navigate to the Aster Data Web site and get up to speed. It is an important company and is in the process of becoming more important.

InfoWorld’s “Database Vendors Add Google’s MapReduce” here reports that Google has cut a deal with Aster Data and Greenplum for Google’s nifty method of combining two separate functions into one instruction, reducing the “time” and computational cycles required to perform a task essential to chopping results from a larger data set. MapReduce is useful for certain operations with petascale data.
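
For readers who have not seen the pattern, here is a minimal single-machine sketch of the two phases in plain Python. It is only an illustration of the general map-then-reduce idea, not Google’s, Aster Data’s, or Greenplum’s implementation. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are my own assumptions, chosen because word counting is a common teaching example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    # Map every document independently, then reduce each key independently.
    # That independence is what lets a real system spread work across machines.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the goose honks", "the goose flies"])
```

In a production system the map and reduce calls would presumably run in parallel on many machines over far larger inputs; this sketch runs them sequentially in one process.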

Has Google entered the enterprise data management market? Not yet. Like Google’s interaction with Salesforce.com, Google is in “learn” mode. MapReduce by itself is not a complete data solution, but it provides some horsepower to Aster Data and Greenplum.

Will Google challenge IBM, Microsoft, and Oracle among others in the DBMS market? Google will watch and learn. Google has some serious data management capabilities in development. MapReduce is a golden oldie at Google.

When Google figures out what it wants to do to cash in on the pain many companies experience when using traditional database management systems, the Google will leapfrog what’s available. For now, Google is no threat to DBMS vendors. In the future, who knows? Probably not even Google, until it gets enough hard data to justify a decision one way or the other.

Stephen Arnold, August 26, 2008

Comments

8 Responses to “MapReduce: Google’s Database Probe Launched”

  1. Curt Monash on August 28th, 2008 4:58 pm

    Steve,

    Your premise is wrong. There was no licensing from Google, and this is not a business initiative on Google’s part.

    Best,

    CAM

  2. Stephen E. Arnold on August 28th, 2008 9:33 pm

    Curt Monash,

    Thanks for posting. I suppose I should have defined the scope of the term. May I ask, “Are you certain there is no understanding about the terms of use?” If you have documentation, please, share it. My research indicates boundaries.

    Stephen Arnold, August 29, 2008

  3. Dataspaces Analysis Available : Beyond Search on August 29th, 2008 12:02 am

    […] oldie” technology MapReduce available to Aster Data and Greenplum. You can read about this here. Last year, I spoke with representatives of IBM and Oracle. I asked about their perceptions of […]

  4. Luke Lonergan on August 29th, 2008 5:02 am

    Hi Stephen,

    Licensing is not required for MapReduce as it is a work derived from many sources of publicly shared know-how. It dates back to the original Lisp operators Map and Reduce.

    The Wikipedia page is pretty complete here:
    http://en.wikipedia.org/wiki/MapReduce

    Greenplum’s MapReduce support is designed to provide a superset of the semantic content of open source Hadoop and Google’s implementations, making it straightforward to port from those environments to Greenplum’s data analysis and management engine.

    Some important extensions we provide include:
    – Extensions for Joins and Pipelined task execution
    – Native parallel file access
    – Parallelism is full and transparent to the programmer

    In summary: we have implemented MapReduce within which you can write SQL, Perl, Python and many more languages. It is straightforward to use MR programs written for Hadoop or Google and port them to Greenplum.

  5. Curt Monash on August 29th, 2008 5:19 am

    Stephen,

    Luke is co-founder/CTO of Greenplum. Satisfied? ;)

    Best,

    CAM

  6. Stephen E. Arnold on August 29th, 2008 10:52 am

    Curt Monash,

    Luke posted as well. Regarding satisfaction, uncertain.

    Stephen Arnold, August 29, 2008

  7. Winning with Data: Aster Data Systems Blog » Blog Archive » TDWI MapReduce Nightschool Recap on November 14th, 2008 7:06 am

    […] and technology community, with recent coverage in the NY Times and by influential blogs like DBMS2, Beyond Search, and Cloud N, just to name a […]
