Google: Try This on Windows Server
November 23, 2008
Google has thrown down a gauntlet to the supercomputer crowd. You can read “Sorting One Petabyte with MapReduce” here. To learn more about MapReduce, click here. Now this recent Google gauntlet is digital, not one of those Sir Walter Scott fictional jobs with yellow tassels and brass fittings. Google is saying, “Take a terabyte of data and sort it. Now beat this time: 68 seconds. Once you have that whipped, take one petabyte of data and sort it. Beat this time: 362 minutes.” You can read the details, get a comparison so you have a sense of how much data a petabyte comprises, and even enjoy a touch of Googley humor. The sentence begins, “We are not aware…” If you don’t laugh out loud, well, you aren’t Googley. Why mention this type of lab rat exercise? Four reasons:
- Remind the folks who are yapping about how fast their supercomputers are that, in my opinion, the GOOG is running a zippy computer too.
- Make it clear to Microsoft that it has some work to do to match Google’s as-is data center performance.
- Show that Google can tackle big data as part of its real world applications. If you allow unlimited block uploads to Google Base, then you darn well have to whip that stuff around without choking the services that pay the bills, like ads.
- Put a benchmark in place so that competitors like IBM, Oracle, and Yahoo get a hint about how far ahead in data management Google is now. Today. This instant. IBM’s, Oracle’s, and even Yahoo’s technologists may be able to say their systems are as good as Google’s. Google is too polite to come right out and say, “We’re faster across the board on certain key benchmarks.”
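For a rough sense of what those two sort times imply, here is a back-of-the-envelope sketch in Python. It uses only the 68 second and 362 minute figures quoted above. The decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes) and the helper name `throughput_gb_per_s` are my assumptions, not anything taken from Google’s write-up.

```python
# Assumption: decimal units. 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
TERABYTE = 10**12
PETABYTE = 10**15


def throughput_gb_per_s(num_bytes, seconds):
    """Return aggregate sort throughput in gigabytes (10**9 bytes) per second."""
    return num_bytes / seconds / 10**9


# Times quoted in the post: 68 seconds and 362 minutes.
tb_seconds = 68
pb_seconds = 362 * 60

tb_rate = throughput_gb_per_s(TERABYTE, tb_seconds)
pb_rate = throughput_gb_per_s(PETABYTE, pb_seconds)

print(f"1 TB in 68 s:    {tb_rate:.1f} GB/s")
print(f"1 PB in 362 min: {pb_rate:.1f} GB/s")
print(f"Data grew {PETABYTE // TERABYTE}x, time grew {pb_seconds / tb_seconds:.0f}x")
```

Under those assumptions the terabyte run works out to roughly 14.7 GB per second and the petabyte run to roughly 46 GB per second. These are whole-system averages over the full sort. They say nothing about how many machines were involved or what any single machine did.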
Oh, I am feeling frisky this Sunday morning. Here’s a thought for you: if you don’t care about these sort speeds, I guarantee that you will have a tough time understanding what the GOOG has been doing for the last decade while everyone talked about the company as an ad agency.
Stephen Arnold, November 23, 2008