Google: Chubby and Paxos

July 26, 2008

The duo is not Cisco and Pancho or the Lone Ranger and Silver. Paxos is closer to leather biker gear and a Harley-Davidson belt buckle. The outfit gets some panache and the biker’s pants stay properly slung. You may want to read the 16 pages of Googley goodness here. The paper is “Paxos Made Live–An Engineering Perspective.” One of the interesting facts about this paper is that Tushar Chandra has emerged as a spokesperson for Google. You can read my translation of some of his recent comments here.

In this brief essay, I want to identify three of the points discussed in this 2007 paper that are of particular interest to me. But before I highlight these points I want to provide some context. Chubby is a mechanism to keep processes from acting like hungry kindergartners running to the milk and cookies. Chubby keeps order and gets the requests filled quickly without having two six-year-olds getting into a knock-down fight over a graham cracker.

First, Chubby is pretty nifty technology, representing a major advance over the file and record locking schemes used for Codd databases. When I mention this point to IBM DB2 or Oracle wizards, I am greeted with hoots of laughter. “Google has nothing we don’t have and we have file and record locking schemes that are much better,” I was told in May 2007 in the IBM booth at a major trade show. No problem. I believe IBM and Oracle. I just hope their customers believe them when Google reveals the efficiency of Chubby. You can learn more about Chubby in my 2005 The Google Legacy and my 2007 Google Version 2.0, or you can read this Google white paper. File and record locking for reads and writes is one of the hot spots in many database systems. Some companies turn cartwheels to figure out how to perform writes without screwing up read response time. Believe me, some of these outfits do Cirque du Soleil-type acrobatics to work around the database read-write problems.

Second, Chubby is not new. When a Google technical paper appears, Google is not revealing a work in progress. My analysis of Google engineering papers and patent documents suggests a careful staging of each information release. When a paper appears, the technology is up, running, and locked in. A competitor learning about a Google innovation from a patent document or a Google technical paper is learning about something that is two to five years “old”; that is, the company has been working on a problem and figured out a bunch of possible solutions. The one solution that makes it into the Google production environment is a good one. When the Googlers talk about an innovation, the competitor who decides to respond is late out of the starting gate. Neither of my two Google studies contained “new” information. I was reporting what was ancient history for Googzilla.

Paxos

Now what’s a Paxos?

Paxos is not one thing. It is a collection of protocols that allow a system to adapt to failures. Google has lots of servers, so there are many failures. Chubby sits between the Google File System and Google’s BigTable (a data management system, not a traditional relational database). Wikipedia can deliver some less than stellar information, but the write up for Paxos struck me as reasonably good, and the information will get you anchored in the notion. The diagrams won’t be of much use, but the Google diagrams are almost equally opaque. The reason is that the flow diagrams don’t make much sense unless you have some experience with smart software in a failure-prone environment. Based on the style of writing and the type of diagrams in the Paxos write up, my hunch is that a Google-grade brain contributed a thought or two to the Wikipedia entry. The external links reinforce my conclusion that this is a pretty reliable description of the flavors of Paxos. Of course, it’s tough to determine which “flavor” or “flavors” are part of the Google library.

A typical Google performance table. Google compares its processes to themselves, not to commercial alternatives. These data suggest that Google is doing the work of a cluster of high performance machines on a single commodity server. The key number is operations per second, which works out to 38,400 operations per second for 20 workers (clients). What’s remarkable is that throughput is 3.6 times greater for the larger test database. In other words, as the data get bigger, the throughput goes up. © 2007 Google, Inc.

In my vastly simplified view, Paxos is one tiny cog in Google’s library of smart algorithms. The algorithms crank mindlessly through a procedure writing values. Another process watches these values. When an anomaly becomes evident, the watching process “checks” with other processes and reaches a consensus about what action to take. It sounds really democratic and time consuming. The method is neither. The consensus is not like a human vote. The processes do not deliberate. As soon as a majority of the processes return an acceptance of a proposed value to the “master,” the decision is made automatically.
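To make the majority idea concrete, here is a minimal sketch of textbook single-value Paxos in Python. This is my own illustration, not Google’s code and not the flavor described in the paper. The names Acceptor and propose, the in-memory “servers,” and the alive flag that fakes a crash are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One replica's Paxos state (hypothetical, in-memory only)."""
    promised: int = -1                    # highest proposal number promised
    accepted_n: int = -1                  # number of the accepted proposal
    accepted_value: Optional[str] = None  # value accepted so far, if any
    alive: bool = True                    # False simulates a crashed server

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        # Phase 1: promise to ignore proposals numbered below n.
        if not self.alive or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_value)

    def accept(self, n: int, value: str) -> bool:
        # Phase 2: accept unless a higher-numbered promise was made.
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Try to get a value chosen; return it, or None without a majority."""
    majority = len(acceptors) // 2 + 1

    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value

    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


replicas = [Acceptor() for _ in range(5)]
replicas[0].alive = False   # two of five servers are down
replicas[1].alive = False
print(propose(replicas, n=1, value="master=A"))  # master=A
print(propose(replicas, n=2, value="master=B"))  # master=A, already chosen
replicas[2].alive = False   # now only a minority is left
print(propose(replicas, n=3, value="master=C"))  # None
```

With five replicas, three answers are enough, so two dead servers do not stop a decision. Once a value is chosen, a later proposal cannot overturn it. When only a minority is reachable, nothing is decided at all, which is the safe outcome.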

Keep in mind that this occurs in a massively parallel computing environment. These types of system level processes occur with near zero latency. This type of master-slave set up is a feature of other core Google processes; for example, the Google File System itself. I describe the advantages of Google’s approach in The Google Legacy, and I will not repeat that information here. I think it is sufficient to point out that the approach has some very significant benefits, and most of Google’s competitors are racing to duplicate functionality that Google has had in operation for at least eight years.

So What?

My take on this paper is that Google in its own way is making these points clear to potential employees, competitors, and partners smart enough to appreciate the depth of Google’s technical expertise:

The Chubby system was fast, but with Paxos fault tolerance it runs like lightning. The metrics in the paper appear on page 14, and these are offered without context. The key point in the table is that Google’s fault tolerance method executes anywhere from 1.2 to 4.4 times faster than Google’s other methods. I looked at some of my data for lock operations, and I can say that Google has a performance advantage.
The Paxos method runs on commodity hardware in an environment that is fraught with failure. Google’s methods translate to this attitude, “We don’t care. We route around failures so there’s no downtime and no expense associated with a server or disk that fails.” This translates to a cost advantage. I include in my for-fee studies some cost calculations, but, of course, Google won’t communicate with me, so my estimates provide only general guidance. The cost savings, however, are not trivial.
The Paxos method does not make Chubby piggy. Chubby is an efficient mechanism for file locking and unlocking, running on what are modest servers. To goose the file and record locks and unlocks on a big Oracle cluster, you have to have more hardware than a handful of commodity servers.

Net Net

This is an important paper. Like most Google technical write ups, it is neither casual nor without instrumentality. Google’s explanation of Paxos makes it clear to me that Google is putting in place the plumbing necessary to deliver sophisticated data management services first to consumers and then to the enterprise. I know that traditional database companies dismiss Google’s data capabilities as “Web search” centric. These folks are dead wrong. Google can scale, do it with commodity hardware, and deliver blinding performance where most systems encounter a bottleneck.

Stephen Arnold, July 26, 2008

Written by Stephen E. Arnold · Filed Under Cloud computing, Database, Feature, Google, Online (general), Technology

