Sparse Data Locality Invention from the Google

July 29, 2009

Google has been lighting up my Overflight patent watch in the last couple of days. The USPTO seems to be pushing work out the door this week. Vacation may be luring the intrepid examiners. One Google patent caught my eye. The title is certainly exciting: “Storing a Sparse Table Using Locality Groups”. You can locate the document at the USPTO by searching for US patent 7,567,973. The abstract is as clearly written as the lawyers, mathematicians, and physicists at Google can make it:

Each of a plurality of data items is stored in a table data structure. The table structure includes a plurality of columns. Each of the columns is associated with one of a plurality of locality groups. Each locality group is stored as one or more corresponding locality group files that include the data items in the columns associated with the respective locality group. In some embodiments, the columns of the table data structure may be grouped into groups of columns and each group of columns is associated with one of a plurality of locality groups. Each locality group is stored as one or more corresponding locality group files that include the data items in the group of columns associated with the respective locality group.
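For readers who think better in code than in patent prose, here is a minimal sketch of the idea in the abstract: each column is assigned to a locality group, and each group is stored on its own. The column names, group names, and the `split_into_locality_groups` helper are all hypothetical illustrations, not anything taken from the patent or from Google's code, and plain dictionaries stand in for the locality group files.

```python
from collections import defaultdict

# Hypothetical column -> locality group assignment (names are illustrative only).
COLUMN_TO_GROUP = {
    "contents": "page_data",
    "language": "page_meta",
    "pagerank": "page_meta",
}


def split_into_locality_groups(rows, column_to_group):
    """Partition a sparse table into one store per locality group.

    rows maps row_key -> {column: value}; absent columns are simply not stored.
    The result maps group name -> {row_key: {column: value}}.
    """
    groups = defaultdict(dict)
    for row_key, cells in rows.items():
        for column, value in cells.items():
            group = column_to_group[column]
            groups[group].setdefault(row_key, {})[column] = value
    return dict(groups)


# A sparse table: each row fills in only some of the columns.
rows = {
    "com.example/index": {"contents": "<html>...</html>", "language": "en"},
    "com.example/about": {"pagerank": 0.3},
}

groups = split_into_locality_groups(rows, COLUMN_TO_GROUP)
```

A reader that only needs the small metadata columns can then open the `page_meta` store without touching the much larger `contents` data, which is presumably the point of grouping columns this way.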

The addled goose interprets this invention, filed in August 2005, as an important component of the BigTable technology. A blue-collar version of some of this data management wizardry is available as Hadoop. The good stuff, however, has not yet made it into the wild and wonderful world of open source.

The schematic for Logical Table Data Structure

Why is this an important invention?

In my opinion, this technology performs three modest tricks. Think of trained dogs at an animal circus who can perform a small number of tricks very, very quickly with little or no intervention by the ring master.

First, the invention tackles the problem of storing large amounts of data in distributed computer systems. To make this approach even feasible, an “efficient manner” is needed to represent the data. The “locality group” is one of the key notions that the USPTO has blessed.

Second, the invention has to handle Google’s multidimensional data. For Google to perform clever tricks with time, the company has to have a way to handle x, y, and z axes. The invention explains some of the engineering for this important twist.

Finally, to squish the data tables and cut the cost of storage, transfer, lookup, and other size-sensitive functions, the inventors have a method for compressing locality groups and the metadata for each group.
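The second and third tricks can be sketched together. The example below assumes the three axes are a row key, a column name, and a timestamp, and that each locality group is compressed as a unit. Both are my assumptions for illustration: the JSON encoding, the use of `zlib`, and the function names are hypothetical stand-ins, not the method the patent describes.

```python
import json
import zlib


def compress_locality_group(cells):
    """Serialize and compress the cells of one locality group.

    cells is a list of (row_key, column, timestamp, value) tuples.
    Cells are sorted by row and column, newest timestamp first.
    """
    ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))
    payload = json.dumps(ordered).encode("utf-8")
    return zlib.compress(payload)


def decompress_locality_group(blob):
    """Invert compress_locality_group, returning a list of tuples."""
    decoded = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [tuple(cell) for cell in decoded]


def latest_value(cells, row_key, column):
    """Return the newest value for (row_key, column), or None if absent.

    Relies on cells being ordered newest-first within each row and column.
    """
    for row, col, _timestamp, value in cells:
        if row == row_key and col == column:
            return value
    return None


cells = [
    ("r1", "language", 1, "en"),
    ("r1", "language", 5, "fr"),
    ("r2", "language", 2, "de"),
]
blob = compress_locality_group(cells)
restored = decompress_locality_group(blob)
```

Keeping the timestamp as part of each cell is what lets a lookup return either the newest value or an older version. Compressing group by group means a read pays only to decompress the columns it actually asks for.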

If you want to get a look at Google circa 2005, the document is a useful one.

The question is, “What’s up for 2010?”

Stephen Arnold, July 29, 2009
