Google: A Brace of Media Analyzer Inventions
May 11, 2008
On May 8, 2008, the USPTO, an outstanding organization with a stellar search system, published two Google patent applications. US2008/0107337 is “Methods and Systems for Analyzing Data in Media Material Having Layout” and US2008/0107338 is “Media Material Analysis of Continuing Article Portions”. You can download these here.
Both inventions, for which Google is the assignee, pertain to figuring out what’s important and what’s not on Web pages. Companies that scan hard copy and convert those images to machine-readable ASCII use some tricks but a great deal of brute force to figure out what’s information and what’s advertising or other dross.
The inventions’ systems and methods can also be applied to other types of images converted to a machine-readable form; for example, a PDF that consists of the PDF wrapper and the TIFF image in the wrapper. I know that commercial database publishers are on top of Google’s innovations in content processing, so this is old news to the wizards at ProQuest, Reed Elsevier, and Thomson Reuters. But others in the less rarefied atmosphere may find these disclosures interesting. Two patent documents stumbling through the USPTO’s hallowed halls are not an accident of fate.
Stephen Arnold, May 11, 2008