Google Books and Lousy Indexing

September 6, 2009

Thomas Claburn’s “Google Books Metadata Includes Millions of Errors” disclosed some dirty metadata laundry from the Google Books project. Mr. Claburn reported:

A metadata provider gave Google a large number of book records from Brazil that list 1899 as a default publication date, resulting in about 250,000 misdated books from this one source.

Mr. Claburn rounded up additional information suggesting that the error problem is orders of magnitude larger than some expect. The good news is that Google is working to correct errors. The bad news is that Google, like other commercial database producers, generates products and services that users perceive to be “right.” In reality, electronic products contain quite a few flaws. Mistakes in print can be seen and easily shared with others. Electronic mistakes behave differently: in many cases they go uncorrected for a long time, maybe forever, and no one knows what is amiss or what happens downstream when smart software sucks up the errors as fact. Whizzy new systems that generate reliability and provenance “tags” can be easily fooled. The repercussions of these propagated errors are going to be interesting to understand.

Stephen Arnold, September 6, 2009
