Google Book Search: Broken Unfixable under Current Incentives

February 19, 2019

I read “How Badly is Google Books Search Broken, and Why?” The main point is that search results do not include the expected results. The culprit, as I understand the write up, looking for rare strings of characters within a time slice behaves in an unusual manner. I noted this statement:

So possibly Google has one year it displays for books online as a best guess, and another it uses internally to represent the year they have legal certainty a book is released. So maybe those volumes of the congressional record have had their access rolled back as Google realized that 1900 might actually mean 1997; and maybe Google doesn’t feel confident in library metadata for most of its other books, and doesn’t want searchers using date filters to find improperly released books. Oddly, this pattern seems to work differently on other searches. Trying to find another rare-ish term in Google Ngrams, I settled on “rarely used word”; the Ngrams database lists 192 uses before 2002. Of those, 22 show up in the Google index. A 90% disappearance rate is bad, but still a far cry from 99.95%.

There are many reasons one can identify for the apparent misbehavior of the Google search system for books. The author identifies the main reason but does not focus on it.

From my point of view and based on the research we have done for my various Google monographs, Google’s search systems operate in silos. But each shares some common characteristics even though the engineers, often reluctantly assigned to what are dead end or career stalling projects, make changes.

One of the common flaws has to do with the indexing process itself. None of the Google silos does a very good job with time related information. Google itself has a fix, but implementing the fix for most of its services is a cost increasing step.

The result is that Google focuses on innovations which can drive revenue; that is, online advertising for the mobile user of Google services.

But Google’s time blindness is unlikely to be remediated any time soon. For a better implementation of sophisticated time operations, take a look at the technology for time based retrieval, time slicing, and time analytics from the Google and In-Q-Tel funded company Recorded Future.

In my lectures about Google’s time blindness DNA, I compare and contrast what Recorded Future can do versus what Google silos are doing.

Net net: Performing sophisticated analyses of the Google indexes requires the type of tools available from Recorded Future.

Stephen E Arnold, February 19, 2019


Comments are closed.

  • Archives

  • Recent Posts

  • Meta