Google Web Search Quality

April 20, 2022

The cat is out of the bag. The Reddit thread “Does Anyone Else Think Google Search Quality Has Gone Downhill Fast?” provides an interesting series of comments about “quality.”

The notion of “search quality” in the good old days involved gathering a corpus of text. The text was indexed using a system; for example, Smart or maybe Personal Bibliographic software. Test queries would be created in order to determine how the system displayed search results. The research-minded person would then examine the corpus and determine if the result set returned the best matches. There are tricks those skilled in the art could use to make the test queries perform. One would calculate precision and recall. Bingo: metrics. Now here’s the good part. Another search system would be used to index the same content; for example, something interesting like the “old” Sagemaker, the mainframe fave IBM STAIRS III, or Excalibur. The performance of the second system would be compared to that of the first system. One would do this over time and generate precision and recall scores which could be compared. We used to use a corpus of Google patents, and I remember that Perfect Search (remember that one, gentle reader?) outperformed a number of higher-profile and allegedly more advanced systems.
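For readers who have not run this kind of test, here is a minimal sketch of the comparison described above: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. The document IDs, the relevance judgments, and the names `system_a` and `system_b` are invented for illustration; they are not data from any of the systems mentioned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one test query.

    retrieved: document IDs a search system returned
    relevant:  document IDs a human judged relevant in the corpus
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so a query with no results scores 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgments for one test query
relevant = {"pat-1", "pat-2", "pat-3", "pat-4"}

# Hypothetical result sets from two systems indexing the same corpus
system_a = ["pat-1", "pat-2", "pat-9", "pat-3"]
system_b = ["pat-1", "pat-7", "pat-8", "pat-9", "pat-2"]

for name, results in (("system_a", system_a), ("system_b", system_b)):
    p, r = precision_recall(results, relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Run the same test queries against each system over time, and the scores can be lined up side by side.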

I am not sure Reddit posters are into precision and recall, but the responses to the question about degradation of Google search quality are fascinating. Those posting are not too happy with what Google delivers and how the present day Googley search and retrieval system works. Thank you, Prabhakar Raghavan, former search wizard executive at Verity (wow, that was outstanding) and the individual who argued with a Bear Stearns’ managing director and me about how much better Yahoo’s semantic technology was than Google’s. (Raghavan was at Yahooooo then, and we know how wonderful Yahoo search was!)

Here’s a rundown of some of the issues identified in the Reddit thread:

  • From PizzaInteraction: “always laugh when I enter like 4 search terms and all the results focus on just one of the terms.”
  • Healthy-Contest-1605: “Every algorithm is being gamed to have their trash come out in top.”
  • Cl0udSurfer: “the usual tricks like adding quotes around required words, or putting a dash in front of words that should be excluded don’t work anymore.”

Net net: This is the Verity-Yahoo trajectory. Precision and recall? Ho ho ho. What about disclosing when a source was indexed and updated? What about Boolean operators? What about making as much money as possible so one can go to a high school reunion and explain the wonderfulness of one’s cleverness? What happened to Louis Monier, Sanjay Ghemawat, and the Backrub crowd?

Stephen E Arnold, April 20, 2022

