Google Search: Retrievers Lose. Smart Software Wins

June 28, 2016

I scanned a number of write ups about Google’s embrace of machine learning and smart software. I supplement my Google queries with the results of other systems. Some of these have their own index; for example, Yandex.ru and Exalead. Others are metasearch engines which suck in results and do some post processing to help answer the users’ questions. Others are disappointing and I check them out when I have a client who is willing to pay for stone flipping; for example, DuckDuckGo, iSeek, or the estimable Qwant. (I love quirky spelling too.)

I read “RankBrain Third Most Important Factor Determining Google Search Results.” Here’s the quote I noted:

Google is characteristically fuzzy on exactly how it improves search (something to do with the long tail? Better interpretation of ambiguous requests?) but Jeff Dean [former AltaVista wizard] says that RankBrain is “involved in every query,” and affects the actual rankings “probably not in every query but in a lot of queries.” What’s more, it’s hugely effective. Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful. “It was significant to the company that we were successful in making search better with machine learning,” says John Giannandrea. “That caused a lot of people to pay attention.” Pedro Domingos, the University of Washington professor who wrote The Master Algorithm, puts it a different way: “There was always this battle between the retrievers and the machine learning people,” he says. “The machine learners have finally won the battle.”

I have noticed in the last year that I am unable to locate certain documents when I use the words and phrases which had served me well before smart software became the cat’s pajamas.

One recent example was my need to locate a case example about a German policeman’s trials and tribulations with the Dark Web. When I first located this document, I was trying to verify an anecdote shared with me after one of my intelligence community lectures.

I had the document in my file and I pulled it up on my monitor. The document in question is the work of an outfit and person labeled “Lars Hilse.” The title of the write up is “Dark Web & Bitcoin: Global Terrorism Threat Assessment.” The document was published in April 2013 with an update issued in November 2013. (That document was the source or maybe confirmed the anecdote about the German policeman and his Dark Web research.)

For my amusement, I wondered if I could use the new and improved Google Web search to locate the document. I displayed section 4.8 on my screen. The heading of the section is “Extortion (of Law Enforcement Personnel).”

I entered the phrase into Google without quotes. Here’s the first page of results:

image

None of the hits points to the document with the five-word phrase.

Next I ran the query with quotes around the phrase like this: “Extortion (of Law Enforcement Personnel)”. Here’s the first page of results:

image

There were five hits, compared with the 60 million returned by the first query. Still no document. I discovered years ago that Google once indexed PDF, PowerPoint, and Word/Excel files. No longer. If the phrase were indexed by Google, the list of hits should have included a link to the 23-page Hilse document.

I then ran the query with the author’s last name and the phrase without quotes. This third query returned the following result and no pointer to the Hilse document:

image

After I tried a number of search strategies, the way to locate the document turned out to be entering this string:

Lars Hilse Dark Web & Bitcoin: Global Terrorism Threat Assessment

Here’s the result:

image

Bingo. There are four links to the PDF file. There is also a link to Lars Hilse.

What did this exercise reaffirm in my research?

  1. Google search works if the user knows the name of the author and the exact title of the document required. So much for intelligent query expansion and knowing what the “intent” of a search query is.
  2. Google’s index no longer points to content within a document even though the document is a file type allegedly understood by Google’s indexing system. Looking for useful information within a PDF is pretty much impossible based on my research over the last nine months.
  3. The system generates irrelevant results even when the query is designed for high precision and recall; that is, the name of the author and the specific title of the document.
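
For readers who want the two yardsticks spelled out, here is a minimal Python sketch of how precision and recall could be computed for a single query. The function name and the result identifiers are hypothetical placeholders made up for illustration; they are not the actual hits Google returned, and this is not a description of how Google scores anything.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query.

    returned: identifiers of the results the engine displayed
    relevant: identifiers of the documents that actually answer the query
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = returned_set & relevant_set
    # Precision: share of displayed results that are relevant.
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    # Recall: share of relevant documents that were displayed.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical identifiers, invented for this example only.
relevant = {"hilse-threat-assessment.pdf"}
phrase_query_results = ["news-story", "forum-thread", "law-firm-page",
                        "blog-post", "wiki-entry"]
title_query_results = ["hilse-threat-assessment.pdf", "lars-hilse-profile",
                       "unrelated-page"]

print(precision_recall(phrase_query_results, relevant))
print(precision_recall(title_query_results, relevant))
```

In this toy setup, a results page that never shows the wanted document scores zero on both measures, which is roughly what the phrase queries above delivered. A page that shows the document alongside unrelated links reaches full recall but still loses precision.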

So what? Google is trying to reduce the credit card debt of search technology. The present methods index less. The smart system is not particularly useful to me. I typically do not look for topics related to Michael Jackson’s book collection or the audience share for the Top Gear TV show.

Net net: Web search using Google is not the useful tool it was before the company’s initial public offering and its transformation into everything except a search system delivering reasonable content indexing with acceptable precision and recall.

Hey, 90 percent of the searchers think Google is just super. Maybe for some. Not for me.

Stephen E Arnold, June 28, 2016
