Nagging Google for Relevance Ranking Secrets
January 3, 2017
I read “Good Luck in Making Google Reveal Its Algorithm.” The title is incorrect. I think the phrase I expected was “algorithms and administrative interfaces.” The guts of Google’s PageRank system appear in the PageRank patent assigned to the Board of Trustees of the Leland Stanford Junior University. Because the “research” for PageRank was based in part on a US government grant, the PageRank patent discloses the basic approach of the Google. If one looks at the “references” to other work, one will find mentions of Eugene Garfield (the original citation value wizard), the IBM Almaden Clever team, and a number of other researchers and inventors who devised a way to figure out what’s important in the context of linked information.
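For readers who want to see what “figuring out what’s important in the context of linked information” looks like in practice, here is a minimal sketch of the textbook link-analysis idea: a page matters if pages that matter link to it. This is not Google’s production system. The function name, the toy four-page web, and the 0.85 damping value are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with importance spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share on each pass.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current importance among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its importance evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: page "c" collects links from "a", "b", and "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, the page with the most inbound links from well-linked pages floats to the top, and the page nobody links to sinks to the bottom. The real system layers many other signals and controls on top of this kind of calculation.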
What folks ignore is that it is expensive to reengineer the algorithmic plumbing at an outfit like Google. Think in terms of Volkswagen rewriting its emissions code and rebuilding its manufacturing plants to produce non-cheating vehicles. That’s the same problem the Google has faced but magnified by the rate at which changes have been required to keep the world’s most loved Web search system [a] working, [b] ahead of the spoofers who can manipulate Mother Google’s relevance ranking, [c] able to handle diverse content including videos and the social Plus stuff, and [d] usable on mobile.
The result is that Google has taken its Airstream trailer and essentially added tailfins, solar panels, and new appliances; that is, the equivalent of a modern microwave instead of the old, inefficient toaster oven. But the point is that the Google Airstream is still an Airstream just “new and improved.”
The net net is that Google itself cannot easily explain what happens within the 15-year-old and fast-ageing relevance Airstream. Outsiders essentially put up content, fiddle with whatever controls are available, and then wait to see what happens when one runs a query for the content.
The folks driving the Ford F-150 pulling the trailer have controls in the truck. The truck has a dashboard. The truck has extras. The truck has an engine. The entire multi-part assembly is the Google search system.
The point is that Google’s algorithm is not ONE THING. It is a highly complex system, and there are not many people around who know the entire thing. The fact that it works is great. Sometimes, however, the folks driving the Ford F-150 have to fiddle with the dials and knobs. That administrative control panel is hooked to some parts of the gear in the Airstream. Other dials just do things to deal with what is happening right now. Love bugs make it hard to see out of the windscreen, so the driver squirts bug remover fluid and turns on the windshield wipers. The Airstream stuff comes along for the ride.
The article cited above explains that Google won’t tell a German whoop-de-doo how it works. Well, the author has got the “won’t tell” part right. Even if Google wanted to explain how its “algorithm” works, the company would probably just point to a stack of patents and journal articles and say, “There you go.”
The write up states:
We know that search results – and social media news feeds – are assembled by algorithms that determine the websites or news items likely to be most “relevant” for each user. The criteria used for determining relevance are many and varied, but some are calibrated by what your digital trail reveals about your interests and social network and, in that sense, the search results or news items that appear in your feed are personalized for you. But these powerful algorithms, which can indeed shape how you see the world, are proprietary and secret, which is wrong. So, Merkel argues, they should be less opaque.
The article also is correct when it says:
So just publishing secret stuff doesn’t do the trick. In a way, this is the hard lesson that WikiLeaks learned.
The write up uses Google as a whipping post. The issue is not math. The issue is the gap between those who use methods that are “obvious” and those who look for fuzzy solutions. Why not focus on other companies which use “obvious” systems and methods? Answer: Google is a big, fat, slow-moving, predictable, ageing target.
Convenient for real journalists. Oh, 89 percent of this rare species do their research via Google, clueless about how the sausage is made. Grab those open source documents and start reading.
Stephen E Arnold, January 4, 2016