Google, Search, and Swizzled Results
July 1, 2015
I am tired of answering questions about the alleged blockbuster revelations from a sponsored study and an academic Internet legal eagle wizard. To catch up on the swizzled search results “news”, I direct your attention, gentle reader, to these articles:
- “Google Manipulates Search Results, According to Study from Yelp and Legal Star Tim Wu”
- “Study Offers New Evidence That Google Skews Search Results”
I don’t look for information using my mobile devices. I use my trusty MacBook and various software tools. I don’t pay much, if any, attention to the first page of results. I prefer to labor through the deeper results. I am retired, out of the game, and ready to charge up my electric wheelchair one final time.
Let me provide you with three basic truths about search. I will illustrate each with a story drawn from my 40-year career in online, information access, and various types of software.
Every Search Engine Provides Tuning Controls
Yep, every search system with which I have worked offers tuning controls. Here’s the real life story. My colleagues and I got a call in our tiny cubicle in an office near the White House. The caller told us to make sure that the then vice president’s Web site came up for specific queries. We created for the Fast Search & Transfer system a series of queries which we hard wired into the results display subsystem. Bingo. When the magic words and phrases were searched, the vice president’s Web page with content on that subject came up. Why did we do this? Well, we knew the reputation of the vice president and I had the experience of sitting in a meeting he chaired. I strongly suggested we just do the hit boosting and stop wasting time. That VP was a firecracker. That’s how life goes in the big world of search.
Key takeaway: Every search engine provides easy or hard ways to control which results are presented. These controls are used for a range of purposes. Left to itself, the index just does not present must-see benefits information when an employee runs an HR query, or someone decides that content is not providing a “good user experience.”
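For readers who want to see what "hard wiring" a query looks like, here is a minimal sketch. It is not the Fast Search & Transfer configuration we used. The table `PINNED_RESULTS`, the function names, and the example.gov URLs are all hypothetical, and `engine_search` stands in for whatever call returns the engine's normal results.

```python
from typing import Callable, Dict, List

# Hypothetical hard-wired table: normalized query -> page that must rank first.
# The queries and URLs are placeholders, not the ones used in the story.
PINNED_RESULTS: Dict[str, str] = {
    "energy policy": "https://example.gov/vp/energy",
    "space program": "https://example.gov/vp/space",
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(query.lower().split())


def search_with_pins(
    query: str,
    engine_search: Callable[[str], List[str]],
    pinned: Dict[str, str] = PINNED_RESULTS,
) -> List[str]:
    """Return engine results, with any hard-wired page forced into the top spot."""
    results = engine_search(query)
    pinned_url = pinned.get(normalize(query))
    if pinned_url is None:
        return results
    # Put the pinned page first and drop any duplicate further down the list.
    return [pinned_url] + [url for url in results if url != pinned_url]
```

The relevance function never gets a vote on the pinned page. That is the whole point of the control.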
Engineers Tailor Results Frequently
The engineers who have to deal with the weirdness of content indexing, the stuff that ends up in the exception file, a broken relevance function when an external synonym list is created, whatever: these issues have to be fixed one by one. No one talking about the search system knows or cares about this type of grunt work. The right fix is the one that works with the least hassle. If one tries to explain why certain content is not in the index, a broken conversion filter is not germane to the complainer’s conversation. When the excluded documents are finally processed, these may be boosted in some way. Hey, people were complaining so weight these content objects so they show up. This works with grumpy advertisers, cranky Board members, and clueless new hires. Here’s the story. We were trying to figure out why a search system at a major trade association did not display more than half of the available content. The reason was that the hardware and memory were inadequate for the job. We fiddled. We got the content in the index. We flagged it so that it would appear at the top of a results list. The complaining stopped. No one asked how we did this. I got paid and hit the road.
Key takeaway: In real world search, there are decisions made to deal with problems that Ivory Tower types and disaffected online ecommerce sites cannot and will not understand. The folks working on the system put in a fix and move on. There are dozens and dozens of problems with every search system we have encountered since my first exposure to STAIRS III and BRS. Search sucked in the late 1960s and early 1970s, and it sucks today. To get relevant information, one has to be a very, very skilled researcher, just like it was in the 16th century.
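The flag-it-and-float-it fix from the trade association story can be sketched in a few lines. This is an illustration under assumptions, not the code we ran: the `Hit` record, its `boosted` field, and the `rank` function are names invented here, and a real engine would apply the flag inside its own scoring pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    doc_id: str
    score: float           # relevance score from the engine (assumed higher is better)
    boosted: bool = False  # hypothetical flag set on the late-indexed content


def rank(hits: List[Hit]) -> List[Hit]:
    """Order flagged documents first, then everything else, each by score."""
    # False sorts before True, so `not h.boosted` puts flagged hits on top.
    return sorted(hits, key=lambda h: (not h.boosted, -h.score))
```

Note that a flagged document with a weak score still outranks an unflagged document with a strong one. The complaining stops, and relevance pays the bill.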
New Hires Just Do Stuff
Okay, here’s a fact of life that will grate on the nerves of the Ivy League MBAs. Search engineering is grueling, difficult, and thankless work. Managers want precision and recall. MBAs often don’t understand that which they demand. So why not hard wire every darned query from this ivy bedecked whiz kid? Ask Jeeves took this route and it worked until the money for humans ran out. Today new hires come in to replace the experienced people like my ArnoldIT team who say, “Been there done that. Time for cyberOSINT.” The new lads and lasses grab a problem and solve it. Maybe a really friendly marketer wants Aunt Sally’s home made jam to be top ranked. The new person just sets the controls and makes an offer of “Let’s do lunch.” Maybe the newcomer gets tired of manual hit boosting and writes a script to automate boosting via a form which any marketer can complete. Maybe the script kiddie posts the script on the in-house system. Bingo. Hit boosting is the new black because it works around perceived relevance issues. Real story: At a giant drug company, researchers could not find their content. The fix was to create a separate search system, indexed and scored to meet the needs of the researchers, and then redirect every person from the research department to the swizzled search system. Magic.
Key takeaway: Over time, functions, procedures, and fixes get made, and managers, like prison guards, no longer perform serious monitoring. Managers are too busy dealing with automated meeting calendars or working on their own start up. When companies in the search business have been around for seven, ten, or fifteen years, I am not sure anyone “in charge” knows what is going on with the newcomers’ fixes and workarounds. Continuity is not high on the priority list in my experience.
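The drug company redirect is about as simple as a workaround gets. Here is a rough sketch, assuming each user arrives with a department label and each search system is just a function that takes a query. The names `make_router` and `overrides` are mine, made up for illustration.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[str]]


def make_router(
    default_search: SearchFn,
    overrides: Dict[str, SearchFn],
) -> Callable[[str, str], List[str]]:
    """Build a function that sends each department to its own search system."""

    def route(department: str, query: str) -> List[str]:
        # Departments without an override fall through to the main system.
        backend = overrides.get(department.strip().lower(), default_search)
        return backend(query)

    return route
```

The researcher types the same query into the same box as everyone else and has no way of knowing a different index answered.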
What’s My View of the Wu-velations?
I have three observations:
- Search results boosting is a core system function; it is not something special. If a search system does not include a boosting function, programmers will find a way to deliver boosting even if it means running two queries and posting results to a form with the boosted content smack in the top spot.
- Google’s wildly complex and essentially unmanageable relevance ranking algorithms do stuff that is perplexing because they are tied into inputs from “semantic servers” and heaven knows what else. I can see a company’s Web site disappearing or appearing because no one understands the interactions among the inputs in Google’s wild and crazy system. Couple that with hit boosting and you have a massive demonstration of irrelevant results.
- Humans at a search company can reach for a search engineer, make a case for a hit boosting function, and move on. The person doing the asking could be a charming marketer or an errant input system. No one has much, if any, knowledge of actions of a single person or a small team as long as the overall system does not crash and burn.
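The two-query trick mentioned in the first observation can be sketched like this. It is a bare-bones illustration, not anyone's production method: `search` is a stand-in for the engine call, and `boost_query` and `max_boosted` are hypothetical parameters chosen for the example.

```python
from typing import Callable, List


def boosted_results(
    user_query: str,
    boost_query: str,
    search: Callable[[str], List[str]],
    max_boosted: int = 1,
) -> List[str]:
    """Run two queries and post the boosted hits ahead of the organic ones."""
    boosted = search(boost_query)[:max_boosted]  # content that must be on top
    organic = search(user_query)                 # what the engine would show anyway
    seen = set(boosted)
    # Skip organic hits already shown in the boosted slots.
    return boosted + [hit for hit in organic if hit not in seen]
```

No boosting function in the engine is required. The merge happens in the display layer, which is why this kind of fix is so hard to spot from the outside.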
I am far more concerned about the predictive personalization methods in use for the display of content on mobile devices. That’s why I use Unbubble.eu.
It is the responsibility of the person looking for information to understand bias in results and then exert actual human effort, time, and brain power to figure out what’s relevant and what’s not.
Fine, beat up on the Google. But there are other folks who deserve a whack or two. Why not ask yourself, “Why are results from Bing and Google so darned similar?” There’s a reason for that too, gentle reader. But that’s another topic for another time.
Stephen E Arnold, July 1, 2015