Breaking Relevance: The TrackMeNot Method
July 18, 2011
Okay, with ProQuest, Cambridge Scientific, and Dialog about to jump into the statistical fog of relevance, I feel pretty glum. Most old school searchers prefer to type in explicit commands; for example:
b 15
ss cc=77? AND cc=76?? AND esop
When the new “fuzzified” version of the commercial search system arrives for ProQuest, Cambridge Scientific, and Dialog-type users, good luck with that. In the old commercial systems, the old school, brute force, Boolean approach would return consistent results search in and search out. Take it to the bank.
Change is afoot, so queries will return somewhat unpredictable results depending on what pointers get jiggled in an index update.
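To make the "consistent results" point concrete, here is a minimal sketch of Boolean AND retrieval over a toy inverted index. The records, terms, and function names are invented for illustration and do not reproduce how Dialog or any commercial system works internally. The point is only that an exact-match intersection returns the same set every time the same query runs against the same index.

```python
# Toy records (hypothetical data): record id -> set of index terms.
RECORDS = {
    1: {"fable", "esop", "fox"},
    2: {"esop", "payroll", "stock"},
    3: {"stock", "payroll"},
}


def build_index(records):
    """Build an inverted index: term -> set of record ids containing it."""
    index = {}
    for rec_id, terms in records.items():
        for term in terms:
            index.setdefault(term, set()).add(rec_id)
    return index


def boolean_and(index, *terms):
    """Return the sorted ids of records that contain every term."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(RECORDS)
print(boolean_and(index, "esop", "payroll"))
```

There is no scoring step to drift here. A statistical ranker, by contrast, can reorder or drop records whenever its weights or index statistics change.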
If we shift to the free Web search engines, the notion of relevance is based on lots of “signals”. A signal is something that allows the search system to disambiguate or add context to an action. If you are running around an airport, the mobile search wizards want to look at your search history and hook those signals to your wandering GPS input. The result is search done for you.
Why is relevance lousy? Well, search engine optimization is to blame. The focus on selling targeted ads is a contributor. And there are some interesting software tools that aim to confuse certain traffic analysis systems. So far, no one wants to confuse the ProQuest, Cambridge Scientific, and Dialog-type systems, but the Web search world is like catnip.
One of our readers alerted us to TrackMeNot, which is obfuscation software designed to defeat certain types of usage tracking. Here’s what the developers say:
TrackMeNot runs in Firefox as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and Bing. It hides users’ actual search trails in a cloud of ‘ghost’ queries, significantly increasing the difficulty of aggregating such data into accurate or identifying user profiles. To better simulate user behavior TrackMeNot uses a dynamic query mechanism to ‘evolve’ each client (uniquely) over time, parsing the results of its searches for ‘logical’ future query terms with which to replace those already used.
If you want to cover your search clicks, give it a whirl. Obfuscation methods, if used by lots of people, may have an adverse impact on relevance, particularly when personalization is enabled. Lucky me.
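The mechanism the developers describe, random "ghost" queries whose terms evolve from earlier results, can be sketched roughly as follows. This is not TrackMeNot's code. The seed list, the stopword list, the canned `fake_search` function, and the `GhostQueryPool` class are all assumptions made up for illustration, and the sketch sends nothing over the network.

```python
import random
import re

# Hypothetical seed queries and stopwords, invented for this sketch.
SEED_QUERIES = ["weather forecast", "train schedule", "pasta recipe", "used bicycles"]
STOPWORDS = {"the", "a", "an", "and", "of", "for", "in", "to", "on", "with"}


def extract_terms(result_text):
    """Pull candidate future query terms out of a block of result text."""
    words = re.findall(r"[a-z]+", result_text.lower())
    return [w for w in words if len(w) > 3 and w not in STOPWORDS]


def fake_search(query):
    """Stand-in for a real search request; returns canned snippet text."""
    return f"Top results about {query}: local guide, cheap options, weekend ideas"


class GhostQueryPool:
    """Holds the current ghost queries and replaces each one after use."""

    def __init__(self, seeds, rng):
        self.queries = list(seeds)
        self.rng = rng

    def next_query(self):
        """Pick one stored query at random and return its slot and text."""
        slot = self.rng.randrange(len(self.queries))
        return slot, self.queries[slot]

    def evolve(self, slot, result_text):
        """Replace the used query with terms parsed from its results."""
        terms = extract_terms(result_text)
        if terms:
            count = min(2, len(terms))
            self.queries[slot] = " ".join(self.rng.sample(terms, count))


def run_ghost_session(pool, rounds):
    """Issue `rounds` ghost queries; a real tool would pause between them."""
    issued = []
    for _ in range(rounds):
        slot, query = pool.next_query()
        issued.append(query)
        pool.evolve(slot, fake_search(query))
    return issued


pool = GhostQueryPool(SEED_QUERIES, random.Random())
print(run_ghost_session(pool, 5))
```

Each pool starts from the same seeds but drifts in its own direction as the random picks accumulate, which is roughly the "evolve each client uniquely" idea in the quote.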
Stephen E Arnold, July 18, 2011
Sponsored by Pandia.com (www.pandia.com), publishers of The New Landscape of Enterprise Search.