Three Metasearch Vulnerabilities and DuckDuckGo
May 25, 2012
I read “The Digital Skeptic: DuckDuckGo Cooks Google’s Goose.” I am okay with online cheerleading. I like to use metasearch systems like DuckDuckGo, but my favorite, EZ2Ask.com, went away. Ixquick is okay, but each of these systems has three vulnerabilities. I want to highlight them before my addled goose brain forgets them. Perhaps the experts writing about metasearch or federating systems will want to consider these points. One or two might make the analysis a little tastier, sort of like pâté from a force-fed goose.
First, metasearch engines take a query and send it to a third-party index. The results come back and are ideally deduplicated, relevance ranked, and displayed for the user. Some metasearch systems perform a number of value-adding functions. These include putting the hits in folders, which was Vivisimo’s claim to fame. Others parse the results by source type and display them in groups, a function which EZ2Ask.com offered while it was going full throttle from its redoubt in southern France. But when the third-party indexes charge money to pull results or just block the metasearch engine, the party is over. Vivisimo built a crawler in order to have an original index for some applications. Most metasearch systems just hope that the third-party index won’t change the rules. Anyone remember the original Yahoo BOSS service and its flexibility? So, vulnerability one is losing a source of hits. No hits, reduced utility. Less utility means less traffic.
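Here is a minimal sketch in Python of the fan-out and merge step and of what vulnerability one looks like in code. The backends are stand-in stubs with hypothetical names, not real third-party APIs; a real system would call remote search endpoints over HTTP.

```python
# Fan a query out to several "indexes" in parallel and merge what comes back.
# backend_alpha and backend_beta are hypothetical stand-ins for third-party indexes.
from concurrent.futures import ThreadPoolExecutor, as_completed

def backend_alpha(query):
    # Stand-in for a third-party index that still answers.
    return [{"url": "http://example.com/a", "title": f"{query} result A"}]

def backend_beta(query):
    # Stand-in for an index that changed the rules: blocked or paywalled.
    raise PermissionError("source now requires a paid API key")

BACKENDS = [backend_alpha, backend_beta]

def metasearch(query):
    hits = []
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {pool.submit(b, query): b.__name__ for b in BACKENDS}
        for future in as_completed(futures):
            try:
                hits.extend(future.result())
            except Exception as err:
                # Vulnerability one: when the source goes, its hits go with it.
                print(f"lost source {futures[future]}: {err}")
    return hits

print(metasearch("duckduckgo"))
```

Note what the sketch cannot do: there is no graceful substitute for a lost source. The engine just returns fewer hits.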
Second, when queries are sent to third-party indexes, there is latency. There are tricks to mask the latency, but the fact is that in certain situations, the metasearch engine either presents a partial result set or one that is just slow to render. So vulnerability two is a performance headache for the metasearch crowd.
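One common latency-masking trick, sketched below with hypothetical backends and timings, is a hard deadline: render whatever has arrived when time is up and drop the rest. This is a sketch, not anyone’s production code, and it assumes Python 3.9+ for cancel_futures.

```python
# Impose a deadline on the fan-out and accept a partial result set.
import time
from concurrent.futures import ThreadPoolExecutor, wait

def fast_backend(query):
    time.sleep(0.1)  # answers well inside the deadline
    return [f"{query}: fast hit"]

def slow_backend(query):
    time.sleep(2.0)  # a laggard third-party index
    return [f"{query}: slow hit"]

def metasearch_with_deadline(query, deadline=0.5):
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(b, query) for b in (fast_backend, slow_backend)]
    done, not_done = wait(futures, timeout=deadline)
    # Vulnerability two: whatever is still pending never reaches the page.
    # Already-running work keeps going in the background; it is simply ignored.
    pool.shutdown(wait=False, cancel_futures=True)
    return [hit for f in done for hit in f.result()]

print(metasearch_with_deadline("metasearch"))  # only the fast hit appears
```

The deadline trades completeness for speed, which is exactly the partial result set mentioned above.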
Third, deduplication. For some queries, the Web indexes will bang the same drum, and loudly. A query for Hewlett Packard Lynch will generate many duplicate and near-duplicate hits. The metasearch system must have a way to winnow the most egregious duplicates from the results list, and quickly. Slow deduping or no deduping is bad. Partial deduping may be acceptable, but there is a trade-off. So, vulnerability three is a results list which contains many identical or similar stories.
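A sketch of fast, partial deduplication appears below: exact duplicates are dropped by normalized URL, near duplicates by word-overlap (Jaccard) on titles. The threshold and the sample hits are illustrative, and production systems tend to use shingling or simhash rather than raw word sets.

```python
# Drop exact and near duplicates from a merged results list (Python 3.9+).
from urllib.parse import urlsplit

def normalize_url(url):
    # Strip scheme, "www.", and trailing slash so mirrored URLs collide.
    parts = urlsplit(url.lower())
    return parts.netloc.removeprefix("www.") + parts.path.rstrip("/")

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def dedupe(hits, threshold=0.7):
    seen_urls, kept, kept_words = set(), [], []
    for hit in hits:
        key = normalize_url(hit["url"])
        if key in seen_urls:
            continue  # identical story at the same address
        words = set(hit["title"].lower().split())
        if any(jaccard(words, other) >= threshold for other in kept_words):
            continue  # near duplicate: the same drum, banged again
        seen_urls.add(key)
        kept.append(hit)
        kept_words.append(words)
    return kept

hits = [
    {"url": "http://example.com/hp-lynch", "title": "HP sues Lynch over Autonomy deal"},
    {"url": "http://www.example.com/hp-lynch/", "title": "HP sues Lynch over Autonomy deal"},
    {"url": "http://example.org/hp-story", "title": "HP sues Lynch over the Autonomy deal"},
]
print(dedupe(hits))  # only the first hit survives
```

Raising the threshold keeps more near duplicates; lowering it risks winnowing distinct stories. That is the trade-off in partial deduping.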
Why build a metasearch engine if it carries vulnerabilities cheerfully overlooked in the “Cooks Google’s Goose” write-up?
- Metasearch is a heck of a lot cheaper to pull off than brute force search.
- Users often prefer the convenience of having one system “pull together” what the user perceives as the most relevant content.
- Metasearch allows a marketer to engage in the type of promotion that produces the “Cooks Google’s Goose” article.
As an addled goose, I try not to be too confused about metasearch. Are you?
Stephen E Arnold, May 25, 2012
Sponsored by PolySpot