The Wages of SEO Sin

February 13, 2011

So Google can be fooled. It’s not nice to fool Mother Google. The inverse, however, is not accurate. Mother Google can take some liberties. Any indexing system can. Objectivity is in the eye of the beholder or the person who pays for results.

Judging from the torrent of posts from “experts”, the big guns of search are saying, “We told you so.” The trigger for this outburst of criticism is the New York Times’s write up about JC Penney. You can try this link, but I expect that it and its SEO crunchy headline will go dark shortly. (Yep, the NYT is in the SEO game too.)

Everyone from AOL news to blog-o-rama wizards is reviling Google for not figuring out how to stop folks from gaming the system. Sigh.

I am not sure how many years ago I wrote the “search sucks” article for Searcher Magazine. My position was clear long before the JC Penney affair and the slowly growing awareness that search is anything BUT objective.

Source: http://www.brianjamesnyc.com/blog/?p=157

In the good old days, database bias was set forth in the editorial policies for online files. You could disagree with what we selected for ABI/INFORM, but we made an effort to explain what we selected, why we selected certain items for the file, and how the decision affected assignment of index terms and classification codes. The point was that we were explaining the mechanism for making a database which we hoped would be useful. We were successful, and we tried to avoid the silliness of claiming comprehensive coverage. We had an editorial policy, and we shaped our work to that policy. Most people in 1980 did not know much about online. I am willing to risk this statement: I don’t think too many people in 2011 know about online and Web indexing. In the absence of knowledge, some remarkable actions occur.

You don’t know what you don’t know or the unknown unknowns. Source: http://dealbreaker.com/donald-rumsfeld/

Flash forward to the Web. Most users assume incorrectly that a search engine is objective. Baloney. Just as we set an editorial policy for ABI/INFORM, each crawler and content processing system has similar decisions beneath it.

The difference is that at ABI/INFORM we explained our bias. The modern Web and enterprise search engines don’t. If a system tries to explain what it does, most of the failed Web masters, English majors working as consultants, and unemployed lawyers turned search experts just don’t care.

Search and content processing are complicated businesses, and most professionals have zero appetite for the gory details about certain issues. Here’s a quick list of “decisions” that must be made for a basic search engine:

How deep will we crawl? Most engines set a limit. No one, not even Google, has the time or money to follow every link.
How frequently will we update? Most search engines have to allocate resources in order to get a reasonable index refresh. Sites that get zero traffic don’t get updated too often. Sites that are sprawling and deep may get three or four levels of indexing. The rest? Forget it.
What will we index? Most people perceive the various Web search systems as indexing the entire Web. Baloney. Bing.com makes decisions about what to index and when, and I find that it favors certain verticals and trendy topics. Google does a bit better, but there are bluebirds, canaries, and sparrows. Bluebirds get indexed thoroughly and frequently. See Google News for an example. For Google’s Uncle Sam, a different schedule applies. In between, there are lots of sites and lots of factors at play, not the least of which is money.
What is on the stop list? Yep, a list can kill index pointers, making the site invisible.
When will we revisit a site with slow response time?
What actions do we take when a site is owned by a key stakeholder?
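
The decisions in the list above can be written down as an explicit policy. Here is a minimal Python sketch, assuming a toy crawler. The class name, the tier names (borrowed from the bluebird, canary, and sparrow idea), and every number are hypothetical. No real search engine publishes its settings, and this is not how any particular one works.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlPolicy:
    """Hypothetical editorial policy for a toy crawler. All values are assumptions."""
    max_depth: int = 3  # how deep will we crawl?
    # How frequently will we update? Days between visits, per (made-up) tier.
    refresh_days_by_tier: dict = field(
        default_factory=lambda: {"bluebird": 1, "canary": 30, "sparrow": 180}
    )
    stop_list: set = field(default_factory=set)  # hosts that never get indexed
    slow_response_seconds: float = 5.0           # what counts as a slow site
    slow_site_penalty: int = 2                   # stretch the interval for slow sites

    def should_index(self, url: str, depth: int) -> bool:
        """Apply the stop list and the depth limit to one URL."""
        host = urlparse(url).hostname or ""
        if host in self.stop_list:
            return False
        return depth <= self.max_depth

    def refresh_interval_days(self, tier: str, response_seconds: float) -> int:
        """Return the days to wait before revisiting a site in the given tier."""
        days = self.refresh_days_by_tier[tier]
        if response_seconds > self.slow_response_seconds:
            days *= self.slow_site_penalty
        return days


policy = CrawlPolicy(stop_list={"spam.example"})
print(policy.should_index("http://spam.example/", 0))   # stop-listed host
print(policy.refresh_interval_days("sparrow", 9.0))     # slow, low-priority site
```

Each default in the sketch is an editorial choice. Change one number and a different set of pages becomes visible to the user.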

Is it possible to spoof Google? Sure. The JC Penney example is a good one, but I find examples of Google’s bumbling every day. I get auto generated pages of baloney. I get links to 404 errors at the Health & Human Services Web site. I find examples of content in the index and not in the cache. I find totally useless links in most results lists.

Run a query for “information optimization”. What do you get for this meaningless phrase? You get a link to Vivisimo which uses “information optimization” instead of “search done right”, its 2007 catchphrase. You get links to Hewlett Packard, a blog about information optimization, and baloney about search engine optimization. The problem is that the phrase is essentially meaningless. I think it has been crafted to make it easier to locate outfits in the fuzz business.

Google falls for this joyfully. Google even lists the mind bogglingly expensive Google Search Appliance as associated with “information optimization” via another meaningless phrase, “knowledge management.”

The fact of the matter is that as the Web content diffuses and becomes more voluminous, the opportunities to play tricks increase. And the Web search engines continue to make their decisions behind the scenes. Google is proud of the fact that it keeps its method secret. In The Google Legacy, I summarized about 100 factors in use in 2004 and 2005. Each “factor” is a form of editorial policy.
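
To see why a “factor” behaves like editorial policy, consider a minimal Python sketch of weighted scoring. The factor names, the values, and the weights are invented for illustration. They are not Google’s factors, and real ranking systems are far more involved.

```python
def rank(docs, weights):
    """Order documents by a weighted sum of their factor values (highest first)."""
    def score(doc):
        return sum(weights.get(name, 0.0) * value
                   for name, value in doc["factors"].items())
    return [doc["url"] for doc in sorted(docs, key=score, reverse=True)]


# Two hypothetical pages described by two hypothetical factors.
docs = [
    {"url": "a.example", "factors": {"inbound_links": 0.9, "freshness": 0.1}},
    {"url": "b.example", "factors": {"inbound_links": 0.2, "freshness": 0.9}},
]

link_heavy = rank(docs, {"inbound_links": 1.0, "freshness": 0.2})
fresh_heavy = rank(docs, {"inbound_links": 0.2, "freshness": 1.0})
print(link_heavy)   # a.example ranks first
print(fresh_heavy)  # b.example ranks first
```

Same pages, same query, different weights, different winner. Whoever sets the weights sets the policy, and the user never sees the numbers.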

The content of indexes is never objective. Whether one looks at a commercial database or a Web index, decisions inform the scope, depth, and approach of what’s available to a user.

SEO or search engine optimization is just a variant of indexing. I personally find SEO in general and SEO experts in particular annoying at best. I prefer that Web sites have content about a subject. That content can be casual like the information in this Beyond Search blog. It can be weaponized like the information on some government and political sites. It can be wacky like the humor sites. SEO injects links and words that are designed to fool the Web indexing systems.

So what we have is bias in the indexing systems. We have bias in the content. We have bias in the tagging and linking.

So now everyone is horrified that free Web search systems are not “objective”.

Give me a break.

The whole search sector is not objective. The algorithms execute but no one knows the weights, thresholds, or tweaks under the hood. Believe me. There is a lot of fiddling that must be done. In the first index of the US Federal government using the Inktomi system, considerable time was spent removing certain content from the index. No one paid much attention in Year 2000, and I don’t think too many people pay much attention to the contents and content scrubbing in most indexes. Where did that education policy go on the company Intranet site? Answer: the technical team was told to delete the pointers. Good bye information. Few notice. Filtering, cleaning, and scrubbing are routine in each of the components of a search system. One vice president wanted his Web site’s content updated in near real time. No problem. The Railway Retirement Board? Well, every six months will probably do the trick.

Two different users of the same search system can derive different results via their own search methods. The “smart” systems can display different search results for different users. Web content creators can manipulate the Web indexing systems. Heck, the Web indexing systems can manipulate the results to their benefit or the benefit of their bottom line.

There is no free lunch for those who don’t know what they don’t know.

Fact: There are no objective search results. Just my view from Harrod’s Creek. Don’t believe me. Track down a person with a master’s degree in library science and ask that individual to run sample queries for you and then analyze the results.

Run the same queries on commercial systems and on free Web search systems. No two systems have content congruence. No relevance method delivers exactly the same results unless a human intervenes. To get information, one must use multiple search systems, multiple sources, and talk to humans. The notion of “getting an answer” is popular. The problem is that the answer may be wrong, biased, or an ad. Most Web users are happy with whatever pops up. Intellectual laziness? Too much work? Convenience? Who knows.
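
One rough way to check congruence yourself is to compare the top results from two systems for the same query. The Python sketch below uses Jaccard overlap, which is my choice of measure and only one of several possible. The function name and the sample lists are hypothetical.

```python
def result_overlap(results_a, results_b, k=10):
    """Jaccard overlap of the top-k results from two systems (0.0 to 1.0)."""
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    union = top_a | top_b
    if not union:
        return 0.0
    return len(top_a & top_b) / len(union)


# Made-up result lists for one query on two systems.
system_one = ["a.example", "b.example", "c.example"]
system_two = ["b.example", "c.example", "d.example"]
print(result_overlap(system_one, system_two))  # 0.5
```

A score of 1.0 would mean both systems returned the same set of links. The measure ignores ranking order, so two systems can score 1.0 and still put different links at the top.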

A degree in information science exists because of a need to understand provenance, precision, recall, editorial policies, and indexing.
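
For readers who have not met precision and recall, here is a minimal Python sketch using the standard textbook definitions. The document identifiers are made up, and deciding which documents count as “relevant” is itself a human judgment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision: share of retrieved documents that are relevant
    recall:    share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The system returned four documents; a human judged three documents relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d9"])
print(precision)  # 0.5
print(recall)     # two of the three relevant documents were found
```

Recall can only be computed when someone knows what the system failed to return, and the casual Web user does not.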

English majors, lawyers, lousy Web masters, and money crazed SEO experts are different in many interesting ways. Their view of information often contrasts sharply with that of a person with deep experience in library and information science.

Web search has to generate revenue and only be “good enough.” Forget the superlatives and deal with the bias inherent in search, content processing, and indexing.

Stephen E Arnold, February 13, 2011

Freebie and not lunch

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Federated search, Google, Microsoft, Real time search, Search, Text processing

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.