Hit Boosting: SEO for Intranet Search Systems
February 5, 2008
The owner of a local marketing company asked me, “Is there such a thing as SEO for an in-house search system?”
After gathering more information about this large health care organization, I can state without qualification, “Yes.”
Let’s define some terms, because the acronym SEO refers primarily to techniques for getting a public Web page to appear at the top of a results list. Once this definition is behind us, I want to look at three situations (not exhaustive but illustrative) in which you would want to use SEO for behind-the-firewall search. To wrap up, I will present three techniques for achieving SEO-type “lift” on an Intranet.
I’m not going to dig too deeply into the specific steps for widely-used search systems. I want to provide some broad guidance. I decided to delete this information from my new study Beyond Search: What to Do When Your Search System Doesn’t Work in order to keep the manuscript a manageable size.
SEO and its variants are becoming more and more important, and I have been considering a short monograph on this topic. I implore the SEO gurus, genii, and mavens to spare me their brilliant insights about spoofing Google, Live.com, and Yahoo. I am not interested in deceiving a public Web search engine. Anyway, my comments aren’t aimed at the public indexing systems. We’re talking about indexing information on servers that live behind a firewall.
Definition Time
SEO means “search engine optimization.” In my view, this jargon should be used exclusively for explaining how a Web master can adjust Web pages (static and dynamic) to improve a site’s ranking in a results list. The idea behind SEO is to make editorial and coding changes so a Web page buried on results page 12 appears on results page 1 even though the content on the Web page doesn’t warrant that high rank. A natural high rank can be seen with this query; go to Google and search for “arnoldit google patents”. My Web site should be at or near the top of the results list. SEO wizards want to make this high ranking happen, not by content alone, but with a short cut or trick. SEO often aims to exploit idiosyncrasies in the search system’s indexing and ranking procedures. If you want to see a list of about 100 factors that Google allegedly used in the 2004-2005 time period, get a copy of my The Google Legacy. I include a multi-page table and some examples. But the thought of distorting relevancy procedures makes me queasy.
When you want to make sure specific content appears on a colleague’s behind-the-firewall results page, you are performing hit boosting. The idea behind “hit boosting” is that certain organizational content will not appear on a colleague’s results page because it is too new, too obscure, or set forth in a manner that a behind-the-firewall content processing system cannot figure out.
An example from my files is a memo whose entire text is “ATTN: Fire Drill at 3 PM. Mandatory. Susan.” Not surprisingly, you would have to be one heck of a search expert to find this document even if you knew it existed. With the latency in most behind-the-firewall content processing systems, this memo may not be in the index until the fire drill is over and forgotten.
To get this message in front of your colleagues, you need “hit boosting”. Some information retrieval experts just say “boosting” to refer to this function.
What Needs Boosting?
Let me give you an example. A vice president of the United States wanted his home page to come up at the top of a results list on various Federal systems. One system, used by the 6,000 officials and staff of the US Senate, did not index the Veep’s Web content. The only way to make the site appear was to do “hit boosting.” The reason had nothing to do with relevance, timeliness, or any query. The need for hit boosting was pragmatic. A powerful person wanted to appear on certain results pages. End of story. You may find yourself in a similar situation. If you haven’t, you probably will.
A second example is an expansion of the emergency notification about the fire drill. Your colleagues in certain departments (HR, legal, and accounting in my experience) tell you that certain information must be displayed for all employees. I dislike categorical affirmatives, but these folks wallow in them. Furthermore, to them the distinction between a search system, a portal, and a newsfeed is “too much detail”. Some search system vendors have added components to make this news push task easier.
A third example is that a very important document cannot be located. There are many reasons for this. Some systems perform only key word indexing, and the terminology of the document may be very complex, even arcane. A person looking for this type of legal, scientific, technical, or medical document cannot locate it unless he or she knows the specific terminology used in the document. Searching by more general concepts buries the document in a lengthy result list or chops off the least relevant documents, displaying only the 20 most relevant documents. Some systems routinely reject documents if they exceed certain word counts, contain non-text objects, or are in an unsupported file format. Engineers are notorious for spitting out a drawing with a linked broadsheet containing the components in the drawing and snippets of text explaining in geek-speak a new security system.
To recap, you have to use “hit boosting” to deal with requests from powerful people, display content to employees whether those employees have searched for the information or not, or manipulate certain types of information to make it findable.
In my work, the need for “hit boosting” is increasing. The need rises as the volume of digital information goes up. The days of printing out a message and putting it on the bulletin board by the cafeteria are fast disappearing.
How to Do It
There are three basic techniques for “hit boosting”. I am going to generalize, not select a single system such as the Google Search Appliance or Vivisimo’s system. Details vary by system, but the broad principles I summarize should work.
First, you create a custom search query and link it to an icon, image, or chunk of text. When the user clicks the hot link, the system runs the query and displays the content. For example, you can use the seal of the vice president, use hover text that says, “Important Information from the Vice President”, and use a hot link on text that says, “Click here.” Variations of this approach include what I call “CSS tweaking” accompanied with an iFrame. The idea is that on any results page, you force the information of the moment in front of the user. If this seems like a banner ad or an annoying Forbes’ message, you are correct. The idea is that you don’t fool around with your enterprise search system. You write code to deliver exactly what the powerful person wants. I know this is not “search”, but most powerful people don’t know search from a Queensland cassowary. When you do a demo, the powerful one sees what he / she expects to see.
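The canned-query link can be sketched in a few lines. What follows is a minimal illustration in Python, not the method of any particular product: the endpoint `http://search.example.internal/search`, the query parameter name `q`, and the function name `boosted_link` are all assumptions made for the example. It builds a fixed query and wraps it in a link with hover text.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical search endpoint; substitute your own system's results URL.
SEARCH_URL = "http://search.example.internal/search"


def boosted_link(query, link_text, hover_text, param="q"):
    """Return an HTML anchor that runs a canned query when clicked.

    `param` is the assumed name of the query parameter; check your
    search system's documentation for the real one.
    """
    url = SEARCH_URL + "?" + urlencode({param: query})
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(url, quote=True),
        escape(hover_text, quote=True),
        escape(link_text),
    )


if __name__ == "__main__":
    print(boosted_link(
        "vice president home page",
        "Click here.",
        "Important Information from the Vice President",
    ))
```

The generated anchor can be dropped into a results page template or an iFrame; the search system itself is left untouched.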
Second, you read the documentation for your search engine and look for the configuration file(s) that control relevance. Some high-end search systems allow you to specify conditions or feed “rules” to handle certain content. The trick here is to program the system to make certain content relevant regardless of the user’s query. If you can’t find the config file or specific relevance control panel, then you use the search system’s API. Write explicit instructions to get content from location A and display it at location B. You may end up with an RSS hack to refresh the boosted content pool, so expect to invest some time mucking around to get the effect you want. Because vendor documentation is often quite like a haiku, you will be doing some experimenting. (Remember: don’t do this on a production server.) You can also hack an ad display widget into your results page. With this approach, your boosted content is handled as an ad.
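One way to picture the “relevant regardless of the query” rule is as a post-processing step over the results list. The sketch below is hypothetical Python, not any vendor’s API: the `Result` type, the `pin_results` function, and the sample intranet URLs are invented for illustration. It places the boosted documents ahead of whatever the engine returned and drops duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    title: str


def pin_results(engine_results, pinned, limit=20):
    """Put `pinned` results first, then the engine's own ranking.

    A result already in `pinned` is not repeated further down, and the
    combined list is cut to `limit` entries.
    """
    pinned_urls = {r.url for r in pinned}
    rest = [r for r in engine_results if r.url not in pinned_urls]
    return (list(pinned) + rest)[:limit]


if __name__ == "__main__":
    drill = Result("http://intranet.example/memo/fire-drill", "Fire Drill at 3 PM")
    organic = [
        Result("http://intranet.example/hr/benefits", "Benefits Overview"),
        drill,
        Result("http://intranet.example/it/vpn", "VPN Setup"),
    ]
    for r in pin_results(organic, [drill]):
        print(r.title)
```

In a real deployment the pinned list would come from wherever you keep the boosted content pool, such as a config file or a feed, and the engine results from the search system’s API.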
Third, you take the content object and rework it into a content type the search system can manipulate. Then you add lots of metadata to this reworked document. You are doing what SEO mavens call keyword stuffing or term stuffing. With some experimentation, you can make one or more documents appear in the context you want. Once you have figured out the right combination of terms to stuff, you can automate this process and “inject” these additional tags into any document you want to boost. (The manual hit boosting techniques should be automated as soon as you know the hack won’t cause other problems.)
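For content that has been reworked into HTML, the injection step might look like the following. This is a sketch under assumptions: it presumes your search system indexes the standard `keywords` meta tag (not all do, so test first), and the function name `inject_keywords` is made up for the example.

```python
import re
from html import escape

# Matches the opening <head> tag, with or without attributes.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_keywords(html_text, terms):
    """Insert a keywords meta tag right after the opening <head> tag.

    Raises ValueError if the document has no <head> element, rather than
    guessing where the tag should go.
    """
    match = HEAD_OPEN.search(html_text)
    if match is None:
        raise ValueError("document has no <head> element")
    content = escape(", ".join(terms), quote=True)
    tag = '\n<meta name="keywords" content="{}">'.format(content)
    return html_text[:match.end()] + tag + html_text[match.end():]


if __name__ == "__main__":
    memo = (
        "<html><head><title>Memo</title></head>"
        "<body>ATTN: Fire Drill at 3 PM. Mandatory. Susan.</body></html>"
    )
    print(inject_keywords(memo, ["fire drill", "evacuation", "safety"]))
```

Once the right terms are known, a function like this can be run over every document queued for boosting before it is handed to the indexer.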
Wrap Up
Hit boosting is an important task for system administrators and the politically-savvy managers of a behind-the-firewall search system. If you have other tricks and techniques, please post them so others can learn.
Stephen Arnold, February 5, 2008
Comments
There is a fourth approach which might be called “spotlighting” which doesn’t involve changing the content in any way, such as adding meta-data. An example is at http://usa.gov – just search for your favorite senator. You’ll see the bio details highlighted in its own region of the search results page. A small bit of custom indexing of the content is involved, perhaps even as a mashup.
Some design principles are suggested here:
http://searchdoneright.com/2007/01/indexing-high-value-enterprise-information/