Search: Appearances Are Deceiving

March 22, 2008

In Toronto, Ontario, several years ago, I attended a lecture in which the speaker (whose name I have forgotten) asked the audience, “What do you see?” When I saw this illustration, I saved it. My source was the University of Toronto. What do you see?

[Image: rotating wheels optical illusion]

My myopic eyes see wheels that rotate. When I focus my attention on a single “wheel”, nothing moves. When I shift my vision, some wheels turn.

Search and retrieval is to some people similar to this illusion. I wish I could assure you that “search” will settle down, allow us to examine it carefully, and remain fixed if we shift our attention to another problem. I can’t. Search is a blob of digital mercury, and we are — at least for the foreseeable future — going to find that it’s elusive. Perception of the viewer “defines” search.

Why is this important?

On March 21, 2008, I spoke to a journalist who asked me, “What’s the difference between Intranet search and a company’s Web site search system?” The distinction is important because information behind-the-firewall is usually viewed as “for employees only”. There are exceptions such as a consultant or attorney who needs to examine information residing on an organization’s servers. The idea is that a user name, password, and even other types of authentication may be required to tap into invoices, customer information, marketing and sales materials, and other organization information and data.

An Organization’s Private Information

We have some common concepts to help keep the boundary between what’s available to an employee (usually called an “authorized user”) and what’s available to someone who is visiting the organization for a meeting. Security procedures, a firewall, and other types of verification remind people that once you are “in the building” you have limited access privileges. These physical reminders exist in the digital world as well. When I was in San Francisco on March 13, 2008, I was shown a “blue” cable. This “blue” cable was for “visitors.” When I plugged my laptop into this cable, I was “outside” the company’s network. The employees used “gray” cables, and the color coding made sure I understood the difference.

Most people understand this type of digital boundary. It follows that a search system for a person using a “gray” cable will provide access to the information that is off limits for outsiders. The software and data available to users of the “gray” cable are special in some way. Those data are not intended for eyes not certified for viewing the company’s private information.
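To make the “gray” cable idea concrete, here is a minimal Python sketch of how a search system might trim results by network zone. The `Document` class, the zone names, and the sample data are all hypothetical, invented for illustration. Real systems usually check permissions per user and per document, not per cable.

```python
from dataclasses import dataclass


# Hypothetical zones: "gray" = employee network, "blue" = visitor network.
@dataclass(frozen=True)
class Document:
    title: str
    text: str
    internal_only: bool  # True if meant for authorized users only


def search(docs, query, zone):
    """Return titles of documents matching the query that the zone may see."""
    query = query.lower()
    hits = []
    for doc in docs:
        if query not in doc.text.lower():
            continue
        # Visitors on the "blue" cable never see internal-only documents.
        if doc.internal_only and zone != "gray":
            continue
        hits.append(doc.title)
    return hits


DOCS = [
    Document("Product brochure", "Widget pricing overview", False),
    Document("Customer invoices", "Widget pricing by customer", True),
]

print(search(DOCS, "pricing", "gray"))
print(search(DOCS, "pricing", "blue"))
```

The same query returns both documents on the “gray” side and only the brochure on the “blue” side, which is the whole point of the boundary.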

The Public Web Site

Many organizations have a public-facing Web site. This Web site is now an important part of the organization’s marketing, sales, and technical presence. Web sites come in a mind boggling range of forms. For our purpose at the moment, let’s assume that the Web site is operated by an organization’s marketing department. Also, the marketing department relies on the organization’s in-house technical staff to provide some technical support. But for our example, the marketing department makes use of an outside designer, two contract programmers, and interns. Senior management recognizes the importance of the Web site. Funding is adequate but not generous. The assumption is that existing work processes apply to the information on the Web site.

The marketing department wants to provide a search-and-retrieval function for the Web site. The organization licenses a high-profile, behind-the-firewall system. After some poking around, the marketing department finds itself faced with several search options. These are:

  1. Use the existing industrial-strength search system for the Web site. An additional license fee is required. Some work is needed to configure the industrial-strength system to search the modest amount of information on the organization’s Web site. Because the Web site is hosted at a well-known service provider, a dedicated search subsystem will be needed to handle the Web site search load. Between you and me, this seems to be a great deal of work, but it’s a viable path in many large organizations. Consistency and licensing issues outweigh common sense in some cases.
  2. Use the search system provided by the Web hosting company. A bit of sniffing around reveals two options that are supported by the organization’s hosting company. The first is to use Lucene, which the hosting company makes available to its Web customers. The alternative is to use a third-party search system operated by a hosted search provider. An example of a company offering this type of solution is Blossom Software. An interview with the wizard behind Blossom is here.
  3. Use the Google “free” custom search engine. This approach allows a visitor to the Web site to search the organization’s Web site using Google’s Web search system. You learn that a snippet of code is available from Google to activate this service, and it delivers the Google technology without charge to your user’s digital doorstep.
  4. License a search system. You find that you have two choices. The first is to license an appliance from Google, Thunderstone, or another vendor. The appliance is, for our purposes, a “search toaster”. It is easy to set up and indexes whatever you configure it to index. The second is to license a separate on-premises search system. You find that you have a large number of choices. In fact, if you read my work, you already know that there are more than 150 different systems available as of March 22, 2008.

What do you do? Okay, now you see the significance of the optical illusion. When you shift your focus, the whole search “problem” starts jiggling, rotating, and squirming. Little wonder that most organizations flip over a couple of small stones, peer quickly underneath, and make a decision based on the fact that nothing too frightening lurked under the stone. I find this amusing. Despite the yip yapping about search, most systems are selected by decidedly unscientific means. With finding information one of the most important activities for a knowledge worker, the search decision is almost consistently flawed.

What’s Search? What Do You Need?

From 2002 to 2004, I gave a number of tutorials about behind-the-firewall search. My work for various government agencies had given me an opportunity to learn about the major search systems. Then I popped in and out of the FirstGov.gov (now USA.gov) as an analyst-investigator. The material in the tutorial gathered together the basics of behind-the-firewall search. The information found its way into the first, second, and third editions of the Enterprise Search Report, published by CMSWatch.com. I think my name is still used in conjunction with the fourth edition, but I no longer do the editorial work. Behind-the-firewall search lost its charm for me, but in 2002, I was revved and ready. People were hungry for information about search and retrieval, and I had a wealth of practical knowledge on the subject.

To help the attendees understand the differences between behind-the-firewall search and an organization’s Web site search, I used this table.

[Table image: Web site search compared with behind-the-firewall search]

© Stephen E. Arnold, 2002-2008. Contact seaky2000 at yahoo.com for permission to reuse. Some restrictions may apply.

Please remember that this is information dating from 2002, and if I were doing this chart today, I would make changes. For our purpose, however, this chart is still useful. Let me call your attention to several points germane to our problem of figuring out search for employees and Web site visitors. Suspend, if you will, your interest in search across the public facing Web site and the content behind the organization’s firewall. I will get to that issue in a moment.

Notice that in my 2002 analysis, there are some significant differences between these two search systems. In general, these distinctions remain true in March 2008. Also, did you see these distinctions:

  • Web site search is hosted; behind-the-firewall search is not. This is very important because when Web site search is brought “in house”, marketing usually gets minimal technical support. I know there are exceptions, but let’s face it. In-house IT professionals are understaffed and overwhelmed with work. Marketing often comes low on the priority task list.
  • Web site search, in general, is “lighter weight” than the “heavier”, industrial-strength behind-the-firewall search. What’s interesting is that performance of Web site search is generally better than behind-the-firewall search. The reason? Hosting services do a better job of scaling for their paying customers. An in-house search system must fight for resources and, of course, the attention of the on-staff performance gurus.
  • Security is not an issue for public Web site search. Behind the firewall, security is a big deal: not the whole enchilada, mind you, but very important. With corporate executives dressed in orange jump suits or waking up to find embarrassing email in the morning news program headlines, security is important.
  • Reports for Web site usage are important. Marketing managers “prove” their worth with charts showing increases in traffic, hits, and unique visitors. Behind-the-firewall systems have logs that often go unopened. The response I hear when I ask about behind-the-firewall logs is “We don’t have time?” or “The vendor does it?” or “We ran out of disc space?” or some other baloney. Let’s face it. In-house IT professionals don’t have time to read the entrails of search system logs unless there’s a good reason such as a major meltdown. Marketing staff don’t have the skills to crunch the logs and figure out whether the analytics are right, wrong, or incomplete. Most in-house search systems operate on auto-pilot. When humans intervene, many are “flying blind”.
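Opening those unopened logs does not have to be a big project. Here is a minimal Python sketch that counts the queries that returned nothing, which is often the first thing worth knowing. The tab-separated log format and the sample entries are assumptions made up for this example; every search system writes its logs differently.

```python
from collections import Counter

# Hypothetical log format: "timestamp<TAB>query<TAB>result_count".
SAMPLE_LOG = """\
2008-03-21T09:00\tvendor phone\t0
2008-03-21T09:05\texpense form\t12
2008-03-21T09:07\tVendor Phone\t0
2008-03-21T09:10\tholiday schedule\t3
"""


def zero_result_queries(log_text):
    """Count how often each normalized query returned nothing."""
    misses = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, query, count = line.split("\t")
        if int(count) == 0:
            # Fold case so "Vendor Phone" and "vendor phone" count together.
            misses[query.strip().lower()] += 1
    return misses


for query, times in zero_result_queries(SAMPLE_LOG).most_common():
    print(f"{times}x no results: {query}")
```

A list like this tells you what people looked for and could not find, which is a better starting point than a traffic chart.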

“I Want to Search Inside and Our Web Site Too”

Have you heard this request? I hear it almost daily. The behind-the-firewall search system at the investment bank I worked for until the meltdown last weekend makes it almost impossible for a bank employee to search the Internet or the company’s own Web site from a desktop computer in the company.

Last time I checked, I need to search what’s on my servers, what’s on my Web site, and what’s on my Web log. I have multiple search systems to “find” my own private information. Vendors take great delight in giving me their software to use. I have many systems to use. I also have a hosted search service on my ArnoldIT.com Web site and this Web log. I use Blossom Software’s system. Alan Feuer is an acquaintance of mine, and he is proactive, usually thinking one step ahead of me. Plus, his code is solid.

But you have to figure out how to use your in-house, behind-the-firewall system to make “public” information available from your behind-the-firewall search system. The solution I encounter is using the behind-the-firewall crawler to index the public facing Web site. But there’s some sand in the gears; namely, users see duplicate results. In some cases, the information from the behind-the-firewall servers and the Web server are out of sync; that is, the data are different. Your co-workers are not eager to figure out which “fact” is correct. When your co-worker is getting heat from her boss to hurry up, a search system that poses problems to users is, to use the vernacular, a search system that “sucks”.
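The duplicate and out-of-sync cases can at least be detected before users trip over them. The Python sketch below compares the text behind the firewall with the text on the Web server and sorts shared documents into true duplicates and conflicts. The document keys and sample text are hypothetical, and it assumes you can map an internal document to its public counterpart, which in practice is often the hard part.

```python
import hashlib


def fingerprint(text):
    """Hash the text after collapsing whitespace and case."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reconcile(internal, public):
    """Compare two {doc_key: text} mappings.

    Returns (duplicates, conflicts): keys whose text matches in both
    sources, and keys whose text differs between them.
    """
    duplicates, conflicts = [], []
    for key in sorted(internal.keys() & public.keys()):
        if fingerprint(internal[key]) == fingerprint(public[key]):
            duplicates.append(key)
        else:
            conflicts.append(key)
    return duplicates, conflicts


internal = {
    "widget-spec": "Widget weighs 2 kg.",
    "price-list": "Widget costs $10.",
}
public = {
    "widget-spec": "Widget  weighs 2 KG.",
    "price-list": "Widget costs $12.",
}

print(reconcile(internal, public))
```

Duplicates can be collapsed into one result. Conflicts are the ones that need a human to decide which “fact” is correct.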

Ah, you think, we’ll federate. The only problem is that federation requires that you obtain, integrate, configure, and deploy a federation component. You have a “now” problem. I’ve just identified some steps that can easily consume three months or more at a mid-sized organization. In a government agency, you are looking at a one-year effort.
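To show why federation sounds easy, here is the merging step alone as a toy Python sketch. It interleaves two ranked result lists round-robin and drops repeated URLs. The function name and the sample result lists are invented for illustration. The sketch leaves out everything that eats the three months: connectors to each system, security, differing query syntax, and relevance scores that are not comparable across engines.

```python
from itertools import zip_longest


def federate(*result_lists):
    """Interleave ranked result lists round-robin, skipping repeated URLs."""
    merged, seen = [], set()
    # zip_longest pads the shorter lists with None once they run out.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


intranet_hits = ["intranet/policies/travel", "www/products/widget"]
website_hits = ["www/products/widget", "www/contact", "www/news"]

print(federate(intranet_hits, website_hits))
```

Round-robin is used here only because it avoids comparing scores from different engines; it is one simple choice among several, not a recommendation.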

What do you do? Punt? Ignore it? Look for a short cut? Few, very few, solve this problem, and it is resolvable.

Let’s Look at the “Real” Issue

At this point, you are probably wondering, “Why is this guy making such a big deal of a non-problem?” Pandia called me a contrarian, and I am going to show you some more evidence about my willingness to push back against glib dismissals.

Here’s your “real” problem. The behind-the-firewall user wants to search the content on the public facing Web server. Users find that it is easier to locate the information they need to do their job by using Google. You, as the owner of this problem, have to face this fact: your search system doesn’t work very well.

You can’t hide from me. I’m the person hired by your company to figure out why the million dollar behind-the-firewall system is not being used. I am supposed to determine the “fix” for employees who use Google to locate the information needed to do routine functions such as finding the phone number of a key vendor. The in-house system can’t do this without timing out. Furthermore, the behind-the-firewall system doesn’t have “current” information. The marketing department does a better job of putting product descriptions on the public facing Web site, and that information doesn’t appear on the in-house system as quickly. In short, the in-house system sucks. Searching the organization’s Web site on Google or some other public search system is better. You know what I hear, “Find out what these guys are doing with our million dollars. I’m not spending any more money on a search system that no one uses.”

So what do you do? The answer is, “You have to realize that you have been trapped in an illusion.” Your world is not what you think you see. Search is not tidy, and it is not really “technology”. Search means figuring out the specifics of the information your colleagues need to do their jobs. You need to create an information process that makes that information available to the search systems your constituencies use. Notice that I am not saying, “The search system you licensed because you were looking for a quick fix” or “The search system you licensed because it was easier to go with the really friendly vendor with the great demo”.

The “real” issue, therefore, is not vendor specific. The crisis in search that many organizations face boils down to one fact: search is not perceived correctly. Gentle reader, search is about information and how that information is used by people. Key word queries do not equal search. Assisted navigation is not search. Canned reports do not deliver search results.

Wrap Up

What’s the fix? The remedy varies by patient. I wish there was a wonder drug to cure the “search” ills. Sorry. Not yet. I do have some recommendations for you to consider:

  1. Understand what your users need. Map those needs to what the organization wants. When there’s a disconnect, invest the time necessary to craft a solution that works within your budget and time constraints. Don’t guess. Make a decision based on hard work, not one lifted by marketing hot gas.
  2. Recognize that you will have multiple search systems. You will need to work on the information processes within your organization to reconcile the inevitable differences that exist between different systems. You get into trouble when you don’t have the ability to separate search apples from search oranges.
  3. Expect the in-house information technology professionals to abandon you. Plan for it. Don’t act surprised when their priorities are not your priorities. Search, to them, is a subsystem. The greater need is to keep the payroll system up and running and to make sure the accounting system is not dead.
  4. Look at hosted solutions. The notion of cloud computing is not new. It’s called something different today, but it’s a variant of timesharing which has been around in one form or another for more than 50 years. It works. You don’t have the money or time to figure out technical plumbing, search, and information processes. If you did, you wouldn’t be reading this post or sitting in meetings as I ferret out the cost overruns, technical mistakes, and rising user resentment toward the existing search system.

Well, if you are reading this on your weekend, you may want to argue with me. No problem. Pandia calls me a contrarian, and now you know why. Share your experiences. I’m old, but not too old to learn.

Stephen Arnold, March 22, 2008
