Requirements for Behind-the-Firewall Search

February 5, 2008

Last fall, I received a request from a client for a “shopping list of requirements for search.” The phrase shopping list threw me. My wife gives me a shopping list and asks me to make sure the tomatoes are the “real Italian kind”. She’s a good cook, but I don’t think she worries about my getting a San Marzano or an American genetically-engineered pomme d’amour.

Equating shopping list with requirements for a behind-the-firewall search / content processing system gave me pause. As I beaver away, gnawing down the tasks remaining for my new study, “Beyond Search: What to Do When Your Search System Won’t Work”, I had a mini-epiphany; to wit:

Getting the requirements wrong can undermine a search / content processing system.

In this essay, I want to make some comments about requirements for search and content processing systems. I’m not going to repeat the more detailed discussion in The Enterprise Search Report, 1st, 2nd, and 3rd editions, nor will I recycle the information in Beyond Search. I propose to focus on the tendency of very bright people to see search and content processing requirements as check-off items on a house inspection. Then I want to give one example of how a perceptual mismatch on requirements can turn a search and content processing budget into a multi-year problem. To conclude the essay, I want to offer some candid advice to three constituencies: the customer who licenses a search / content processing solution, the vendor who enters into a deal with a customer, and the consultants who circle like buzzards.

Requirements

To me, a requirement is a clear, specific statement of a function a system should perform; for example, a search system should process the following file types: Lotus Notes, FrameMaker, and DB2 tables.

How does one arrive at a requirement and then develop a list of requirements?

Most people develop requirements by combining techniques. Here’s a short list of methods that I have seen used in the last six months:

Ask users of a search or content processing system what they would like the search system to do
Look at information from vendors who seem to offer a solution similar to the one the organization thinks it wants
Ask a consultant, sometimes a specialist in a discipline only tangentially related to search.

The Fly Over

My preferred way of developing requirements is more mundane, takes time, and is resistant to short cuts. The procedure is easy to understand. The number of steps can be expanded when the organization operates in numerous locations around the world, processes content in multiple languages, and has different security procedures in place for different types of work.

But let’s streamline the process and focus on the core steps. When I was younger, I guarded this information closely. I believed knowing the steps was a key ingredient for selling consulting. Now, I have a different view, and I want you to know what I do for the simple reason that you may avoid some mistakes.

First, perform a data gathering sweep. In this step you will be getting a high-level or general view of the organization. Pay particular attention to these key areas. Any one of them can become a search hot spot and burn your budget, schedule, and you with little warning:

Technical infrastructure. This means looking at how the organization handles enterprise applications now, what the hardware platform is, what the work load on the present technical staff is, how the organization uses contractors and outsourcing, what the present software licensing deals stipulate, and the budget. I gather these data by circulating a data collection form electronically or using a variety of telephonic and in-person meetings. I like to see data centers and hardware. I can tell a lot by looking at how the cables are organized and from various log files which I can peruse on site with the customer’s engineer close at hand to explain a number or entry to me. The key point of the exercise is to understand if the organization is able to work within its existing budget and keep the existing systems alive and well.
User behavior. To obtain these data, I use two methods. One component is passive; that is, I walk around and observe. The other component is active; that is, I set up brief, informal meetings where people are using systems and ask them to show me what they now do. If I see something interesting, I ask, “What caused you to take that action?” I write down my observations. Note that I try to get lower-level employees’ input about needs before I talk to too many big wheels. This is an essential step. Without knowing what employees do, it is impossible to listen accurately to what top managers assert.
Competitive arena. Most organizations don’t know much about what their competitors do. In terms of search, most organizations are willing to provide some basic information. I find that conversations at trade shows are particularly illuminating. But another source of excellent information is search vendors. I admit that I can get executives on the telephone or by email pretty easily, but anyone can do that with some persistence. I ask general questions about what’s happening of interest in law firms or ecommerce companies. I am able to combine that information with data I maintain. From these two sources, I can develop a reasonable sense of what type of system is likely to be needed to keep Company A competitive with Company B.
Management goals. I try to get a sense of what management wants to accomplish with search and content processing. I like to hear from senior management, although most senior managers are out of touch with the actual information procedures and needs of their colleagues. Nevertheless, I endure discussions with the brass to get a broad calibration. Once these interviews or discussions are scheduled, I use two techniques to get data from mid-level managers. The first is a Web survey. I use an online questionnaire and make it available to any employee who wishes to participate. I’m not a fan of long surveys. A few pointed questions deliver the freight of meaning I need. More importantly, survey data can be counted and used as objective data about needs. The second is discussion in various forms. I like one-on-one meetings; I like small-group meetings; and I like big government-style meetings with 30 people sitting around a chunk of wood big enough to make a yacht. The trick is to have a list of questions and the ability to make everyone comment. What’s said is important, but how people react to one another can speak volumes and indicate who really has a knack for expressing a key point for his / her co-workers.
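The point about survey data being countable can be sketched in a few lines of Python. This is only an illustration, assuming each response is a dictionary keyed by a question identifier; the function name `tally_answers`, the question key, and the sample answers are all hypothetical, not taken from any actual survey.

```python
from collections import Counter


def tally_answers(responses, question):
    """Count the answers to one survey question.

    Returns (answer, count, percent) tuples, most frequent answer first.
    Respondents who skipped the question are ignored.
    """
    answers = [r[question] for r in responses if question in r]
    total = len(answers)
    counts = Counter(answers)
    return [(answer, n, round(100 * n / total, 1))
            for answer, n in counts.most_common()]


# Hypothetical responses to a single pointed question.
responses = [
    {"where_do_you_look_first": "shared drive"},
    {"where_do_you_look_first": "email"},
    {"where_do_you_look_first": "shared drive"},
    {"where_do_you_look_first": "shared drive"},
    {"where_do_you_look_first": "email"},
    {},  # a respondent who skipped the question
]

for answer, n, pct in tally_answers(responses, "where_do_you_look_first"):
    print(f"{answer}: {n} ({pct}%)")
```

A short questionnaire keeps this kind of tally easy to read, and the counts give you something firmer than anecdote to weigh against what the brass says.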

I take this information and data, read it, sort it, and analyze it. The result is the intellectual equivalent of a bookcase. The supports are the infrastructure. Each of the shelves consists of the key learnings from the high-level look at the organization. I don’t know how much content the organization has. I don’t know the file types. I don’t have a complete inventory of the enterprise applications into which the search and content processing must integrate. What I do know is whom to call or email for the information. So drilling down to get a specific chunk of data is greatly simplified by the high-level process.

Matching

I take these learnings and the specific data such as the list of enterprise systems to support and begin what I call the “matching sequence.” Here’s how I do it. I maintain a spreadsheet with the requirements from my previous search and content processing jobs. Each of these carries a short comment and a code that identifies the requirement by availability, stability, and practicality. For example, many companies want NLP or natural language processing. I code this requirement as Available, Generally Stable, and Impractical. You may disagree with my assessment of NLP, but in my experience few people use it, and it can add enormous complexity to an otherwise straightforward system. In fact, when I hear or identify jargon in the fly-over process, my warning radar lights up. I’m interested in what people need to do a job or to find on-point information. I don’t often hear a person in accounting asking to do a query in the form of a complete sentence. People want information in the most direct, least complicated way possible. Writing sentences is neither easy nor speedy for many employees working on a deadline.
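For readers who think in code rather than spreadsheets, here is a minimal sketch of such a catalog in Python. Only the NLP coding (Available, Generally Stable, Impractical) comes from the text above. The other code values, the Lotus Notes entry, and the names `CatalogEntry` and `match_needs` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    AVAILABLE = "Available"
    UNAVAILABLE = "Unavailable"            # assumed value


class Stability(Enum):
    STABLE = "Stable"                      # assumed value
    GENERALLY_STABLE = "Generally Stable"
    UNSTABLE = "Unstable"                  # assumed value


class Practicality(Enum):
    PRACTICAL = "Practical"                # assumed value
    IMPRACTICAL = "Impractical"


@dataclass(frozen=True)
class CatalogEntry:
    """One row of the requirements spreadsheet."""
    name: str
    comment: str
    availability: Availability
    stability: Stability
    practicality: Practicality

    def code(self) -> str:
        return ", ".join([self.availability.value,
                          self.stability.value,
                          self.practicality.value])


CATALOG = [
    CatalogEntry("Natural language processing (NLP)",
                 "Few people use it; adds complexity",
                 Availability.AVAILABLE,
                 Stability.GENERALLY_STABLE,
                 Practicality.IMPRACTICAL),
    # Hypothetical row; its codes are illustrative only.
    CatalogEntry("Lotus Notes indexing",
                 "Illustrative entry",
                 Availability.AVAILABLE,
                 Stability.STABLE,
                 Practicality.PRACTICAL),
]


def match_needs(needs, catalog):
    """Match needs found in the fly-over against the catalog by name.

    Returns (matched entries, needs with no catalog entry yet).
    """
    by_name = {entry.name.lower(): entry for entry in catalog}
    matched = [by_name[n.lower()] for n in needs if n.lower() in by_name]
    unmatched = [n for n in needs if n.lower() not in by_name]
    return matched, unmatched


matched, unmatched = match_needs(
    ["Natural language processing (NLP)", "Lotus Notes indexing"], CATALOG)
for entry in matched:
    print(f"{entry.name}: {entry.code()}")
```

Needs that come back unmatched are the ones that require a fresh definition and a fresh coding decision before they go into the spreadsheet.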

What I have after working through my list of requirements and the findings from the high-level process is three lists of requirements. I keep definitions or mini-specifications in my spreadsheet, so I don’t have to write boilerplate for each job. The three lists with brief comments are:

Must-have. These are the requirements that the search or content processing system must meet in order to meet the needs of the organization based on my understanding of the data. A vendor unable to meet a must-have requirement, by definition, is excluded from consideration. Let me illustrate. Years ago, a major search procurement stipulated truncation, or, to use the technical term, lemmatization. In plain English, the system had to discard inflections, a technique called rearward truncation. One vendor wrote an email saying, “We will not support truncation.” The vendor was disqualified. When the vendor complained about the disqualification, I showed the vendor the email. Silence fell.
Options. These are requirements that are not mandatory for the deal, but the vendor should be able to demonstrate that these requirements can be implemented if the customers request them. A representative option is support for double-byte languages; e.g., Chinese. The initial deployment does not require double byte, but the vendor should be able to implement double-byte support upon request. A vendor who does not have this capability is on notice that if he / she wins the job, a request for double-byte support may be forthcoming. The wise vendor will make arrangements to support this request. Failure to implement the option may result in a penalty, depending on the specifics of the license agreement.
Nice-to-have. These are the Star Trek or science fiction requirements that shoot through procurements like fat through a well-marbled steak. A typical Star Trek requirement is that the system deliver 99 percent precision and 99 percent recall or deliver automatic translation with 99 percent accuracy. These are well-intentioned requests but impossible with today’s technology and budgets available to organizations. Even with unlimited money and technology, it’s tough to hit these performance levels.
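The way the three lists act on a vendor's response can be sketched as a small screening routine in Python. This is a simplified illustration, not a scoring method. The names `Tier`, `Requirement`, and `screen_vendor` are hypothetical, and a real procurement would weigh evidence rather than trust a vendor's yes-or-no claims.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MUST_HAVE = "must-have"
    OPTION = "option"
    NICE_TO_HAVE = "nice-to-have"


@dataclass(frozen=True)
class Requirement:
    name: str
    tier: Tier


def screen_vendor(requirements, supported):
    """Screen one vendor against the three lists.

    `supported` is the set of requirement names the vendor says it meets.
    A missing must-have disqualifies the vendor. A missing option does not,
    but it is reported so the vendor is on notice. Nice-to-have items are
    ignored for screening.
    """
    missing_must = [r.name for r in requirements
                    if r.tier is Tier.MUST_HAVE and r.name not in supported]
    missing_options = [r.name for r in requirements
                       if r.tier is Tier.OPTION and r.name not in supported]
    return {
        "qualified": not missing_must,
        "missing_must_haves": missing_must,
        "options_to_arrange": missing_options,
    }


REQUIREMENTS = [
    Requirement("truncation", Tier.MUST_HAVE),
    Requirement("double-byte language support", Tier.OPTION),
    Requirement("99 percent precision and recall", Tier.NICE_TO_HAVE),
]

# A vendor that declines truncation is out, whatever else it offers.
print(screen_vendor(REQUIREMENTS, supported={"double-byte language support"}))

# A vendor that meets the must-have but lacks the option stays in, on notice.
print(screen_vendor(REQUIREMENTS, supported={"truncation"}))
```

Keeping the nice-to-have items out of the pass-or-fail logic is the point of the exercise: the customer can still talk about them, but they never disqualify anyone.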

Creating a Requirements Document

I write a short introduction to the requirements, create a table with the requirements and other data, and provide it to the client for review. After a period of time, it’s traditional to bat the draft back and forth, making changes on each volley. At some point, the changes become trivial, and the document is complete. There may be telephone discussions, face-to-face meetings, or more exotic types of interaction. I’ve participated in a requirements wiki, and I found the experience thrilling for the 20-somethings at the bank and enervating for me. That’s what a 40-year age difference yields: an adrenaline rush for the youngster and a dopamine burst for the geriatrics.

There are different conventions for a requirements document. The US Federal government calls a requirements document “a statement of work”. There are standard disclaimers, required headings for security, an explanation of what the purpose of the system is, the requirements, scoring, and a mind-numbing array of annexes.

For commercial organizations, the requirements document can be an email with the following information:

Brief description of the organization and what the goal is
The requirements, a definition, the metrics for performance or a technical specification for the item, and an optional comment
What the vendor should do with the information; that is, do a dog-and-pony show, set up an online demonstration, make a sales call, etc.
Whom to call for questions.
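If the requirements already live in a spreadsheet or table, the email version can be generated rather than retyped. The Python sketch below assembles the four parts listed above into plain text. Every name and sample value in it (`RequirementLine`, `build_requirements_email`, the organization, the metric, the contact address) is hypothetical and only shows the shape of the message.

```python
from dataclasses import dataclass


@dataclass
class RequirementLine:
    name: str
    definition: str
    metric: str          # performance metric or technical specification
    comment: str = ""    # optional


def build_requirements_email(organization, goal, requirements,
                             vendor_action, contact):
    """Return the body of a plain-text requirements email."""
    lines = [f"Organization: {organization}", f"Goal: {goal}", "",
             "Requirements:"]
    for number, req in enumerate(requirements, start=1):
        lines.append(f"{number}. {req.name}: {req.definition}")
        lines.append(f"   Metric or specification: {req.metric}")
        if req.comment:
            lines.append(f"   Comment: {req.comment}")
    lines += ["", f"Requested vendor action: {vendor_action}",
              f"Questions: {contact}"]
    return "\n".join(lines)


body = build_requirements_email(
    organization="Example Corp, a mid-sized law firm (hypothetical)",
    goal="Replace the current document search system",
    requirements=[
        RequirementLine(
            name="File type support",
            definition="Index Lotus Notes, FrameMaker, and DB2 tables",
            metric="All three sources searchable in a test deployment",
        ),
    ],
    vendor_action="Set up an online demonstration",
    contact="procurement@example.com",
)
print(body)
```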

Whether you prefer the bureaucratic route or a Roman road builder method, you now have your requirements in hand.

Then What?

That’s a good question. In go-go organizations, the requirements document is the guts of a request for a proposal. Managing an RFP process is a topic for another post. In government entities, the RFP may be preceded by an RFI or Request for Information. When the vendors provide information, a cross-matching of the RFI information with the requirements document (SOW) may be initiated. The bureaucratic process may take so long that the fiscal year ends, funding is lost, and the project is killed. Government work is rewarding in its own way.

Whether you use the requirements to procure a search system or whether you put the project on hold, you have a reasonably accurate representation of what a search / content processing system should deliver.

The fly-over provides the framework. The follow-up questions deliver detail and metrics. The requirements emerge from the analysis of this information and data. The requirements are segmented into three groups, with the wild and crazy requirements relegated to the “nice to have” category. The customer can talk about these, but no vendor has to be saddled with delivering something from the future today. The requirements document can be the basis of a procurement.

There are some pitfalls in the process I have described. Let me highlight three:

First, this procedure takes time, expertise, and patience. Most organizations lack adequate amounts of each ingredient. As a result, requirements are off kilter, so the search system can list or sink. How can a licensee blame the vendor when the requirements are wacky?

Second, the analysis of the data and information is a combination of analytic and synthetic investigation. Most organizations prefer to use their existing knowledge and gut instinct. While these may be outstanding resources, in my experience, the person who relies on these techniques is guessing. In today’s business climate, guessing is not just risky. It can severely damage an organization. Think about a well-known pharmaceutical company pushing a drug to trial despite it being known to show negative side effects in the company’s own prior research. That’s one consequence of a lousy behind-the-firewall search / content processing system.

Third, requirements are technical specifications. Today, people involved in search want to talk about the user interface. The user interface manifests what is in the system’s index. The focus, therefore, should not be on the Web 2.0 color and features of the interface. The focus must be kept squarely on the engineering specifications for the system.

You can embellish my procedure. You can jiggle the sequence. You may be able to snip out a step or a sub-process. But if you jump over the hard stuff in the requirements game, you will deploy a lousy system, create headaches for your vendor, annoy, even anger, your users, and maybe lose your job. So, get the requirements right. Search is tough enough without starting off on the wrong foot.

Stephen Arnold, February 6, 2008

Written by Stephen E. Arnold · Filed Under Library automation, Vertical search

Comments

One Response to “Requirements for Behind-the-Firewall Search”

Colbenson WebLog » Blog Archive » Enterprise search = “behind-the-firewall” search + site search on September 26th, 2008 8:25 am

[…] to enterprise search and to the new term coined by Steve Arnold of Beyond Search: “search behind-the-firewall”. I simply wanted to go a little deeper into the differences and the use that is made of these […]


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.