Google and ProQuest

September 15, 2008

The Library Journal story “ProQuest and Google Strike Newspaper Digitization Deal” puts a “chrome” finish on a David and Goliath story. Oh, maybe that is ProQuest and Googzilla? In the story my mother told me, David used a sling to foil the big, dumb Goliath. With some physics, Goliath ended up dead. You need to read Josh Hadro’s version of this tale here.

The angle is that Google will pay UMI–er, ProQuest–to digitize. For me the most important paragraph in the story was:

The deal leaves significant room for ProQuest to differentiate its Historical Newspapers offering, which contain such publications as the New York Times and Chicago Tribune, as a premium product in terms of added editorial effort and the human intervention required to make its selectively scanned materials more discoverable and useful to expert researchers. In contrast to scanning by Google, editors hired by ProQuest check headlines, first paragraphs, captions, and more to achieve their claim of “99.95 percent accuracy.” In addition, metadata is added along with tags describing whether the scanned content is an article, opinion piece, editorial cartoon, etc. Finally, ProQuest stresses that the agreement does not affect long-term preservation plans for the microfilm collection. “Microfilm will always be the preservation medium…”

Three thoughts:

  1. Commercial databases are starting to face rough water. Google, though not problem free, faces that rough water in a nuclear-powered stealth warcraft. UMI–er, ProQuest–has a birch bark canoe.
  2. Once the data are in the maw of the GOOG, what’s the outlook for UMI–er, ProQuest? In my opinion, this is a short term play with the odds in the mid and long term favoring Google.
  3. Will the Cambridge Scientific financial wizards be able to float the Dialog Information Services boat, breathe life into library sales, and make the “microfilm will always be the preservation medium” claim a categorical affirmative? In my opinion, the GOOG has its snoot in the commercial database business and will disrupt it, sending incumbent leaders into a tizzy.

Yes, and the point about David and Goliath. I think Goliath wins this one. Agree? Disagree? Help me learn. Just bring facts to the party.

Stephen Arnold, September 15, 2008

A Vertical Search Engine Narrows to a Niche

September 4, 2008

Focus. Right before I was cut from one of the sports teams I tried to join I would hear, “Focus.” I think taking a book to football, wrestling, and basketball practice was not something coaches expected or encouraged. Now SearchMedica, a search engine for medical professionals, is taking my coach’s screams of “Focus” to heart. The company announced on September 3, 2008, a practice management category. The news release on Yahoo said:

The new category connects medical professionals with the best practice management resources available on the Web, including the financial, legal and administrative resources needed to effectively manage a medical practice.

To me the Practice Management focus is a collection of content about the business of running a health practice. In 1981, ABI/INFORM had a category tag for this segment of business information. Now, the past has been rediscovered. The principal difference is that access to this vertical search engine is free to the user. ABI/INFORM and other commercial databases charge money, often big money, to access their content.

If you want to know more about SearchMedica, navigate to www.searchmedica.com. The company could encourage a host of copycats. Some would tackle the health field, but others would focus on categories of information for specific user communities. If SearchMedica continues to grow, it and other companies with fresh business models will sign the death warrant for certain commercial database companies.

The fate of traditional newspapers is becoming increasingly clear each day. Superstar journalists are starting Web logs and organizing conferences. Editors are slashing their staffs. Senior management teams are reorganizing to find economies such as smaller trim sizes, fewer editions, and less money for local and original reporting. My thought is that companies like SearchMedica, if they get traction, will push commercial database companies down the same ignominious slope. Maybe one of the financial sharpies at Dialog Information Services, Derwent, or Lexis Nexis will offer convincing data that success is in their hands, not the claws of Google or upstarts like SearchMedica. Chime in, please. I’m tired of Chrome.

Stephen Arnold, September 4, 2008

Vertical Search Resurgent

July 16, 2008

Several years ago, the mantra among some of my financial service clients was, “Vertical search.” What’s vertical search? It is two ideas rolled into one buzzword.

A Casual Definition

First, the content processed by the search system is about a particular topic. Different database producers define the scope of a database in idiosyncratic ways. In Compendex, an index of engineering information, you can find a wide range of engineering topics, covering many fields. You can find information about environmental engineering, which looks to me as if it belongs in a database about chemistry. But in general, the processed information fits into a topical basket. Chemical Abstracts is about chemistry, but the span of chemistry is wide. Nevertheless, the guts of a vertical search engine is bounded content that is brought together in a generally useful topic area. When you look for information about travel, you are using a vertical search engine. For example, Orbitz.com and BookIt.com are vertical search engines.

Second, the content has to be searchable. So, vertical content collections require a search engine. Vertical content is often structured. When you look for a flight from LGA to SFO, you fill in dates, times, departure airport code, arrival airport code, etc. A parametric query is a fancy way of saying, “Training wheels for a SQL query.” But vertical content collections can be processed by the menagerie of text processing systems. When you query the Dr. Koop Web site, you are using the type of search system provided by Live.com and Yahoo.com.
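The “training wheels” point can be made concrete: each field in a parametric form maps to one condition in a SQL WHERE clause. Here is a minimal sketch; the table, data, and prices are invented for illustration, not taken from any real travel site:

```python
import sqlite3

# Hypothetical flights table standing in for a travel site's inventory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, dest TEXT, day TEXT, price REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    ("LGA", "SFO", "2008-07-20", 319.0),
    ("LGA", "SFO", "2008-07-21", 289.0),
    ("JFK", "LAX", "2008-07-20", 349.0),
])

def parametric_search(origin, dest, day):
    """Each form field becomes exactly one WHERE condition."""
    cur = conn.execute(
        "SELECT origin, dest, day, price FROM flights "
        "WHERE origin = ? AND dest = ? AND day = ? ORDER BY price",
        (origin, dest, day),
    )
    return cur.fetchall()

print(parametric_search("LGA", "SFO", "2008-07-20"))
```

The user never sees the SQL; the form guides the query construction, which is exactly why a parametric interface is forgiving in a way free-text search is not.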


Source: http://www.sonirodban.com/images/wheel.jpg

Google is a horizontal search engine, but it is also a vertical search engine. If you navigate to Google’s advanced search page, which is accessed by fewer than three percent of Google’s users, you will find links to a number of vertical search engines; for example, the Microsoft collection and the US government collection. Note: Google’s universal search is a bit of marketing swizzle that means Google can take a query and pass it across indexes for discrete collections. The results are pulled together, deduplicated, and relevance ranked. This is a function available from Vivisimo since 2000. Universal search Google style displays maps and images, but it is far from cutting edge technology save for one Google factor–scale.
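The mechanics of that “universal search” idea can be sketched in a few lines: run one query across several collection indexes, merge the hits, deduplicate, and relevance rank. The collections, URLs, and scores below are invented; this is the shape of the technique, not Google’s or Vivisimo’s implementation:

```python
# Toy result sets from two vertical collections for the same query.
COLLECTIONS = {
    "web":    [{"url": "example.com/a", "score": 0.9},
               {"url": "example.com/b", "score": 0.5}],
    "images": [{"url": "example.com/a", "score": 0.7},
               {"url": "example.com/c", "score": 0.6}],
}

def universal_search(query_results=COLLECTIONS):
    best = {}  # url -> highest score seen across collections (deduplication)
    for hits in query_results.values():
        for hit in hits:
            if hit["url"] not in best or hit["score"] > best[hit["url"]]:
                best[hit["url"]] = hit["score"]
    # Relevance-rank the deduplicated set, best score first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

print(universal_search())
```

The hard part at Google’s scale is not this merge logic; it is running it across billions of documents with sub-second latency, which is the “one Google factor” noted above.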

Why am I writing about vertical search when the topic, for me, came and went years ago? In fact, at the height of the vertical search frenzy I dismissed the hype. Innovators, unaware of the vertical nature of commercial databases 30 years ago, thought something quite new was at hand. Wrong. Google’s horizontal information dominance forced other companies to find niches where Google was not doing a good job or any job for that matter.

Vertical search flashed on my radar today (July 15, 2008) when I flipped through the wonderful information in my tireless news reader.

Autonomy announced:

that Foundography, a subsidiary of Nexus Business Media Ltd, has selected Autonomy to power vertical search on its website (sic)  for IT professionals: foundographytech.com. The site enables business information users to access only the information they want and through Autonomy’s unique conceptual capabilities delivers an ‘already found’ set of results, providing pertinent information users may not have known existed. The site also presents a unique proposition for advertisers, providing conceptually targeted ad selling.

Read more

Requirements for Behind-the-Firewall Search

February 5, 2008

Last fall, I received a request from a client for a “shopping list of requirements for search.” The phrase shopping list threw me. My wife gives me a shopping list and asks me to make sure the tomatoes are the “real Italian kind”. She’s a good cook, but I don’t think she worries about my getting a San Marzano or an American genetically-engineered pomme d’amour.

Equating shopping list with requirements for a behind-the-firewall search / content processing system gave me pause. As I beaver away, gnawing down the tasks remaining for my new study “Beyond Search: What to Do When Your Search System Won’t Work”, I had a mini-epiphany; to wit:

Getting the requirements wrong can
undermine a search / content processing system.

In this essay, I want to make some comments about requirements for search and content processing systems. I’m not going to repeat the more detailed discussion in The Enterprise Search Report, 1st, 2nd, and 3rd editions, nor will I recycle the information in Beyond Search. I propose to focus on the tendency of very bright people to see search and content processing requirements like check-off items on a house inspection. Then I want to give one example of how a perceptual mismatch on requirements can cause a search and content processing budget to become a multi-year problem. To conclude the essay, I want to offer some candid advice to three constituencies: the customer who licenses a search / content processing solution, the vendor who enters into a deal with a customer, and the consultants who circle like buzzards.

Requirements

To me, a requirement is a clear, specific statement of a function a system should perform; for example, a search system should process the following file types: Lotus Notes, Framemaker, and DB2 tables.

How does one arrive at a requirement and then develop a list of requirements?

Most people develop requirements by combining techniques. Here’s a short list of methods that I have seen used in the last six months:

  • Ask users of a search or content processing system what they would like the search system to do
  • Look at information from vendors who seem to offer a solution similar to the one the organization thinks it wants
  • Ask a consultant, sometimes a specialist in a discipline only tangentially related to search.

The Fly-Over

My preferred way of developing requirements is more mundane, takes time, and is resistant to short cuts. The procedure is easy to understand. The number of steps can be expanded when the organization operates in numerous locations around the world, processes content in multiple languages, and has different security procedures in place for different types of work.

But let’s streamline the process and focus on the core steps. When I was younger, I guarded this information closely. I believed knowing the steps was a key ingredient for selling consulting. Now, I have a different view, and I want you to know what I do for the simple reason that you may avoid some mistakes.

First, perform a data gathering sweep. In this step you will be getting a high-level or general view of the organization. Pay particular attention to these key areas. Any one of them can become a search hot spot and burn your budget, schedule, and you with little warning:

  • Technical infrastructure. This means looking at how the organization handles enterprise applications now, what the hardware platform is, what the work load on the present technical staff is, how the organization uses contractors and outsourcing, what the present software licensing deals stipulate, and the budget. I gather these data by circulating a data collection form electronically or using a variety of telephonic and in-person meetings. I like to see data centers and hardware. I can tell a lot by looking at how the cables are organized and from various log files which I can peruse on site with the customer’s engineer close at hand to explain a number or entry to me. The key point of the exercise is to understand if the organization is able to work within its existing budget and keep the existing systems alive and well.
  • User behavior. To obtain these data, I use two methods. One component is passive; that is, I walk around and observe. The other component is active; that is, I set up brief, informal meetings where people are using systems and ask them to show me what they now do. If I see something interesting, I ask, “What caused you to take that action?” I write down my observations. Note that I try to get lower-level employees’ input about needs before I talk to too many big wheels. This is an essential step. Without knowing what employees do, it is impossible to listen accurately to what top managers assert.
  • Competitive arena. Most organizations don’t know much about what their competitors do. In terms of search, most organizations are willing to provide some basic information. I find that conversations at trade shows are particularly illuminating. But another source of excellent information is search vendors. I admit that I can get executives on the telephone or by email pretty easily, but anyone can do that with some persistence. I ask general questions about what’s happening of interest in law firms or ecommerce companies. I am able to combine that information with data I maintain. From these two sources, I can develop a reasonable sense of what type of system is likely to be needed to keep Company A competitive with Company B.
  • Management goals. I try to get a sense of what management wants to accomplish with search and content processing. I like to hear from senior management, although most senior managers are out of touch with the actual information procedures and needs of their colleagues. Nevertheless, I endure discussions with the brass to get a broad calibration. Once these interviews or discussions are scheduled, I use two techniques to get data from mid-level managers. One technique is a Web survey. I use an online questionnaire and make it available to any employee who wishes to participate. I’m not a fan of long surveys. A few pointed questions deliver the freight of meaning I need. More importantly, survey data can be counted and used as objective data about needs. Second, I use various types of discussions. I like one-on-one meetings; I like small-group meetings; and I like big government-style meetings with 30 people sitting around a chunk of wood big enough to make a yacht. The trick is to have a list of questions and the ability to make everyone comment. What’s said is important but how people react to one another can speak volumes and indicate who really has a knack for expressing a key point for his / her co-workers.

I take this information and data, read it, sort it, and analyze it. The result is the intellectual equivalent of a bookcase. The supports are the infrastructure. Each of the shelves consists of the key learnings from the high-level look at the organization. I don’t know how much content the organization has. I don’t know the file types. I don’t have a complete inventory of the enterprise applications into which the search and content processing must integrate. What I do know is whom to call or email for the information. So drilling down to get a specific chunk of data is greatly simplified by the high-level process.

Matching

I take these learnings and the specific data such as the list of enterprise systems to support and begin what I call the “matching sequence.” Here’s how I do it. I maintain a spreadsheet with the requirements from my previous search and content processing jobs. Each of these carries a short comment and a code that identifies the requirement by availability, stability, and practicality. For example, many companies want NLP or natural language processing. I code this requirement as Available, Generally Stable, and Impractical. You may disagree with my assessment of NLP, but in my experience few people use it, and it can add enormous complexity to an otherwise straightforward system. In fact, when I hear or identify jargon in the fly-over process, my warning radar lights up. I’m interested in what people need to do a job or to find on point information. I don’t often hear a person in accounting asking to do a query in the form of a complete sentence. People want information in the most direct, least complicated way possible. Writing sentences is neither easy nor speedy for many employees working on a deadline.
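The matching sequence lends itself to a simple data structure. This sketch mirrors the spreadsheet coding described above; the requirement names and code values are invented examples, not a real client’s list:

```python
# Each requirement carries availability / stability / practicality codes,
# as in the spreadsheet described in the text. Entries are illustrative.
requirements = {
    "NLP query input":        {"available": True,  "stable": True,  "practical": False},
    "Lotus Notes filter":     {"available": True,  "stable": True,  "practical": True},
    "99% machine precision":  {"available": False, "stable": False, "practical": False},
}

def practical_requirements(reqs):
    """Keep only the requirements worth putting in front of vendors."""
    return sorted(name for name, codes in reqs.items()
                  if codes["available"] and codes["practical"])

print(practical_requirements(requirements))  # only the Lotus Notes filter survives
```

The point of the codes is exactly this kind of filtering: jargon-driven wishes like NLP fall out before they inflate the procurement.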

What I have after working through my list of requirements and the findings from the high-level process is three lists of requirements. I keep definitions or mini-specifications in my spreadsheet, so I don’t have to write boilerplate for each job. The three lists with brief comments are:

  • Must-have. These are the requirements that the search or content processing system must satisfy in order to meet the needs of the organization based on my understanding of the data. A vendor unable to meet a must-have requirement, by definition, is excluded from consideration. Let me illustrate. Years ago, a major search procurement stipulated truncation, technically lemmatization. In plain English, the system had to discard inflections, called rearward truncation. One vendor wrote an email saying, “We will not support truncation.” The vendor was disqualified. When the vendor complained about the disqualification, I showed the vendor the email. Silence fell.
  • Options. These are requirements that are not mandatory for the deal, but the vendor should be able to demonstrate that these requirements can be implemented if the customers request them. A representative option is support for double-byte languages; e.g., Chinese. The initial deployment does not require double byte, but the vendor should be able to implement double-byte support upon request. A vendor who does not have this capability is on notice that if he / she wins the job, a request for double-byte support may be forthcoming. The wise vendor will make arrangements to support this request. Failure to implement the option may result in a penalty, depending on the specifics of the license agreement.
  • Nice-to-have. These are the Star Trek or science fiction requirements that shoot through procurements like fat through a well-marbled steak. A typical Star Trek requirement is that the system deliver 99 percent precision and 99 percent recall or deliver automatic translation with 99 percent accuracy. These are well-intentioned requests but impossible with today’s technology and budgets available to organizations. Even with unlimited money and technology, it’s tough to hit these performance levels.
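The three-tier split drives a mechanical vendor screen: missing any single must-have disqualifies a vendor outright, while options are simply noted. A minimal sketch, with invented vendor names and feature sets:

```python
# Illustrative requirement tiers; the features and vendors are made up.
MUST_HAVE = {"truncation", "security filtering"}
OPTIONS = {"double-byte languages"}

vendors = {
    "Vendor A": {"truncation", "security filtering", "double-byte languages"},
    "Vendor B": {"security filtering"},  # no truncation -> disqualified
}

def shortlist(vendors):
    qualified = {}
    for name, features in vendors.items():
        if MUST_HAVE <= features:  # every must-have must be covered
            # Record which options the vendor can already demonstrate.
            qualified[name] = sorted(OPTIONS & features)
    return qualified

print(shortlist(vendors))
```

Nice-to-have items deliberately do not appear in the screen at all; as the text says, no vendor should be saddled with delivering science fiction.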

Creating a Requirements Document

I write a short introduction to the requirements, create a table with the requirements and other data, and provide it to the client for review. After a period of time, it’s traditional to bat the draft back and forth, making changes on each volley. At some point, the changes become trivial, and the document is complete. There may be telephone discussions, face-to-face meetings, or more exotic types of interaction. I’ve participated in a requirements wiki, and I found the experience thrilling for the 20-somethings at the bank and enervating for me. That’s what 40 years’ age difference yields — an adrenaline rush for the youngster and a dopamine burst for the geriatrics.

There are different conventions for a requirements document. The US Federal government calls a requirements document “a statement of work”. There are standard disclaimers, required headings for security, an explanation of what the purpose of the system is, the requirements, scoring, and a mind-numbing array of annexes.

For commercial organizations, the requirements document can be an email with the following information:

  • Brief description of the organization and what the goal is
  • The requirements, a definition, the metrics for performance or a technical specification for the item, and an optional comment
  • What the vendor should do with the information; that is, do a dog-and-pony show, set up an online demonstration, make a sales call, etc.
  • Whom to call for questions.

Whether you prefer the bureaucratic route or a Roman road builder method, you now have your requirements in hand.

Then What?

That’s a good question. In go-go organizations, the requirements document is the guts of a request for a proposal. Managing an RFP process is a topic for another post. In government entities, the RFP may be preceded by an RFI or Request for Information. When the vendors provide information, a cross-matching of the RFI information with the requirements document (SOW) may be initiated. The bureaucratic process may take so long that the fiscal year ends, funding is lost, and the project is killed. Government work is rewarding in its own way.

Whether you use the requirements to procure a search system or whether you put the project on hold, you have a reasonably accurate representation of what a search / content processing system should deliver.

The fly-over provides the framework. The follow up questions deliver detail and metrics. The requirements emerge from the analysis of this information and data. The requirements are segmented into three groups, with the wild and crazy requirements relegated to the “nice to have” category. The customer can talk about these, but no vendor has to be saddled with delivering something from the future today. The requirements document can be the basis of a procurement.

There are some pitfalls in the process I have described. Let me highlight three:

First, this procedure takes time, expertise, and patience. Most organizations lack adequate amounts of each ingredient. As a result, requirements are off kilter, so the search system can list or sink. How can a licensee blame the vendor when the requirements are wacky?

Second, the analysis of the data and information is a combination of analytic and synthetic investigation. Most organizations prefer to use their existing knowledge and gut instinct. While these may be outstanding resources, in my experience, the person who relies on these techniques is guessing. In today’s business climate, guessing is not just risky. It can severely damage an organization. Think about a well-known pharmaceutical company pushing a drug to trial even though the company’s own prior research showed negative side effects. That’s one consequence of a lousy behind-the-firewall search / content processing system.

Third, requirements are technical specifications. Today, people involved in search want to talk about the user interface. The user interface manifests what is in the system’s index. The focus, therefore, should not be on the Web 2.0 color and features of the interface. The focus must be kept squarely on the engineering specifications for the system.

You can embellish my procedure. You can jiggle the sequence. You may be able to snip out a step or a sub-process. But if you jump over the hard stuff in the requirements game, you will deploy a lousy system, create headaches for your vendor, annoy, even anger, your users, and maybe lose your job. So, get the requirements right. Search is tough enough without starting off on the wrong foot.

Stephen Arnold, February 6, 2008


Vertical Search: A Chill Blast from the Past

January 15, 2008

Two years ago, a prestigious New York investment banker asked me to attend a meeting without compensation. I knew my father was correct when he said, “Be a banker. That’s where the money is.” My father didn’t know Willie Sutton, but he had money insight. The day I arrived the bankers’ topic was “vertical search,” the next big money maker in search, according to the vice president who escorted me into a conference room overlooking the East River.

As I understood the notion from these financial engineers, certain parties (translation: publishers) had a goldmine of content (translation: high-value information created by staff writers and freelancers). The question asked was: “Isn’t a revenue play possible using search-and-retrieval technology and a subscription model?”

There’s only one answer that New York bankers want to hear, and that is, “I think there is an opportunity for an upside.” I repeated the catch phrase, and the five money mavens smiled. I was a good Kentucky consultant, and I had on shoes too.

My recollection is that everyone in the Park Avenue meeting room was well-groomed, scrupulously polite, and gracefully clueless about online. The folks asking me to stop by for a chat listened to me for about 60 seconds and then fired questions at me about Web 2.0 technology (which I don’t fully grasp), online stickiness (which means repeat visitors and time spent on a Web site), and online revenue growth (which I definitely understand after getting whipsawed with costs in 1993 when I was involved with The Point (Top 5% of the Internet)). Note: we sold this site to Lycos in 1995, and I vowed not to catch spreadsheet fever again. Spreadsheet fever is particularly contagious in the offices of New York banks.

This morning — Tuesday, January 15, 2008 — I read a news story about Convera’s vertical search solution. The article explained that Lloyd’s List, a portal reporting the doings in the shipping industry, was going online with a “vertical search solution.”

The idea, as I understand it, is that a new online service called Maritime Answers will become available in the future. Convera Corporation, a one-time big dog in the search-and-retrieval sled races, would use its “technical expertise to provide a powerful search tool for the shipping community.” (Note: in this essay I am not discussing the sale of Convera’s search-and-retrieval business to Fast Search & Transfer or the capturing by Autonomy of some of Convera’s key sales professionals in 2007.)

Vertical Search Defined

In my first edition of The Enterprise Search Report, I included a section about vertical search. I cut out that material in 2003 because the idea seemed outside the scope of “behind the firewall” search. In the last five years, the notion of vertical search has continued to pop up as a way to serve the needs of a specific segment or constituency in a broader market.

Vertical search means limiting the content to a specific domain. Examples include information for attorneys. Companies in the vertical search business for lawyers include Lexis Nexis (a unit of Reed Elsevier) and Westlaw (a service absorbed into the Thomson Corporation). A person with an interest in a specific topic, therefore, would turn to an online system with substantial information about a particular field. Examples range from the U.S. government’s health information available as Medline Plus to Game Trade Magazine with tens of thousands of other examples. One could make a good case that Web logs on a specific topic and a search box are vertical search systems.

The idea is appealing because if one looks for information on a narrow topic, a search system with information only on that topic, in theory, makes it easier to find the nugget or answer the user seeks — at least to someone who doesn’t know much about the vagaries of online information. I will return to this idea in a moment.

Commercial Databases: The Origin of Vertical Search

Most readers of this Web log will have little experience with using commercial databases. The big online vendors have found themselves under siege by the Web and their own actions.

In the late 1960s when the commercial online business began with an injection of U.S. government funding, the only kind of database possible was one that was very narrow. The commercial online services offered specific collections of information on very narrow topics or information confined to a specific technical discipline. By 1980, there were some general business databases available, but these were narrowly constrained by editorial policies.

In order to make the early search-and-retrieval systems useful, database publishers (the name given to the people and companies who built databases) had to create fields, or what today would be called “XML document type definitions.” The database builders would pay indexers to put the name of the author, the title of the source, the key words from a controlled term list, and other data (now called metadata) into these fields.

The user would in 1980 pay a fee to get an account with an online vendor. The leaders of a quarter century ago mean very little to most online users today. The Googles and Microsofts of 1980 were Dialog Corporation, BRS, SDC, and a handful of others such as DataStar.

Every database or “file” on these systems was a vertical database. Users of these commercial systems would have to learn the editorial policy of a particular database; for example, ABI / INFORM or PROMT. When Dialog was king, the service offered more than 300 commercial databases, and most users picked a particular file and entered queries using a proprietary syntax. For example, to locate marketing information from the most recent update to the ABI / INFORM database one would enter into the Dialog command line: SS UD=9999 and CC=76?? and marketing. If a user wanted chemical information, the Chemical Abstracts service required the user to know the specific names and structures of chemicals.

Characteristics of These Original Vertical Databases

A peculiar characteristic of a collection of information on a topic or in a field is not understood by most users or investment bankers: the narrower the content collection, the greater the need for a specialized vocabulary. Let me give an example. In the ABI / INFORM file it was pointless to search for the concept via the word “management.” The entire database was “about” management. Therefore, a careless query would, in theory, return a large number of hits. We, therefore, made “management” a stop word; that is, one that would not return results. We forced users to access the content via a controlled vocabulary, complete with Use For and See Also cross references. We created a business-centric classification coding scheme so a user could retrieve the marketing information using the command CC=76??.
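The interplay of stop words and classification codes can be sketched in a few lines. The records and codes below are invented stand-ins for ABI / INFORM entries; only the mechanism (stop the ubiquitous word, route broad concepts through CC codes) is the point:

```python
# "management" is a stop word because the whole file is about management;
# broad concepts are reached via classification codes instead.
STOP_WORDS = {"management"}
records = [
    {"id": 1, "title": "Budgeting for small firms",    "cc": "9146"},
    {"id": 2, "title": "Direct mail campaign results", "cc": "7610"},
    {"id": 3, "title": "Channel strategy overview",    "cc": "7620"},
]

def search(terms=None, cc_prefix=None):
    # Stop words are silently discarded before matching.
    terms = [t for t in (terms or []) if t.lower() not in STOP_WORDS]
    hits = []
    for r in records:
        if cc_prefix and not r["cc"].startswith(cc_prefix):
            continue
        if terms and not any(t.lower() in r["title"].lower() for t in terms):
            continue
        hits.append(r["id"])
    return hits

print(search(cc_prefix="76"))        # CC=76?? -> the two marketing records
print(search(terms=["management"]))  # stop word only: no effective query terms
```

Searching only on the stop word leaves no effective terms, which is precisely why the indexers pushed users toward the coding scheme.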

Another attribute of vertical content or deep information on a narrow subject is that the terminology shifts. When a new development occurred in oil and gas, the American Petroleum Institute had to identify the new term and take steps to map the new idea to content “about” that new subject. Let me give an example from a less specialized field than oil exploration. You know about an acquisition. The term means one company buys another. In business, however, the word takeover may be used to describe this action. In financial circles, there will be leveraged buyouts, a venture capital buyout, or a management buyout. In short, the words used to describe an acquisition evidence the power of English and the difficulty of creating a controlled vocabulary for certain fields. The paradox is that the deeper the content in detail and through time, the more complicated the jargon becomes. A failure to search for the appropriate terms means that information on the topic is not retrieved. In the search systems of yore, getting the acquisitions information from ABI / INFORM required an explicit query with all of the terms present.
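The indexer’s answer to terminology drift is the Use For entry: each variant points at one preferred term, so a query on any of them lands on the same content. A minimal sketch with illustrative thesaurus entries (not a real ABI / INFORM vocabulary):

```python
# "Use For" mapping: variant business terms -> one preferred index term.
USE_FOR = {
    "takeover":          "acquisitions & mergers",
    "leveraged buyout":  "acquisitions & mergers",
    "management buyout": "acquisitions & mergers",
    "acquisition":       "acquisitions & mergers",
}

def preferred_term(user_term):
    """Map a user's term to the controlled vocabulary, if an entry exists."""
    term = user_term.lower()
    return USE_FOR.get(term, term)  # unknown terms pass through unchanged

print(preferred_term("Takeover"))   # routed to the preferred term
print(preferred_term("petroleum"))  # no mapping: passes through
```

The hard, ongoing work is not the lookup; it is noticing the new jargon and adding the entries before users fail to retrieve what is actually in the file.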

Vertical Search 2008

Convera is a company that has faced some interesting and challenging experiences. The company’s core technology was rooted in scanning paper documents, converting these documents to ASCII via optical character recognition, and then making the documents searchable via an interface. In 1995 the company acquired ConQuest Software, developed by a former colleague of mine at Booz, Allen & Hamilton, for $33 million. Convera also acquired Semio’s Claude Vogel in 2002, a rocket scientist who has since left Convera. Convera secured backing from Allen & Co., a New York firm, and embarked on a journey to reinvent itself. This is an intriguing case example, and I may write about it in the future.

The name “Convera” was adopted in 2000 when Excalibur Technologies landed a deal with Intel. After the Intel deal went south, at about the same time a Convera deal with the NBA ran aground, the Convera name stuck. Over the last eight years Convera has worked to reduce its debt, found new sources of revenue, and finally divested itself of its search-and-retrieval business, emerging as a provider of vertical search. I have not done justice to a particularly interesting case study in the hurdles companies face when they try to make money without a Google-type business model.

Now Convera is in the vertical search business. It uses its content acquisition technology, or crawlers and parsers, to build indexes. Convera has word lists for specific markets such as law enforcement and health, as well as technology that automatically indexes, classifies, and tags processed content. The company also has server farms that can provide hosted or managed search services to its customers.

Instead of competing with Google in the public Web indexing space, Convera, as I understand its business model, approaches a client who wants to build a vertical content collection. Convera then indexes the content of certain Web sites along with any content the customer, such as a publisher, already has. The customer pays Convera for its services and either gives away access to the content collection or charges its users a fee to access the content.

In short, Convera is in the vertical search business. The idea is that Convera’s stakeholders get money by selling services, not by licensing a search-and-retrieval engine to an enterprise. Convera’s interesting history makes clear that enterprise software and joint ventures such as those with Intel can lose big money: more than $600 million, give or take a couple hundred million. Obviously Convera’s original business model lacked the lift its management teams projected.

The Value of Vertical Search

The value of vertical search depends upon several factors that have nothing to do with technology. The first factor is the desire of a customer such as a publisher like Lloyd’s List to find a new way to generate growth and zip from a long-in-the-tooth information service. Publishers are in a tough spot. Most are not very good at technical foresight. More problematic, the online options can cannibalize their existing revenues. As a business segment, traditional publishing is a hostile place for 17th-century business models.

Another factor is the skill of the marketers and sales professionals. Never underestimate the value of a smooth-talking peddler. Big deals can be done on the basis of charm and a dollop of FUD: fear, uncertainty, and doubt.

A third element is the environmental pressures that come from companies and innovators largely indifferent to established businesses. One example is the Google-Microsoft-Yahoo activity. Each of these companies is offering online access to information mostly without direct fees to the user. The advertisers foot the bill. All three are digitizing books, indexing Web logs or social media, and working with certain third parties to offer certain information. Even Amazon is in the game with its Kindle device, online service, and courtesy fee for certain online Web log content. Executives at these companies know about the problems publishers face, but there’s not much executives at these companies can do to alter the tectonic shift underway in information access. I know I wouldn’t start a traditional magazine or newspaper even though for decades I was an executive in newspaper and publishing companies like the Courier Journal & Louisville Times and Ziff Communications.

Vertical Search: Google Style

You can create your own vertical search system now. You don’t have to pay Convera’s wizards for this service. In fact, you don’t have to know how to program or do much more than launch your browser. Google will allow anyone to create a Custom Search Engine, which is that company’s buzzword for a vertical search system. Navigate to Google’s CSE page and explore. If you want to see the service in action, navigate to Macworld’s beta.

We’ve come full circle in a sense. The original online market was only vertical search; that is, very specific collections of content on a particular topic or discipline. Then we shifted to indexing the world of information. Now, the Google system allows anyone to create a very narrow domain of content.
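At bottom, a “custom search engine” is a general index restricted to a whitelist of sites. The toy index and whitelist below are invented for illustration (the Macworld URL echoes the example above but is hypothetical); the sketch only shows the filtering idea, not how Google actually implements its CSE.

```python
# Minimal sketch of what a custom (vertical) search engine restriction
# does: filter a general index down to a whitelist of sites.
# The index entries and whitelist are invented for illustration.

from urllib.parse import urlparse

INDEX = [
    {"url": "https://www.macworld.com/reviews/ipod", "title": "iPod review"},
    {"url": "https://example-news.com/politics", "title": "Election story"},
    {"url": "https://www.macworld.com/howto/backup", "title": "Mac backup how-to"},
]

ALLOWED_SITES = {"www.macworld.com"}  # the vertical's scope

def vertical_results(query):
    """Match titles against the query, then keep only whitelisted hosts."""
    hits = [d for d in INDEX if query.lower() in d["title"].lower()]
    return [d for d in hits if urlparse(d["url"]).netloc in ALLOWED_SITES]

print([d["title"] for d in vertical_results("review")])  # ['iPod review']
```

The narrowness comes entirely from the whitelist; the underlying index can be as broad as Google’s.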

What’s this mean? First, I am not sure the Convera for-fee approach will be as financially rewarding as the company’s stakeholders expect. Free is tough to beat. For a publisher wanting to index proprietary content, Google will license a Google Search Appliance. With the OneBox API, it is possible to integrate the Google CSE with the content processed by the GSA. Few people recognize that Google’s approach allows a technically savvy person, or one who is Googley, to replicate most of the functionality on offer from the hundreds of companies competing in the “beyond search” markets.

Second, a narrow collection built by spidering a subset of Web sites will, by definition, face cost hurdles. As the volume of content grows, companies providing custom subsets via direct spidering and content processing will see their costs rise. These costs can be controlled by cutting back on the volume of content spidered and processed; alternatively, the quality of service or the pace of technical innovation will have to be scaled to match available resources. Either way, Google, Microsoft, and Yahoo may control the fate of the vertical search vendors.

Finally, the enthusiasm for vertical search may be predicated on misunderstanding available information. There is a big market for vertical search in law enforcement, intelligence, and pharmaceutical competitive intelligence. There may be a market in other sectors, but with a free service like Google’s getting better with each upgrade to the Google service array, I think secondary and tertiary markets may go with the lower-cost alternative.

Stakeholders in Convera don’t know the outcome of Convera’s vertical search play. One thing is certain. New York bankers are mercurial, and their good humor can disappear with a single disappointing earnings report. I will stick with the motto, “Surf on Google” and leave certain types of search investments to those far smarter than I.

Stephen E. Arnold
January 15, 2008, 10 am
