The Economics of Dealing with Complex Information

May 24, 2008

Microsoft announced via its Live Search blog that its Live Search Books and Live Search Academic services are being “taken down”. Google’s book digitization and journal project caused concern among the commercial database vendors. Google, with its generous cash flow and avowed goal of indexing “all the world’s information”, seemed to sign the death warrants of such companies as Dialog, Ebsco, and ProQuest, among others. A flap of the wings to Techmeme for its related links.

The economics of doing anything significant with complex information are not taught in the ivory towers at Harvard, Stanford, and Yale. Google–indifferent to the brutal economics that hobble commercial database publishers–has the cash to figure out how to use software to do tasks usually done by humans. For example, Google has figured out how to scan a book, have software determine what should be converted to ASCII, and generate a reasonably clean, searchable text file. The page images are mostly linked to the corresponding text references. Not so for most database producers. These decisions still require humans, often working in exotic locations where labor is less expensive than in Ann Arbor, Boston, and Denver.

Google also has figured out how to take content, apply structure to it, create a variety of additional index terms (metadata), and convert the whole shebang into easily manipulated numerical representations. Not so with the mainstream commercial database publishers. Tagging, cross-referencing, and content cleanup still take expensive humans.
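For readers who want a feel for what "index terms" and "numerical representations" mean in practice, here is a toy sketch in Python. It is my illustration only, not Google's method: the stopword list, the two sample texts, and the function names `index_terms` and `to_vector` are all made up for the example. It pulls candidate index terms out of raw text and turns each document into a list of term counts that software can compare without a human in the loop.

```python
import re
from collections import Counter

# Hypothetical stopword list; a production system would use a far larger one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def index_terms(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def to_vector(terms, vocabulary):
    """Count how often each vocabulary term occurs, in vocabulary order."""
    counts = Counter(terms)
    return [counts[term] for term in vocabulary]

# Made-up sample documents, standing in for a scanned book and a journal article.
docs = {
    "book": "The geometry of the hyperbolic plane",
    "article": "Train schedules and the geometry of rail networks",
}

terms = {name: index_terms(text) for name, text in docs.items()}
vocabulary = sorted({t for ts in terms.values() for t in ts})
vectors = {name: to_vector(ts, vocabulary) for name, ts in terms.items()}

for name, vector in vectors.items():
    print(name, vector)
```

Each document ends up as one row of numbers over a shared vocabulary. Real systems do far more, but the point stands: once text is in this form, the tagging and comparison work is done by software rather than by paid indexers.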

For commercial database producers, manipulating the information in books and journals is very expensive. Many costs are difficult to reduce. Google, on the other hand, has invested over the last decade to find software solutions to these intractable cost problems. Fortunately for the commercial database publishers, Google so far has been content to process books and journals. Google finds access to weighty tomes useful for a variety of purposes. I haven’t heard that these motive forces are related to revenue. Google appears to be casual about the cost of its books and journals project. If you aren’t familiar with Google Books, navigate to http://books.google.com. For Google Scholar, go to http://scholar.google.com.

Enter Microsoft. The company jumped to index books and journals. Now it is climbing out of the swamp of costs. Unlike Google, Microsoft faces–maybe for the first time in the company’s history–a need to focus its technical and financial resources. Google keeps on scanning and indexing documents about hyperbolic geometry. Microsoft can’t and no longer will.

For me the most telling statement in the announcement is:

Given the evolution of the Web and our strategy, we believe the next generation of search is about the development of an underlying, sustainable business model for the search engine, consumer, and content partner. For example, this past Wednesday we announced our strategy to focus on verticals with high commercial intent, such as travel, and offer users cash back on their purchases from our advertisers. With Live Search Books and Live Search Academic, we digitized 750,000 books and indexed 80 million journal articles. Based on our experience, we foresee that the best way for a search engine to make book content available will be by crawling content repositories created by book publishers and libraries. With our investments, the technology to create these repositories is now available at lower costs for those with the commercial interest or public mandate to digitize book content. We will continue to track the evolution of the industry and evaluate future opportunities.

Here’s how I read this. First, the reference to next-generation search is about making money with a business model. In short, next-generation search is not about moving beyond traditional metadata, pushing into data management, and creating new types of user experiences. Search at Microsoft means money.

Second, Microsoft wants to index what’s available. That’s certainly less costly than fiddling with the train schedules that Google has indexed at Oxford University. In my experience, indexing what is already available begs for applications that move beyond what I can do at my local library or with a search engine such as Exalead.com or a metasearch system such as Vivisimo’s Clusty.com.

Third, the notion of tracking and looking for future opportunities does not convince me that Microsoft knows what it will do tomorrow. And whatever the company does, by definition, will be reactive.

Microsoft’s termination of this service means that the status quo in the commercial database world will be subject to pressure from Google. More troubling is that Google’s technical papers and its patent documents reveal that the company is moving beyond keyword search at an increasing pace. I think that it is significant that Microsoft is husbanding its resources. Now I want to read in a Microsoft Web log about an innovation path that will permit the company to leapfrog over Google. Send me a link to this information, and you will receive a gentle quack.

Stephen Arnold, May 24, 2008

Written by Stephen E. Arnold · Filed Under Cost, Feature, Google, Microsoft, Search

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.