Trapped by a Business Model, Not Technology

February 12, 2008

The headline “Reuters CEO sees Semantic Web in its Future” triggered an immediate mouse click. The story appeared on O’Reilly’s highly regarded Radar Web log.

Tim O’Reilly, who wrote the article, noted: “Adding metadata to make that job of analysis easier for those building additional value on top of your product is a really interesting way to view the publishing opportunity.”

Mr. O’Reilly added: “I don’t think he [Devin Wenig, a Reuters executive] should discount the statistical, computer-aided curation that has proven so powerful on the consumer Internet.”

Hassles I’ve Encountered

The Reuters comment about the Semantic Web did underscore the often poor indexing done by publishing and broadcasting companies. In my experience, I have had to pay for content that needed considerable post-processing and massaging.

For example, if you license a news feed from one of the commercial vendors, some of the feeds will:

  • Send multiple versions of the stories “down the wire”, often with tags that make it difficult to determine which is the more accurate version. Scripts can delete previous versions, but errors can occur, and when noticed, some have to be corrected by manual inspection of the feed data.
  • Deliver duplicate versions of the same story because the news feed aggregator does not de-duplicate variants of the story from different sources (a small sketch of the idea follows this list). Some systems handle de-duplication gracefully and efficiently. Examples that come to mind are Google News and Vivisimo. Yahoo’s approach with tabs to different news services is workable as well, but it is not “news at a glance”. Yahoo imposes additional clicking on me.
  • Insert NewsML plus additional tags without alerting downstream subscribers. When this happens, the scripts can crash or skip certain content. The news feed services try to notify subscribers about changes, but in my experience there are many “slips betwixt cup and lip.”
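
To make the clean-up chores concrete, here is a minimal Python sketch of the first two fixes named in the list above: keeping only the latest version of each story and collapsing near-duplicate variants. The field names (story_id, version, title) are my inventions for illustration; real feeds use NewsML envelopes and vendor-specific identifiers.

```python
import hashlib

def normalize_title(title):
    """Crude normalization: lowercase, strip punctuation, collapse spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(cleaned.split())

def latest_versions(items):
    """Keep only the highest-numbered version of each story ID."""
    best = {}
    for item in items:
        sid, ver = item["story_id"], item["version"]
        if sid not in best or ver > best[sid]["version"]:
            best[sid] = item
    return list(best.values())

def dedupe(items):
    """Collapse stories whose normalized titles hash to the same value."""
    seen, unique = set(), []
    for item in items:
        key = hashlib.md5(normalize_title(item["title"]).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

feed = [
    {"story_id": "A1", "version": 1, "title": "Fed Cuts Rates"},
    {"story_id": "A1", "version": 2, "title": "Fed cuts rates by 50 points"},
    {"story_id": "B7", "version": 1, "title": "Fed cuts rates by 50 points!"},
]
print(dedupe(latest_versions(feed)))  # one story survives
```

As the list above warns, a hash on a normalized title is exactly the kind of script that works until two sources headline the same event differently, at which point manual inspection takes over.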

Now the traditional powerhouses in the news business face formidable competition on multiple fronts. There are Web logs. There are government “news” services, including the remarkably productive US Department of State, the largely unknown Federal News Service, and the often useful Government Printing Office listserv. There are news services operated by trade associations. These range from the American Dental Association to the Welding Technology Institute of Australia. Most of these organizations are now Internet savvy. Many use Web logs, ping servers, and RSS (Really Simple Syndication) to get information to constituents, users, and news robots. Podcasts are just another medium for grass roots publishers to use at low or no cost.

We are awash in news — text, audio, and video.

Balancing Three Balls

Traditional publishers and broadcasters, therefore, are trying to accomplish three goals at the same time. I recall from a lecture that the legendary president of General Motors, Alfred P. Sloan (1875 – 1966), is alleged to have said: “Two objectives is no objective.” Nevertheless, publishers like Reuters and its soon-to-be owner are trying to balance three balls on top of one another:

First, maintain existing revenues in the face of the competition from governments, associations, individual Web log operators, and ad-supported or free Internet services.

Second, create new products and services that generate new revenue. The new revenue must not cannibalize any traditional revenue.

Third, give the impression of being “with it” and on the cutting edge of innovation. This is more difficult than it seems, and it leads to some executives’ talking about an innovation that is no longer news. Could I interpret Reuters’ comment as an example of faux hipness?

Publishers can indeed leverage the Semantic Web. There’s a published standard. Commercial systems are widely available to perform content transformation and metatagging; for example, in Beyond Search I profile two dozen companies offering different bundles of the needed technology. Some of these are known (IBM, Microsoft); others are less well known (Bitext, Thetus). And as prehistoric as it may seem to some publishing and broadcast executives, even skilled humans are available to perform some tasks. As good as today’s semantic systems are, humans are sometimes needed to do the knowledge work required to make content more easily sliced and diced, post-processed and “understood”.
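
For readers who want to see how small the first step is, here is a minimal sketch using the open source rdflib Python library to attach Dublin Core metadata to a story. The story URL and field values are invented; the point is that the standards and tooling already exist.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
g.bind("dc", DC)

# A hypothetical wire story, identified by URL
story = URIRef("http://example.com/stories/2008/02/12/rate-cut")
g.add((story, DC.title, Literal("Fed Cuts Rates by 50 Points")))
g.add((story, DC.creator, Literal("Wire Desk")))
g.add((story, DC.date, Literal("2008-02-12")))
g.add((story, DC.subject, Literal("monetary policy")))

# Emit machine-readable metadata that downstream systems can build on
print(g.serialize(format="turtle"))
```

The hard part is not emitting triples; it is having humans or systems that assign subject terms worth trusting.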

It’s Not a Technology Problem

The fact is that traditional publishers and broadcasters have been slow to grasp that their challenge is their business model, not technology. No publisher has to be “with it” or be able to exchange tech-geek chatter with a Google, Microsoft, or Yahoo wizard.

Nope.

What’s needed is a hard look at the business models in use at most of the traditional publishing companies, including Reuters and the other companies who have their roots in professional publishing, trade publishing, newspaper publishing, and magazine publishing. While I’m making a list, I want to include radio, television, and cable broadcasting companies as well.

These organizations have engineers who know what the emerging technologies are. There may be some experiments that are underway and yielding useful insights into how traditional publishing companies can generate new revenues.

The problem is that the old business models generate predictable revenue. Even if that revenue is softening or declining, most publishing executives understand the physics of their traditional business model. Newspapers sell advertising. Advertisers pay to reach the readers. Readers pay a subscription to get the newspaper with the ads and a “news hole”. Magazine publishers either rely on controlled circulation to sell ads or a variant of the newspaper model. Radio and other broadcast outlets sell air time to advertisers.

These business models are deeply ingrained, have many bells and whistles, and deliver revenue reasonably well in today’s market. The problem is that the revenue efficiency in many publishing sectors is softening.

Now the publishers want to generate new revenues while preserving their traditional business models, and the executives don’t want to cannibalize existing revenues. Predictably, the cycle repeats itself. How hard is it to break the business model handcuffs of traditional publishing? Rupert Murdoch has pulled in his horns at the Wall Street Journal. Not even he can get free of the business model shackles that are confining once powerful organizations and making them sitting ducks for competitive predators.

Semantic Web — okay. I agree it’s hot. I am just finishing a 250-page look at some of the companies doing semantics now. A handful of these companies are almost a decade old. Some, like IBM, were around when Albert Einstein was wandering around Princeton in his house slippers.

I hope Reuters “goes semantic”. With the core business embedded in numeric data, I think the “semantic” push will be more useful when Reuters’ customers have the systems and methods in place to make use of richer metatagging. The Thomson Corporation has been working for a decade or more to make its content “smarter”; that is, better indexing, automated repurposing of content, and making it possible for a person in one of Thomson’s more than 100 units to find out what another person in another unit wrote about the same topic. Other publishers are genuinely confused and understandably uncertain about the Internet as an application platform. Buggy whip manufacturers could not make the shift to automotive seat covers more than 100 years ago. Publishers and broadcasters face the same challenge.

Semantic technology may well be more useful inside a major publishing or broadcasting company initially. In my experience, most of these operations have data in different formats, systems, and data models. It will be tough to go “semantic” until the existing data can be normalized and then refreshed in near real time. Long updates are not acceptable in the news business. Wait too long, and you end up with a historical archive.
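
A minimal sketch of the normalization chore, assuming two invented legacy systems with different field names, looks like this. Real publishing operations have dozens of such mappings plus date, encoding, and entity clean-up:

```python
from datetime import datetime, timezone

# Map each source's field names onto one common schema (assumed shapes)
FIELD_MAP = {
    "newsroom": {"headline": "title", "pubdate": "date", "txt": "body"},
    "archive":  {"TITLE": "title", "DATE": "date", "BODY": "body"},
}

def normalize(record, source):
    """Project a source-specific record onto the common schema."""
    mapping = FIELD_MAP[source]
    out = {common: record.get(raw) for raw, common in mapping.items()}
    out["source"] = source
    out["normalized_at"] = datetime.now(timezone.utc).isoformat()
    return out

print(normalize({"headline": "Rate Cut", "pubdate": "2008-02-12", "txt": "..."}, "newsroom"))
```

Run that continuously, not in a weekly batch, and the “historical archive” problem recedes.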

Wrap Up

To conclude, I think that new services such as The Issue, the integration of local results into Google News, and the wide range of tools that allow anyone to create a personalized news feed are going to make life very, very difficult for traditional publishers. Furthermore, most traditional publishing and broadcast companies have yet to understand the differences between TV and cable programming and what I call “YouTube” programming.

Until publishing finds a way to get free of its business model “prison”, technology — trendy or not — will not be able to work revenue miracles.

Update February 13, 2008, 8:34 am Eastern — Useful case example about traditional publishing and new media. The key point is that the local newspaper is watching the upstart without knowing how to respond. Source: Howard Downs.

Stephen Arnold, February 12, 2008

Social Search: No Panacea

February 11, 2008

I wrote a chapter for the forthcoming book of essays, Collective Intelligence. Information about the volume is at the Oss.net Web site. If you don’t see a direct link to the study, check back. The book is just in its final run-up to publication.

I’m thinking about my chapter “Search Panacea or Ploy: Can Collective Intelligence Improve Findability?” As we worked on the index for my contribution, we talked about the notion of social search. Wikipedia, as you might have suspected, has a substantial entry about social search. A search for the phrase “social search” on any of the Web search engines returns thousands of entries. As of February 11, 2008, here are Yahoo’s.

Few will doubt that the notion of social search — with humans providing metatags about information — is a hot trend in search.

I can’t recycle the arguments presented in my contribution to Collective Intelligence. I can, however, ask several questions about social search to which I think more research effort should be applied:

Gaming the System

In a social setting, most people will play by the rules. A small percentage of those people will find ways to “game” or manipulate the system to suit their purposes. Online social systems are subject to manipulation. Digg.com and Reddit.com have become targets of people and their scripts. The question is, “How can a user trust the information on a social system?” This is a key issue for me. Several years ago I gave a talk at a Kroll (Marsh McLennan) officers’ meeting where the audience was keenly interested in ways to determine the reputation of people and the validity of their actions in a social online system.
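
One research direction is weighting actions instead of counting them. Here is a toy sketch, entirely my invention rather than anything Digg or Reddit discloses, that discounts votes from brand-new accounts and from accounts voting in bursts:

```python
def vote_weight(account_age_days, votes_last_hour):
    """Naive heuristic: discount new accounts and burst voters."""
    age_factor = min(account_age_days / 30.0, 1.0)        # full weight after ~30 days
    burst_factor = 1.0 / (1.0 + votes_last_hour / 10.0)   # heavy voters count less
    return age_factor * burst_factor

def score(story_votes):
    """Sum weighted votes instead of raw counts."""
    return sum(vote_weight(v["age_days"], v["recent_votes"]) for v in story_votes)

votes = [
    {"age_days": 400, "recent_votes": 1},   # long-time user
    {"age_days": 1,   "recent_votes": 50},  # likely a script
]
print(round(score(votes), 3))  # the script's vote barely registers
```

Heuristics like these are trivially gamed in turn, which is why the reputation question deserves more research than it gets.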

Most Lurk, Two Percent Contribute

My work in social search last year revealed a surprising — to me at least — piece of data. Take a social search site with 100 users. Only two people contribute on a regular basis. I think more research is needed to understand how active individuals can shape the information available. The question is, “What is the likelihood that active participants will present information that is distorted or skewed inadvertently?” The problem is that in an online space where there is no or a lax editorial policy, distortion may be “baked into” the system. Naive users can visit a site in search of objective results, and the information, by definition, is not objective.

Locked in a Search Box

Some of the social search systems offer tag clouds or a graphic display of topics. The Mahalo.com site makes it easy for a user to get a sense of the topics covered. Click on the image below, and you will readily see that Mahalo is a consumer-centric system, almost an updated version of Yahoo’s original directory:

[Screenshot: Mahalo.com topic display]

The question is, “What else is available in this system?” Most of the social search sites pose challenges to users. There’s no index to the content, and no easy way to know when the information was updated. I’ve had this issue with About.com for years. The notions of scope and currency nag at me, and the search box requires that I guess the secret combination of words before I can dig deeply into the information available.

In my contribution to Collective Intelligence, I cover a number of more complex issues. For example, Google is — at its core — a social search system. The notions of links and clicks are artifacts of human action and attention. By considering these, Google has its finger on the pulse of its users’ behavior. I think this aspect of Google’s system has long been understood, but Google’s potential in the social search space has not been examined in much of the social buzz.

Stephen Arnold, February 11, 2008

Google and Obfuscated JavaScript

February 10, 2008

Sunday mornings are generally calm in rural Kentucky. There’s the normal pop of gunfire as my neighbors hunt squirrels for burgoo, and there is the routine salvo of news stories about Google.

I zipped through the “Google to Invest in CNet” item but paused on Google’s “obfuscated JavaScript” story here. A number of Web logs and Web sites are running the news item. Google Blogoscoped’s story ran on Friday, February 8, 2008, “Why Does Google Obfuscate Their [sic] Code?” Philipp Lenssen does a good job, and this post contains a number of intriguing comments from his readers. Some of these folks speculate that Google compresses JavaScript to save bandwidth; others hint that Google is intentionally creating hard-to-read code. A possible code sighting is here.

Speculating about Google’s whys and wherefores is fun but only semi-helpful. My hit-and-miss dealings with the company reveal “controlled chaos.” The way to get a look at what Google does is to dig through its technical papers (do it daily because the list can change) and read some of the company’s patent applications, patents, and, if you are in Washington, DC, the “wrappers” available to some savvy researchers.

Some hints of the JavaScript mystery appear in this document: “Method and System for Dynamically Composing Distributed Interactive Applications from High-Level Programming Languages”, US20080022267. The invention was filed in April 2005 and was published on January 24, 2008. When an application is published, Google often has examples of the document’s “invention” running or even visible for the intrepid investigator. Three years is a long time for gestation at the Google. My hunch is the JavaScript is produced by the Googleplex’s auto-programming techniques, possibly the one disclosed in US20080022267.

I’m no attorney, and patents are difficult to analyze even for the experts. Read the document. You may find that the oddball JavaScript is a way to eliminate manual drudgery for Googlers. US20080022267 may shed some light on what Google may do to spit out JavaScript for browsers, for instance. What do you think? I am toying with the idea that Google generates JavaScript automatically to improve efficiency and eliminate some grunt work for its wizards.
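
To see why machine-generated code reads as obfuscated, consider this toy Python sketch of what a compiler’s symbol allocator does to identifiers. This is an illustration of the general technique, not Google’s actual mechanism:

```python
import itertools
import re
import string

def short_names():
    """Yield a, b, ..., z, aa, ab, ... the way a code generator allocates symbols."""
    for size in itertools.count(1):
        for combo in itertools.product(string.ascii_lowercase, repeat=size):
            yield "".join(combo)

def compress(source, identifiers):
    """Replace readable identifiers with machine-assigned short names."""
    names = short_names()
    for ident in identifiers:
        source = re.sub(r"\b%s\b" % re.escape(ident), next(names), source)
    return source

js = "function renderResults(resultList) { var resultCount = resultList.length; }"
print(compress(js, ["renderResults", "resultList", "resultCount"]))
# function a(b) { var c = b.length; }
```

No human wrote `function a(b)` to confuse anyone; the generator simply never needed readable names.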

You can obtain US20080022267 here. If you haven’t used the USPTO’s search and retrieval system, check out the sample queries. The system is sluggish, so you can try Google’s own patent service here. I’ve found that Google’s service is okay, but it’s better to go to the USPTO site, particularly for recently issued documents.

I want to conclude by saying, “I don’t think conspiracy theories are the way to think about Google.” Google’s big. It is innovative. It is — to use Google’s own term — chaotic. I think Google operates like a math club on steroids, based on my limited experience with the company.

I’m inclined to stick with the simplest explanation, which seems clearly set forth in US20080022267. “Controlled chaos” is a way to disrupt monoliths, but it doesn’t lend itself to highly targeted, human-motivated fiddling with JavaScript. Not even Google has that many spare synapse cycles.

Stephen Arnold, February 10, 2008

Is the Death Knell for SEO Going to Sound?

February 9, 2008

Not long ago, a small company wondered why its Web site was the Avis to its competitor’s Hertz. The company’s president checked Google each day, running a query to find out if the rankings had changed.

I had an opportunity to talk with several of the people at this small company. The firm’s sales did not come from the Web site. Referrals had become the most important source of new business. The Web site was — in a sense — ego-ware.

I shared some basic information about Google’s Web master guidelines, a site map, and error-free code. These suggestions were met with what I would describe as “grim acceptance.” The mechanics of getting a Web site squared away were work but not unwelcome. My comments articulated what the Web team already knew.
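
For readers facing the same homework, the sitemap piece of the mechanical fix is genuinely small. Here is a minimal sketch that emits a sitemap.xml per the sitemaps.org 0.9 schema; the URLs are placeholders:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Emit a minimal sitemap.xml per the sitemaps.org 0.9 schema."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

pages = [
    ("http://www.example.com/", "2008-02-09"),
    ("http://www.example.com/services", "2008-01-15"),
]
print(build_sitemap(pages))
```

Mechanical, as I said. The ranking the Web team actually wanted is another matter.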

The second part of the meeting focused on the “real” subject. The Web team wanted the Web site to be number one. I thanked the Web team and said, “I will send you the names of some experts who can assist you.” SEO work is not my cup of tea.

Then, yesterday, as Yogi Berra allegedly said, “It was déjà vu all over again.” Another local company found my name and arranged a meeting. Same script, different actors.

“We need to improve our Google ranking,” the Web master said. I probed and learned that the company’s business came from within a 25-mile radius of the company’s office. Google and other search engines listed the firm’s Web site deep in the results lists.

I replayed the MP3 in my head about clean code, sitemaps, etc. I politely told the local Web team that I would email them the names of some SEO experts. SEO is definitely an issue. Is the worsening economy the reason?

Here’s a summary of my thinking about these two opportunities for me to bill some time and make some money:

  1. Firms want to be number one on Google and somehow have concluded that SEO tactics can do the trick.
  2. There is little resistance to mechanical fixes, but there is little enthusiasm for adding substantive content to a Web site.
  3. In the last year, interest in getting a Web site to the top of Live.com or Yahoo.com has declined, based on my observations.

Content, the backbone of a Web site, is important to site visitors. When I do a Web search, I want links to sites that have information germane to my query. Term stuffing, ripped-off content, and other “tricks” don’t endear certain sites to me.

I went in search of sources and inspiration for ranking short cuts. Let me share with you some of my more interesting findings:

You get the idea. There are some amazing assertions about getting a particular Web site to the top of the Google results list. Several observations may not be warranted, but here goes:

First, writing, even planning, high-impact, useful content is difficult. I’m not sure if it is a desire for a short cut, a lack of confidence, laziness, or inadequate training. There’s a content block in some organizations, so SEO is the way to solve the problem.

Second, a Web site can fulfill any need its owner may have. The problem is that certain types of businesses will have a heck of a time appearing at the top of a results list for a general topic. Successful, confident people expect a Web indexing system to fall prey to their charms as their clients do. Chasing “number one on Google” can be expensive and a waste of time. There are many “experts” eager to help make a Web site number one. But I don’t think the results will be worth the cost.

Third, there are several stress points in Web indexing. The emergence of dynamic sites that basic crawlers cannot index is a growing trend. Some organizations may not be aware that their content management system (CMS) generates pages that are difficult, if not impossible, for a Web spider to copy and crunch. Google’s programmable search engine is one response, and it has the potential to alter the relevance landscape if Google deploys the technology. The gold mine that SEO mavens have discovered guarantees that baloney sites will continue to plague me. Ads are sufficiently annoying. Now more and more sites in my results list are essentially valueless in terms of substantive content.

The editorial policy for most of the Web sites I visit is non-existent. The Web master wants a high ranking. The staff is eager to do mechanical fixes. Recycling content is easier than creating solid information.

The quick road to a high ranking falls off a cliff when a search system begins to slice and dice content, assigns “quality” scores to the information, and builds high-impact content pages. Doubt me? Take a look at this Google patent application, US20070198481, and let me know what you think.

Stephen Arnold, February 9, 2008

Taxonomy: Search’s Hula-Hoop®

February 8, 2008

I received several thoughtful comments on my Beyond Search Web log from well-known search and content processing experts (not the search engine optimization type or the MBA analyst species). These comments addressed the topic of taxonomies. One senior manager at a leading search and content processing firm referenced David Weinberger’s quite good book, Everything is Miscellaneous. My copy has gone missing, so join me in ordering a new one from Amazon. Taxonomy and taxonomies have attained fad status in behind-the-firewall search and content processing. Every vendor has to support taxonomies. Every licensee wants to “have” a taxonomy.

[Screenshot: Oracle Pressroom, February 2008]

This is a screen shot of the Oracle Pressroom. Notice that a “taxonomy” is used to present information by category. The center panel presents hot links by topics with the number of documents shown for each category. The outside column features a tag cloud.

A “taxonomy” is a classification of things. Let me narrow my focus to behind-the-firewall content processing. In an organization, a taxonomy provides a conceptual framework that can be used to organize the organization’s information. Synonyms for taxonomy include classification, categorization, ontology, typing, and grouping. Each of these terms can be used with broader or narrower meanings, but for my purpose, we will assume each can be used interchangeably. Most vendors and consultants toss these terms around as interchangeable Lego blocks in my experience.
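
Stripped of the jargon, a behind-the-firewall taxonomy is just a tree with documents hanging from its nodes. Here is a minimal Python sketch, with invented categories, that rolls up the per-category document counts a pressroom page displays:

```python
# A toy taxonomy: top-level categories, subcategories, and tagged documents
taxonomy = {
    "Products": {
        "Database": ["doc1", "doc4"],
        "Middleware": ["doc2"],
    },
    "Press Releases": {
        "2008": ["doc3", "doc5", "doc6"],
    },
}

def rollup_counts(tree):
    """Count documents per node, the way a category page shows '(n)'."""
    counts = {}
    for category, children in tree.items():
        total = 0
        for subcategory, docs in children.items():
            counts[(category, subcategory)] = len(docs)
            total += len(docs)
        counts[(category,)] = total
    return counts

for path, n in sorted(rollup_counts(taxonomy).items()):
    print(" > ".join(path), "(%d)" % n)
```

The tree is the easy part. Deciding which node a document belongs to is where the epistemology, discussed below, comes in.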

A fad, as you know, is an interest that is followed for some period of time with intense enthusiasm. Think Elvis, bell bottoms, and speaking Starbuck’s coffee language.

A Small Acorn

A few years ago, a consultant approached me to write about indexing content inside an organization. This individual had embarked on a consulting career and needed information for her Web site. I dipped into my files, collected some useful information about the challenges corporate jargon presented, and added some definitions of search-related terms.

I did work for hire, so my client could reuse the information to suit specific needs. Imagine my pleasant surprise when I found my information recycled multiple times and used to justify a custom taxonomy for an enterprise. I was pleased to have become a catalyst for a boom in taxonomy seminars, newsgroups, and consulting businesses. One remarkable irony was that a person who had recycled the information I sold to consultant A thousands of miles away turned up as consultant B at a company in which I was an investor. I sat in a meeting and heard my own information delivered back to me as a way to orient me about classifying an organization’s information.

Big Oak

A taxonomy revolution had taken place, and I was only partially aware. A new industry had taken root, flowered, and spread like kudzu around me.

The interest in taxonomies continues to grow. After completing the descriptions of companies offering what I call rich content processing, I can report that organizations looking for taxonomy-centric systems have many choices. Of the 24 companies profiled in the Beyond Search study, all 24 “do” taxonomies. Obviously there are greater and lesser degrees of stringency. One company has a system that supports American National Standards Institute guidelines for controlled terms and taxonomies. Other companies “discover” categories on the fly. Between these two extremes there are numerous variations. One conclusion I drew after this exhausting analysis is that it is difficult to locate a system that can’t “do” taxonomies.

What’s Behind the Fad?

Let me consider briefly a question that I don’t tackle in Beyond Search: “Why the white-hot interest in taxonomies?”

Taxonomies have a long and distinguished history in library science, philosophy, and epistemology. For those of you who are a bit rusty, “epistemology” is the theory of knowledge. Taxonomies require a grasp, no matter how weak, of knowledge. No matter how clever, a person creating a taxonomy must figure out how to organize email, proposals, legal documents, and the other effluvia of organizational existence.

I think people have enough experience with key word search to realize its strengths and limitations. Key words — either controlled terms or free text — work wonderfully when I know what’s in an electronic collection, and I know the jargon or “secret words” to use to get the information I need.

Boolean logic (implicit or explicit) is not too useful when one is trying to find information in a typical corpus today. There’s no editorial policy at work. Anything the indexing subsystem is fed is tossed into an inverted index. This is the “miscellaneous” in David Weinberger’s book.
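
For readers who have not peeked inside a search engine, here is a minimal sketch of that inverted index and the implicit Boolean AND most engines apply. No editorial policy, no taxonomy, just terms mapped to documents:

```python
from collections import defaultdict

def build_index(docs):
    """Map every term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Implicit Boolean AND: documents containing every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

docs = {
    "memo1": "quarterly budget proposal",
    "memo2": "budget review for legal",
    "memo3": "holiday party proposal",
}
index = build_index(docs)
print(boolean_and(index, "budget", "proposal"))  # {'memo1'}
```

Miss the “secret word” the author used and the inverted index, quite correctly, returns nothing.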

A taxonomy becomes a way to index content so the user can look at a series of headings and subheadings. A series of headings and sub-headings makes it possible to see the forest, not the trees. Clever systems can take the category tags and marry them to a graphical interface. With hyperlinks, it is possible to follow one’s nose — what some vendors call exploratory search or search by serendipity.

Taxonomy Benefits

A taxonomy, when properly implemented, yields payoffs:

First, users like to point-and-click to discover information without having to craft a query. Believe me, most busy people in an organization don’t like trying to outfox the search box.

Second, the categories — even when hidden behind a naked search box interface — are intuitively obvious to a user. An accountant may (as I have seen) enter the term finance and then point-and-click through results. When I ask users if they know specific taxonomy terms, I hear, “What’s a taxonomy?” Intuitive search techniques should be a part of behind-the-firewall search and content processing systems.

Third, management is willing to invest in fine-tuning a taxonomy. Unlike a controlled vocabulary, a suggestion to add categories meets with surprisingly little resistance. I think the intuitive usefulness of cataloging and categorizing is obvious even to managers who have other people do their searching for them.

Some Pitfalls

There are some pitfalls in the taxonomy game. The standard warnings are “Don’t expect miracles when you categorize modest volumes of content” and “Be prepared for some meetings that are more like a graduate class in logic than a session to figure out how to deliver what the marketing department needs in a search system.” Etc.

On the whole, the investment in a system that automatically indexes is a wise one. It becomes ever wiser when the system can use knowledge bases, word lists, taxonomies, and other information inputs to index more accurately.

Keep in mind that “smart” systems can be right most of the time and then without warning run into a ditch. At some point, you will have to hunker down and do the hard thinking that a useful taxonomy requires. If you are not sure how to proceed, try to get your hands on the taxonomies that once were available from Convera. Oracle at one time offered vertical term lists. You can also Google for taxonomies. A little work will return some useful examples.

To wrap up, I am delighted that so many individuals and organizations have an interest in taxonomies — whether a fad or something epistemologically more satisfying. The content processing industry is maturing. If you want to see a taxonomy in action, check out:

HMV, powered by Dieselpoint

Oracle’s Pressroom, powered by Siderean Software’s system

US government portal powered by Vivisimo (Microsoft)

Stephen Arnold, February 8, 2008

No News. No, Really

February 8, 2008

A colleague (who shall remain nameless) sent me an email at 0700 Eastern this morning, Friday, February 8, 2008. To brighten this gloomy day in Harrod’s Creek, he wrote: “Your site indeed does have excellent content. It could be more ‘newsy’ though, which would encourage people to come back daily.”

Blog Changes Coming

Great comment, and I have begun taking baby steps to improve this Web log. It’s not even a month old, and I know I have to invest more time in this beastie. In the next two weeks, I want to introduce a three-column format, and I will include links to news. Personally, I dislike news because it is too easy to whack out a “press release”, post it on a free news release distribution service, and sit back while the tireless Topix.net and Google.com bots index the document. Sometimes, a few seconds after the ping server spits out a packet, the “press release” turns up in one of my alert mechanisms. Instant news — sort of. Not for me, sorry.

The other change will be the inclusion of some advertising. My son, who runs a thriving Google-centric integration business, reminded me, “Dad, you are getting traffic. Use AdSense to generate some revenue.” As a docile father, I will take his suggestion. We will put the text ads in the outside column of the new “magazine” layout we will be using. I will also write an essay about what he is doing and why it is representative of the change taking place in fixing a “broken” search system. Not news. But if you are struggling with a multi-million dollar investment in a behind-the-firewall system that users find a thorn in their britches, you will want to know how to make the pain go away without major surgery. You won’t find this information on most search and content processing vendors’ Web sites. Nor will you get much guidance from the search “experts” involved in search engine optimization, shilling for vendors with deep pockets, or from analysts opining about the “size” of the market for “enterprise search”, whatever that phrase means. You will get that information here, though. No links in this paragraph. I don’t want hate mail.

Let me be perfectly clear, as our late, beloved President Richard Nixon used to say as an audio filler: “The content of this Web log is NOT news.” If you look at the posts, I have been using this Web log to contain selected information I couldn’t shoehorn into my new study Beyond Search: What to Do When Your Search System Doesn’t Work (in press at Gilbane Group now; publication date is April 2008).

Some News Will Creep In

It is true that I have commented on some information that is garnering sustained financial and media attention; for example, the issues that must be satisfactorily resolved if Microsoft succeeds in its hostile takeover of Yahoo. Unless a white knight gallops into Mountain View, California, soon, Micro-Hoo will be born. I’ve also made a few observations about the Microsoft – Fast tie up. Although 1/36th the dollar amount of the Yahoo deal, the Microsoft – Fast buyout is interesting. The cultural, technical, social, and financial issues of this deal are significant. My angle is that Fast Search & Transfer was the company that turned its back on Web indexing and advertising at the moment Google was bursting from the starting blocks. Fast Search’s management sold its Web search site AllTheWeb.com and its advertising technology at the moment Google embraced these two initiatives. We have, therefore, a fascinating story with useful lessons about a single decision’s impact. Google’s market cap today is $157.9 billion and Fast Search’s is $1.2 billion. I think this decision is a pivotal event in online search and content processing.

In my opinion, Fast Search was in 2003 – 2004 the one company with high-speed indexing and query processing technology comparable to Google’s. When Fast Search & Transfer withdrew from Web indexing, Google had zero significant competition. AltaVista.com had fallen off the competitive radar under Hewlett-Packard’s mismanagement.

Fast Search bet the farm on “enterprise search”. Google bet on the Web search and advertising sector. Here’s a what-if question to consider: “What if Fast Search had fought head-to-head with Google?” Perhaps a business historian will dig into this more deeply.

I have several posts in the works that are definitely non-news. Later today, I will offer some observations about today’s taxonomy fad. I have some information about social search, and I put particular emphasis on the weaknesses of this sub-species of information retrieval; namely, the leader is Google. The other folks are doing “social search” roughly in the same way a high school soccer team plays football against the Brazilian national team. The game is the same, but the years of experience the Brazilians have translate into an easy win. I think this is a controversial statement. Is it news? No.

Stealth Feature to Début

The revamped Web log will include a “stealth feature”. (In Silicon Valley speak, this means something anyone can do but when kept secret becomes a “secret”.) I don’t want to let the digital cat out of the Web bag yet, but you will be able to get insight into how some of the major search and content processing systems developed. I will post some original information on my archive Web site and summarize the key points in a Web log posting in this forum.

We have been getting an increasing number of off-topic comments. I’m deleting these. My editorial policy is that substantive comments germane to search and content processing are fine. You may disagree with me, explain a point in a different way, or provide supplemental information. Using the comments section to get a person to buy stolen software, acquire Canadian drugs, connect with Clara (a popular spammer surname) for a good time, or the other wacko stuff is out of bounds.

Okay, Here’s Some Real News

For the news mavens, I’ve included some hot links in this announcement of non news. Here’s one to Google’s version of Topix.net’s local news service. (Hurry, these newsy links go dead pretty quickly, which is one reason I don’t do news.) You don’t need me to tell you what this means to Topix.net. It seems that when you search for Arnoldit on Google, the Govern-ator comes up first. Now that’s an interesting twist in Google’s relevancy algorithm.

Stephen Arnold, February 8, 2008

When Turkeys Marry

February 7, 2008

In 2006, I was invited to attend a dinner with Robert Scoble in London, England. For two hours, I sat and listened. As the trajectory of the dinner took shape, I concluded that I should generally agree with his views. Now, 14 months removed from that instructive experience, I’m going to agree with him about the proposed Microsoft – Yahoo tie up.

I enjoyed “What You All Are All Missing about Google.” The best sentence in the essay is, in my opinion: “As I said on Channel 5 news on Friday night: put two turkeys together and you don’t get an eagle.” [Emphasis added.] Microsoft is one turkey. Yahoo is the other turkey. Together — no eagle. I agree with him.

I also agree in a tangential way with the statement attributed to an SEO guru: “Danny Sullivan told me that this deal is all about search.” [Emphasis added.] Let me offer what the French call pensées.

Turkeys

Turkeys, as I recall from my days near the farm yard, are one of the few birds rumored to drown in a rainstorm. The folk tale is that when a turkey looks up at the rain and forgets to button its beak, the turkey drowns. Turkeys are not the rocket scientists of the bird world, but, in my experience, turkeys aren’t that dumb. When Thanksgiving came around, my aunt and her hatchet were watched closely by the turkeys in the family’s farm yard. Turkeys knew something was about to happen.

Both firms are profitable. Both have some good products. Both companies have thousands of extremely bright people. The reason I’m uncomfortable (note that I am not disagreeing) is that each company has entrenched technological characteristics, and each company has a distinctive culture. Turkey may be too strong a characterization.

Google’s View of the Deal

I agree that Google can benefit from Microsoft’s acquisition of Yahoo. Mr. Scoble says: “Google stands to gain HUGE by slowing down this deal.” To add one small nuance to Mr. Scoble’s statement: I find Google’s thoughts and actions difficult to predict. My infrequent yet interesting interactions with the company have given me sufficient data to conclude that Google is a strategy mongrel. Some of its actions look planned; others are manifestations of spontaneous decisions and “controlled chaos.” Perhaps Google is working, on one front, with thrusts into five or six key business sectors. On the other hand, Google is reacting quickly with a suggestion that a Microsoft – Yahoo tie up will be the end of the Web as we know it.

Email

I agree email is not where the money is. But Google has devised several mechanisms to monetize electronic communications, of which email is one. Google may have several ways to turn the “hard to monetize” email into “less hard to monetize” email. I expect rapid-fire testing and adapting. I want to wait and see what the Google does.

Instant Messaging: An Email Variant

Instant messaging has been difficult to monetize. IM is a variant of email. Perhaps the communication modes that today seem distinct will blur going forward?

Search

I agree that the Microsoft – Yahoo deal is about search. May I respectfully suggest that the major chord in this opera is the platform? Search is but one application running on that platform. I articulate this idea in my two Google studies: 2005’s The Google Legacy: How Search Became the Next Application Platform and 2007’s Google Version 2.0: The Calculating Predator. If Google is a platform, does it follow that Google could disrupt other industry sectors as it has snarled competitive search engines’ horsepower?

Gold in Google’s Pockets

I agree that Google benefits by a deal delay. I agree that Google benefits if Microsoft – Yahoo get married. Google may be vulnerable no matter what Microsoft does. Perhaps I will discuss some of these exogenous factors in another post.

Stephen Arnold, February 7, 2008

Requirements for Behind-the-Firewall Search

February 5, 2008

Last fall, I received a request from a client for a “shopping list of requirements for search.” The phrase shopping list threw me. My wife gives me a shopping list and asks me to make sure the tomatoes are the “real Italian kind”. She’s a good cook, but I don’t think she worries about my getting a San Marzano or an American genetically-engineered pomme d’amour.

Equating shopping list with requirements for a behind-the-firewall search / content processing system gave me pause. As I beaver away, gnawing down the tasks remaining for my new study Beyond Search: What to Do When Your Search System Won’t Work, I had a mini-epiphany; to wit:

Getting the requirements wrong can
undermine a search / content processing system.

In this essay, I want to make some comments about requirements for search and content processing systems. I’m not going to repeat the more detailed discussion in The Enterprise Search Report, 1st, 2nd, and 3rd editions, nor will I recycle the information in Beyond Search. I propose to focus on the tendency of very bright people to treat search and content processing requirements as check-off items on a house inspection. Then I want to give one example of how a perceptual mismatch on requirements can cause a search and content processing budget to become a multi-year problem. To conclude the essay, I want to offer some candid advice to three constituencies: the customer who licenses a search / content processing solution, the vendor who enters into a deal with a customer, and the consultants who circle like buzzards.

Requirements

To me, a requirement is a clear, specific statement of a function a system should perform; for example, a search system should process the following file types: Lotus Notes, Framemaker, and DB2 tables.

How does one arrive at a requirement and then develop a list of requirements?

Most people develop requirements by combining techniques. Here’s a short list of methods that I have seen used in the last six months:

  • Ask users of a search or content processing system what they would like the search system to do
  • Look at information from vendors who seem to offer a solution similar to the one the organization thinks it wants
  • Ask a consultant, sometimes a specialist in a discipline only tangentially related to search.

The Fly Over

My preferred way of developing requirements is more mundane, takes time, and is resistant to short cuts. The procedure is easy to understand. The number of steps can be expanded when the organization operates in numerous locations around the world, processes content in multiple languages, and has different security procedures in place for different types of work.

But let’s streamline the process and focus on the core steps. When I was younger, I guarded this information closely. I believed knowing the steps was a key ingredient for selling consulting. Now, I have a different view, and I want you to know what I do for the simple reason that you may avoid some mistakes.

First, perform a data gathering sweep. In this step you will be getting a high-level or general view of the organization. Pay particular attention to these key areas. Any one of them can become a search hot spot and burn your budget, schedule, and you with little warning:

  • Technical infrastructure. This means looking at how the organization handles enterprise applications now, what the hardware platform is, what the work load on the present technical staff is, how the organization uses contractors and outsourcing, what the present software licensing deals stipulate, and the budget. I gather these data by circulating a data collection form electronically or using a variety of telephonic and in-person meetings. I like to see data centers and hardware. I can tell a lot by looking at how the cables are organized and from various log files which I can peruse on site with the customer’s engineer close at hand to explain a number or entry to me. The key point of the exercise is to understand if the organization is able to work within its existing budget and keep the existing systems alive and well.
  • User behavior. To obtain these data, I use two methods. One component is passive; that is, I walk around and observe. The other component is active; that is, I set up brief, informal meetings where people are using systems and ask them to show me what they now do. If I see something interesting, I ask, “What caused you to take that action?” I write down my observations. Note that I try to get lower-level employees’ input about needs before I talk to too many big wheels. This is an essential step. Without knowing what employees do, it is impossible to listen accurately to what top managers assert.
  • Competitive arena. Most organizations don’t know much about what their competitors do. In terms of search, most organizations are willing to provide some basic information. I find that conversations at trade shows are particularly illuminating. But another source of excellent information is search vendors. I admit that I can get executives on the telephone or by email pretty easily, but anyone can do that with some persistence. I ask general questions about what’s happening of interest in law firms or ecommerce companies. I am able to combine that information with data I maintain. From these two sources, I can develop a reasonable sense of what type of system is likely to be needed to keep Company A competitive with Company B.
  • Management goals. I try to get a sense of what management wants to accomplish with search and content processing. I like to hear from senior management, although most senior managers are out of touch with the actual information procedures and needs of their colleagues. Nevertheless, I endure discussions with the brass to get a broad calibration. Once these interviews or discussions are scheduled, I use two techniques to get data from mid-level managers. One technique is a Web survey. I use an online questionnaire and make it available to any employee who wishes to participate. I’m not a fan of long surveys. A few pointed questions deliver the freight of meaning I need. More importantly, survey data can be counted and used as objective data about needs. Second, I use various types of discussions. I like one-on-one meetings; I like small-group meetings; and I like big government-style meetings with 30 people sitting around a chunk of wood big enough to make a yacht. The trick is to have a list of questions and the ability to make everyone comment. What’s said is important, but how people react to one another can speak volumes and indicate who really has a knack for expressing a key point for his / her co-workers.

I take this information and data, read it, sort it, and analyze it. The result is the intellectual equivalent of a bookcase. The supports are the infrastructure. Each of the shelves consists of the key learnings from the high-level look at the organization. I don’t know how much content the organization has. I don’t know the file types. I don’t have a complete inventory of the enterprise applications into which the search and content processing must integrate. What I do know is whom to call or email for the information. So drilling down to get a specific chunk of data is greatly simplified by the high-level process.

Matching

I take these learnings and the specific data such as the list of enterprise systems to support and begin what I call the “matching sequence.” Here’s how I do it. I maintain a spreadsheet with the requirements from my previous search and content processing jobs. Each of these carries a short comment and a code that identifies the requirement by availability, stability, and practicality. For example, many companies want NLP or natural language processing. I code this requirement as Available, Generally Stable, and Impractical. You may disagree with my assessment of NLP, but in my experience few people use it, and it can add enormous complexity to an otherwise straightforward system. In fact, when I hear or identify jargon in the fly-over process, my warning radar lights up. I’m interested in what people need to do a job or to find on point information. I don’t often hear a person in accounting asking to do a query in the form of a complete sentence. People want information in the most direct, least complicated way possible. Writing sentences is neither easy nor speedy for many employees working on a deadline.

What I have after working through my list of requirements and the findings from the high-level process is three lists of requirements. I keep definitions or mini-specifications in my spreadsheet, so I don’t have to write boilerplate for each job. The three lists with brief comments are:

  • Must-have. These are the requirements that the search or content processing system must satisfy in order to meet the needs of the organization, based on my understanding of the data. A vendor unable to meet a must-have requirement, by definition, is excluded from consideration (a minimal screening sketch follows this list). Let me illustrate. Years ago, a major search procurement stipulated truncation, technically lemmatization. In plain English, the system had to discard inflections, called rearward truncation. One vendor wrote an email saying, “We will not support truncation.” The vendor was disqualified. When the vendor complained about the disqualification, I showed the vendor the email. Silence fell.
  • Options. These are requirements that are not mandatory for the deal, but the vendor should be able to demonstrate that these requirements can be implemented if the customers request them. A representative option is support for double-byte languages; e.g., Chinese. The initial deployment does not require double byte, but the vendor should be able to implement double-byte support upon request. A vendor who does not have this capability is on notice that if he / she wins the job, a request for double-byte support may be forthcoming. The wise vendor will make arrangements to support this request. Failure to implement the option may result in a penalty, depending on the specifics of the license agreement.
  • Nice-to-have. These are the Star Trek or science fiction requirements that shoot through procurements like fat through a well-marbled steak. A typical Star Trek requirement is that the system deliver 99 percent precision and 99 percent recall or deliver automatic translation with 99 percent accuracy. These are well-intentioned requests but impossible with today’s technology and budgets available to organizations. Even with unlimited money and technology, it’s tough to hit these performance levels.
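
Here is the minimal screening sketch promised above. The requirements, tiers, and vendor claims are invented; the logic is the part that matters: any vendor missing a must-have is out, no debate.

```python
# Toy requirement tiers and vendor capability claims (invented for illustration)
requirements = {
    "process Lotus Notes": "must-have",
    "double-byte support": "option",
    "99 percent recall":   "nice-to-have",
}

vendors = {
    "Vendor A": {"process Lotus Notes", "double-byte support"},
    "Vendor B": {"double-byte support", "99 percent recall"},
}

def qualify(vendors, requirements):
    """Disqualify any vendor missing a must-have requirement."""
    must_haves = {r for r, tier in requirements.items() if tier == "must-have"}
    report = {}
    for name, claimed in vendors.items():
        missing = must_haves - claimed
        report[name] = "qualified" if not missing else "disqualified: " + ", ".join(sorted(missing))
    return report

for name, verdict in qualify(vendors, requirements).items():
    print(name, "->", verdict)
```

Keep the email trail, as my truncation story shows; the spreadsheet only settles arguments you can document.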

Creating a Requirements Document

I write a short introduction to the requirements, create a table with the requirements and other data, and provide it to the client for review. After a period of time, it’s traditional to bat the draft back and forth, making changes on each volley. At some point, the changes become trivial, and the document is complete. There may be telephone discussions, face-to-face meetings, or more exotic types of interaction. I’ve participated in a requirements wiki, and I found the experience thrilling for the 20-somethings at the bank and enervating for me. That’s what 40 years’ age difference yields — an adrenaline rush for the youngster and a dopamine burst for the geriatrics.

There are different conventions for a requirements document. The US Federal government calls a requirements document “a statement of work”. There are standard disclaimers, required headings for security, an explanation of what the purpose of the system is, the requirements, scoring, and a mind-numbing array of annexes.

For commercial organizations, the requirements document can be an email with the following information:

  • Brief description of the organization and what the goal is
  • The requirements, a definition, the metrics for performance or a technical specification for the item, and an optional comment
  • What the vendor should do with the information; that is, do a dog-and-pony show, set up an online demonstration, make a sales call, etc.
  • Whom to call for questions.

Whether you prefer the bureaucratic route or a Roman road builder method, you now have your requirements in hand.

Then What?

That is a good question. In go-go organizations, the requirements document is the guts of a request for a proposal. Managing an RFP process is a topic for another post. In government entities, the RFP may be preceded by an RFI or Request for Information. When the vendors provide information, a cross-matching of the RFI information with the requirements document (SOW) may be initiated. The bureaucratic process may take so long that the fiscal year ends, funding is lost, and the project is killed. Government work is rewarding in its own way.

Whether you use the requirements to procure a search system or whether you put the project on hold, you have a reasonably accurate representation of what a search / content processing system should deliver.

The fly-over provides the framework. The follow up questions deliver detail and metrics. The requirements emerge from the analysis of this information and data. The requirements are segmented into three groups, with the wild and crazy requirements relegated to the “nice to have” category. The customer can talk about these, but no vendor has to be saddled with delivering something from the future today. The requirements document can be the basis of a procurement.

There are some pitfalls in the process I have described. Let me highlight three:

First, this procedure takes time, expertise, and patience. Most organizations lack adequate amounts of each ingredient. As a result, requirements are off-kilter, so the search system can list or sink. How can a licensee blame the vendor when the requirements are wacky?

Second, the analysis of the data and information is a combination of analytic and synthetic investigation. Most organizations prefer to use their existing knowledge and gut instinct. While these may be outstanding resources, in my experience, the person who relies on these techniques is guessing. In today’s business climate, guessing is not just risky. It can severely damage an organization. Think about a well-known pharmaceutical company pushing a drug to trial despite it being known to show negative side effects in the company’s own prior research. That’s one consequence of a lousy behind-the-firewall search / content processing system.

Third, requirements are technical specifications. Today, people involved in search want to talk about the user interface. The user interface manifests what is in the system’s index. The focus, therefore, should not be on the Web 2.0 color and features of the interface. The focus must be kept squarely on the engineering specifications for the system.

You can embellish my procedure. You can jiggle the sequence. You may be able to snip out a step or a sub-process. But if you jump over the hard stuff in the requirements game, you will deploy a lousy system, create headaches for your vendor, annoy, even anger, your users, and maybe lose your job. So, get the requirements right. Search is tough enough without starting off on the wrong foot.

Stephen Arnold, February 6, 2008

 

Hit Boosting: SEO for Intranet Search Systems

February 5, 2008

The owner of a local marketing company asked me, “Is there such a thing as SEO for an in-house search system?”

After gathering more information about the marketer’s client, a large health care organization, I can state without qualification, “Yes.”

Let’s define some terms, because the acronym SEO primarily refers to techniques for getting a public Web page to appear at the top of a results list. Once this definition is behind us, I want to look at three situations (not exhaustive but illustrative) when you would want to use SEO for behind-the-firewall search. To wrap up, I will present three techniques for achieving SEO-type “lift” on an Intranet.

I’m not going to dig too deeply into the specific steps for widely-used search systems. I want to provide some broad guidance. I decided to delete this information from my new study Beyond Search: What to Do When Your Search System Doesn’t Work in order to keep the manuscript a manageable size.

SEO and its variants are becoming more and more important, and I have been considering a short monograph on this topic. I implore the SEO gurus, genii, and mavens to spare me their brilliant insights about spoofing Google, Live.com, and Yahoo. I am not interested in deceiving a public Web search engine. Anyway, my comments aren’t aimed at the public indexing systems. We’re talking about indexing information on servers that live behind a firewall.

Definition Time

SEO means “search engine optimization.” In my view, this jargon should be used exclusively for explaining how a Web master can adjust Web pages (static and dynamic) to improve a site’s ranking in a results list. The idea behind SEO is to make editorial and coding changes so a Web page buried on results page 12 appears on results page 1 even though the content on the Web page doesn’t warrant that high rank. A natural high rank can be seen with this query; go to Google and search for “arnoldit google patents”. My Web site should be at or near the top of the results list. SEO wizards want to make this high ranking happen — not by content alone — but with a short cut or trick. SEO often aims to exploit idiosyncrasies in the search system’s indexing and ranking procedures. If you want to see a list of about 100 factors that Google allegedly used in the 2004 – 2005 time period, get a copy of my The Google Legacy. I include a multi-page table and some examples. But thinking about distorting relevancy procedures makes me queasy.

When you want to make sure specific content appears on a colleague’s behind-the-firewall results page, you are performing hit boosting. The idea behind “hit boosting” is that certain organizational content will not appear on a colleague’s results page because it is too new, too obscure, or set forth in a manner that a behind-the-firewall content processing system cannot figure out.

An example from my files is a memo whose entire text is, “ATTN: Fire Drill at 3 PM. Mandatory. Susan.” Not surprisingly, you would have to be one heck of a search expert to find this document even if you knew it existed. With the latency in most behind-the-firewall content processing systems, this memo may not be in the index until the fire drill is over and forgotten.

To get this message in front of your colleagues, you need “hit boosting”. Some information retrieval experts just say “boosting” to refer to this function.

What Needs Boosting?

Let me give you an example. A vice president of the United States wanted his home page to come up at the top of a results list on various Federal systems. One system — used by the 6,000 officials and staff of the US Senate — did not index the Veep’s Web content. The only way to make the site appear was to do “hit boosting.” The reason had nothing to do with relevance, timeliness, or any query. The need for hit boosting was pragmatic. A powerful person wanted to appear on certain results pages. End of story. You may find yourself in a similar situation. If you haven’t, you probably will.

A second example is an expansion of the emergency notification about the fire drill. Your colleagues in certain departments — HR, legal, and accounting in my experience — tell you that certain information must be displayed for all employees. I dislike categorical affirmatives, but these folks wallow in them. Furthermore, the distinction between a search system, a portal, and a newsfeed is “too much detail”. Some search system vendors have added components to make this news push task easier.

A third example is that a very important document cannot be located. There are many reasons for this. Some systems perform only key word indexing, and the terminology of the document may be complex, even arcane. A person looking for this type of legal, scientific, technical, or medical document cannot locate it unless he or she knows the specific terminology used in the document. Searching by more general concepts buries the document in a lengthy result list or chops off the least relevant documents, displaying only the 20 most relevant documents. Some systems routinely reject documents if they exceed certain word counts, contain non-text objects, or arrive in an unsupported file format. Engineers are notorious for spitting out a drawing with a linked spreadsheet containing the components in the drawing and snippets of text explaining in geek-speak a new security system.

To recap, you have to use “hit boosting” to deal with requests from powerful people, display content to employees whether those employees have searched for the information or not, or manipulate certain types of information to make it findable.
In my work, the need for “hit boosting” is increasing. The need rises as the volume of digital information goes up. The days of printing out a message and putting it on the bulletin board by the cafeteria are fast disappearing.

How to Do It

There are three basic techniques for “hit boosting”. I am going to generalize, not select a single system such as the Google Search Appliance or Vivisimo’s system. Details vary by system, but the broad principles I summarize should work.

First, you create a custom search query and link it to an icon, image, or chunk of text. When the user clicks the hot link, the system runs the query and displays the content. For example, you can use the seal of the vice president, use hover text that says, “Important Information from the Vice President”, and use a hot link on text that says, “Click here.” Variations of this approach include what I call “CSS tweaking” accompanied by an iFrame. The idea is that on any results page, you force the information of the moment in front of the user. If this seems like a banner ad or an annoying Forbes-style message, you are correct. The idea is that you don’t fool around with your enterprise search system. You write code to deliver exactly what the powerful person wants. I know this is not “search”, but most powerful people don’t know search from a Queensland cassowary. When you do a demo, the powerful one sees what he / she expects to see.
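
Here is a minimal sketch of this first technique, assuming a hypothetical /search endpoint, canned query, and image path; your own system’s URL pattern will differ, so treat this as an illustration, not a recipe.

```python
# Minimal sketch: wire a canned query to a hot link on the results page.
# The endpoint, query string, and image path are hypothetical placeholders.
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://intranet.example.com/search"  # hypothetical

def boosted_link(query: str, label: str, hover_text: str) -> str:
    """Return an HTML fragment that runs a stored query when clicked."""
    url = SEARCH_ENDPOINT + "?" + urlencode({"q": query})
    return ('<a href="%s" title="%s">'
            '<img src="/images/vp-seal.png" alt="seal"/> %s</a>'
            % (url, hover_text, label))

# Drop the fragment into the results page template (or an iFrame) so it
# appears on every results page, whatever the user actually searched for.
print(boosted_link("vice president announcements",
                   "Click here",
                   "Important Information from the Vice President"))
```

The point is that the engine itself goes untouched; the “boost” lives entirely in presentation-layer code wrapped around it.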

Second, you read the documentation for your search engine and look for the configuration file(s) that control relevance. Some high-end search systems allow you to specify conditions or feed “rules” to handle certain content. The trick here is to program the system to make certain content relevant regardless of the user’s query. If you can’t find the config file or a specific relevance control panel, then you use the search system’s API. Write explicit instructions to get content from location A and display it at location B. You may end up with an RSS hack to refresh the boosted content pool, so expect to invest some time mucking around to get the effect you want. Because vendor documentation is often quite like a haiku, you will be doing some experimenting. (Remember. Don’t do this on a production server.) You can also hack an ad display widget into your results page. With this approach, your boosted content is handled as an ad.
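
Because rule syntax varies wildly by vendor, here is a vendor-neutral sketch of the API route in Python. The boosted-content RSS feed URL and the fetch_results callable are stand-ins I invented for illustration; your system’s actual API will look different.

```python
# Vendor-neutral sketch: pin boosted documents above the engine's hits.
# BOOST_FEED and fetch_results() are hypothetical stand-ins.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

BOOST_FEED = "http://intranet.example.com/boosted.rss"  # hypothetical

def boosted_items(limit=3):
    """Read the current boosted-content pool from an RSS feed."""
    tree = ET.parse(urlopen(BOOST_FEED))
    items = [{"title": i.findtext("title"), "url": i.findtext("link")}
             for i in tree.iter("item")]
    return items[:limit]

def results_page(query, fetch_results):
    """Get hits from the engine, then force the boosted pool on top."""
    hits = fetch_results(query)        # your engine's actual API call
    pinned = boosted_items()
    pinned_urls = {p["url"] for p in pinned}
    # De-duplicate so a pinned document does not appear twice.
    return pinned + [h for h in hits if h["url"] not in pinned_urls]
```

Editing the RSS feed then becomes the refresh mechanism mentioned above: change the feed, and the boosted pool changes without touching the engine.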

Third, you take the content object and rework it into a content type the search system can manipulate. Then you add lots of metadata to this reworked document. You are doing what SEO mavens call keyword stuffing or term stuffing. With some experimentation, you can make one or more documents appear in the context you want. Once you have figured out the right combination of terms to stuff, you can automate this process and “inject” these additional tags into any document you want to boost. (The manual hit boosting techniques should be automated as soon as you know the hack won’t cause other problems.)
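
As a sketch of what the automated “injection” might look like, assuming a plain-text indexing pipeline: the KEYWORDS header and the term list are illustrative, and the terms that actually lift a document come only from experimenting with your own engine.

```python
# Illustrative sketch of automated term stuffing before indexing.
# The field name and term lists are made up; tune to your own engine.
BOOST_TERMS = {
    "fire drill": ["emergency", "evacuation", "safety",
                   "mandatory", "drill", "alarm"],
}

def stuff_document(text, topic):
    """Return a reworked copy of a document with injected index terms.

    Feed the stuffed copy, not the original, to the indexing pipeline.
    """
    terms = BOOST_TERMS.get(topic, [])
    header = "KEYWORDS: " + ", ".join(terms) + "\n"
    return header + text

memo = "ATTN: Fire Drill at 3 PM. Mandatory. Susan."
print(stuff_document(memo, "fire drill"))
```
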
Wrap Up

Hit boosting is an important task for system administrators and the politically-savvy managers of a behind-the-firewall search system. If you have other tricks and techniques, please, post them so others can learn.

Stephen Arnold, February 5, 2008

Simple Math = Big Challenge: MSFT & YHOO

February 4, 2008

I have only a few sections of Beyond Search to wrap up. Instead of thinking about updating my description of Access Innovations’ MAIstro, I am distracted by jibber jabber about the Microsoft (NSDQ:MSFT) and Yahoo (NSDQ:YHOO) tie up.

Where We Are

First, it’s an offer, isn’t it? Maybe a trial balloon? No cash and stock have changed hands as I write this in the wee hours of Monday, February 4, 2008. Yet, many are in a frenzy over a hostile takeover. Think about this word “hostile.” It means antagonistic, unfriendly, enemy. The reason for the bold move? Google, a company that has outfoxed Microserfs and Yahooligans for almost a decade.

The number of articles in my various alerts, RSS feeds, and emails is remarkable. Worldwide, a Microsoft – Yahoo marriage (even if it is helped along with a shotgun) ignites folks’ imagination. Neither Microsoft nor Yahoo will be able to recruit tech wizards, one pundit asserts. Innovation in Silicon Valley will be forever changed, posits another. Sigh.

Sorry. I’m not that excited. I’m interested, but I’m too old, too pragmatic, and too familiar with the vagaries of acquisitions to jump up and down.

Judging from some grousing from Yahooligans, some Yahoo professionals aren’t too keen about working for Microsoft. I have had a hint that some Microsoft wizards aren’t too excited about fiddling with Yahoo’s mind-numbing array of products, services, technologies, search systems, partnerships, and research initiatives.

I think the root concern is trying to figure out how to fit two large operations together, a 1 + 1 = 3 problem. For example, there’s Yahoo Mail and Hotmail Live; Yahoo Panama and Microsoft adCenter; and Yahoo News and Microsoft’s news services, etc., etc. One little-considered consequence is that Microsoft may end up owning more search systems than any other company. That’s a technology can of worms worthy of a separate essay.

I will tell you who is excited, and, please, keep in mind that this is my opinion. And, once I express my view, I want to offer another very simple (probably too simple for an MBA wizard) math problem. I will end this essay with my now familiar observations. Let’s begin.

Who Benefits?

This is an easy question to answer, and you will probably think that I am stating the obvious. Bear with me because the answer explains why some at Microsoft may not be able to get the right prescription for their deal bifocals. Without the right eye glasses, it’s tough to discern some smaller environmental factors obscured in the billion dollar fusillade fired at Yahoo’s board of directors’ meeting.

  1. Shareholders who can make some money with the Microsoft offer. When there’s money to be made, concerns about technology, culture, and market opportunity are going to finish last. Most shareholders don’t think about much other than the answers to two questions: “How much did I make?” and “What are the tax implications?”
  2. Investment bankers who earn money three ways on a deal of this magnitude. There are, of course, other ways for those in the financial loop to make money, but I’m going to focus on the ones that keep these professionals in blue suits, not orange jump suits. [a] Commissions. Where there is churn, there is a commission. For many investment advisors, buying and selling equals a bigger payday. [b] Bonuses. The mechanics of an investment banker’s bonus are complex. After all, it is a banker dealing with a fellow banker. Mere mortals should steer clear. The idea is simple. Generate churn or a fee, and you get more bonus money. The first three months of a calendar year are bonus and job-hopping time on Wall Street. Anyone who can get a piece of the action for a big deal gets cash. [c] Involvement in a big deal acts like a huge electromagnet for more deals. Once Microsoft “thought” of the acquisition, significant positive input about the upside of the deal pours into the potential acquirer.
  3. Consultants. Once a big deal is announced, the consultants leap into action. The buyer needs analyses, advice, and strategic counsel. The buyer’s minions need tactical advice to answer such questions as “How can we maximize our tax benefits?” and “How can we pay for this with cheap money?” The buyer becomes hungry for advisors of every species. Blue-chip outfits like Bain, Booz Allen Hamilton, Boston Consulting Group, and McKinsey & Co. drool in eagerness to provide guidance on lofty strategy matters such as answering the questions, “How can I maximize my pay-out?” and “What are the tax consequences of my windfall profit?” Tactical advisors from these firms can provide support on human resource issues and real estate leases, among other matters. In short, buyers throw money at “the problem” in order to be prepared to negotiate or find a better deal.

These three constituencies want the deal to go through. If Microsoft is the buyer, that’s fine. If another outfit with cash shows up, that’s okay too. The deal now has a life of its own. Money talks. To get the money, these constituencies have no desire to help Microsoft “see” some of the gaps and canyons that must be traversed. Let’s turn to one practical matter and the aforementioned simple math. Testosterone and money — these are two ways to cloud perception and jazz logic.

More Simple Math

Let’s do a thought experiment, what some German philosophers call Gedankenexperiment. I am not talking about the proposed Microsoft – Yahoo deal, gentle attorneys.

Accordingly, we have two companies, Company Alpha and Company Beta; hereinafter, Company A(lpha) and Company B(eta), neither of which is a real company nor should be construed as having any similarity to any company now in existence.

Company Alpha has a dominant position in a market and wants to gain a larger share of a newer, tangential market. Company A has a proven, well-tuned, aging business model. That business model is a variation on selling subscriptions and generating annuity income from renewals. Company A’s business model works this way. Company A offers a product and then, on a periodic basis, Company A makes a change to an existing product, assessing a fee for customers to get the “new” or “enhanced” version of the product (service).

The idea is that once a subscription base is in place, Company A can predict a certain amount of revenue from standing orders and new orders. Company A has an excellent, stable cash flow based on this well-crafted business model and periodic fee increases. Although there are environmental factors that put pressure on the proven business model, the customer base is large, and the business model continues to work in Company A’s traditional markets. Company A, aware of exogenous factors — for instance, the emergence of cloud computing and other non-subscription business models — has learned through trial and error that its subscription-based business model does not work in certain new markets. These new markets are potentially lucrative, representing “new” revenue and a threat to Company A’s existing revenue stream. Company A wants to acquire a company to increase its chances for success in the new and emerging markets. Company A’s goal is to [a] protect its existing revenue, [b] generate new revenue, and [c] prevent other companies from dominating the new market(s).

Company A has performed a rational market analysis. Company A’s management has determined that one company only — our Company B — represents a mechanism for achieving Company A’s goals. Company A, by definition, has performed its analyses through Company A’s “eye glasses”; that is, Company A’s proven business model and business culture. “Walking in another person’s moccasins” is easy to say and difficult, if not impossible, to do. Everyone views the world through his own experiential frame. Hence, Company A “sees” Company B as having characteristics, attributes, and capabilities that are, despite some acceptable risks, significant benefits to Company A. Having made this decision about the upside from buying Company B, the management of Company A becomes less able to accept alternative inputs, facts, information, perceptions, and opinions. Company A’s reasoning in its decision space is closed. Company A vivifies what William James called “a certain blindness.” The idea is that each person is “blind” in some way to reality that others can perceive.

The implications of “a certain blindness” in this hypothetical acquisition warrant further discussion:

Culture

Company A has a culture built around a business model that allows incremental product enhancements so that subscription revenue is generated. Company B has a business model built around acquisitions. Company A has a more or less homogeneous atmosphere engendered by the business model or what Company A calls the agenda. Company B is more like a loose federation of separate companies — what some MBAs might call a Ling Temco Vought framework. Each entity within Company B retains its own identity, enjoys wide scope of action, and preserves its own culture. “We do our own thing” characterizes these units of Company B. Company A, therefore, has several options to consider:

  • Company A can leave Company B as it is. The plus is that not much will change Company B’s operations in the short term. The downside is that the technical problems will not be resolved.
  • Company A can impose its culture on Company B. You don’t need me to tell you that this will go over like the former Soviet Union’s intervention in Poland in the late 1950s.
  • Company A can try to make changes gradually. (This is a variation of the option in bullet 2 and will simply postpone rebellion.)

Technology

Company A has a different and relatively homogeneous technology base. Company B has a heterogeneous technology base. Maintaining multiple systems is in general more costly than maintaining a homogeneous one. Upon inspection, the technical staff needed to maintain these different systems have specialized to deal with particular technical problems in the heterogeneous environment. Technical people can learn new skills, but this takes time and adds cost. Company A has to find a way to streamline technical operations, reduce costs, and not waste time achieving rationalization. There are at least two ways to do this:

  • Shift to a single platform, ideally Company A’s
  • Retrain existing staff to have broader technical skills. With Company B’s staff able to perform more generalized work, Company A can reduce headcount at Company B, thus streamlining work processes and reducing cost.

Competitive Arena

The desirable new market for Company A has taken on the characteristics of what I call a “natural monopoly.” When I reflect on notable events in American business history, I note monopolistic behavior. Some monopolies were spawned by force of will; for example, JP Morgan and finance (this guy bailed out the US Treasury) and Andrew Carnegie and steel (this fellow thought of libraries for little people after pistol-whipping his competitors and antagonists).

Other monopolies — like Bell Telephone and your local electric company — came into being because some functions are more appropriately delivered by one organization. Water and Internet search / advertising, for instance, are subject to such economies of scale, quality of service, and standardization. In short, these may be “natural monopolies” due to numerous demand and cost forces.

In our hypothetical example, Company A wants to enter a market which is coalescing and now, based on my research, appears to be forming into a “natural monopoly”. The nameless market leader seems to be following a trajectory similar to that of the original Bell Telephone – AT&T life cycle.

Company A’s race, then, is against time and money. Untoward delay at any point going forward with regard to leveraging Company B means coming in second, maybe a distant second, or losing out on the new market entirely.

Instead of owning Park Place (a desirable property in the Parker Brothers’ game Monopoly), Company A ends up with Baltic and Mediterranean Avenues (really lousy properties in the Parker Brothers’ game). If Company A doesn’t get Company B, Company A is trapped in its old, deteriorating business model.

If Company A does acquire Company B, Company A has to challenge the competitor. Company B already has a five-year track record of being a day late and a dollar short. Company A, therefore, has to do everything in its power to make the Company B deal work, which appears to be an all-or-nothing proposition.

Now the math: Action by Company A = unknown, variable, escalating costs.

I told you math geeks would not like this analysis. Company A is betting the farm against long odds. Here’s why:

First, the cultures are not amenable to staff reductions or technological efficiencies; that is, using software and automation, not people, while increasing revenues. Company A, regardless of the money invested, cannot be certain of success. Company B’s culture – business model duality is investment insensitive. In short, money won’t close this gap. Company A’s resistance to cannibalizing its old, though still functioning, business model will be significant. Company A’s own employees will resist watching their money and jobs sacrificed to a greater good.

Second, the competitive space is now being captured by the increasingly monopolistic competitor. Unchallenged for some period of time, the monopolistic competitor enjoys momentum and a significant lead in refining its own business model.

In the lingo of Wall Street, Company A can’t get enough “oxygen”; that is, revenue, despite its best efforts to rein in the market leader.

Observations

If we assume a kernel of truth in my hypothetical analysis, we can now apply this hypothetical discussion to the Microsoft – Yahoo deal.

First, Microsoft’s business model (not its technology) is the company’s strength. The business model is also its Achilles’ heel. Just as IBM’s mainframe-centric view of the world made its executives blind to Microsoft, now Microsoft can’t perceive today’s world from outside the Microsoft business model. The Microsoft business model is perhaps the most efficient subscription-based revenue generator in history. But that business model has not worked in the new markets Microsoft covets, so the Yahoo deal becomes the “obvious” play to Microsoft’s management. Its obviousness makes it difficult for Microsoft to see other options.

Second, the Microsoft business model is woven into the company’s culture. Cultures are ethnocentric. Ethnocentricity often manifests itself in conflict. Microsoft will have to make prescient, correct cultural decisions quickly and repeatedly. Microsoft’s culture, however, does not typically evidence excellent, rapid-fire decision-making.

Microsoft seems to be putting the company in a situation guaranteed to spark conflict within its own walls, between itself and Yahoo, and between Microsoft and Google. This is a three-front war. Even those with little exposure to military history can see that the costs and risks of a three-front conflict will be high, open-ended, and difficult to estimate.

The hostile bid itself suggests that Microsoft could not catch Google without Yahoo. The notion that Microsoft can catch Google with the acquisition requires tremendous confidence in Microsoft’s management. I think Microsoft can make the deal work, but execution must be flawless, and favorable winds must push Microsoft along.

If Google continues to race forward, Microsoft has to spend more money to implement efficiencies more quickly. The calculus of catching a moving target can trigger a cost crisis. If costs go up too quickly, Microsoft must fall back on its proven business model. Taking a step backward when resolving the calculus of catching Google is not a net positive.

As you read this essay, you are wondering, “How can this doom and gloom be real?” The buzz about the deal is mostly positive. If you don’t believe me, call your broker and ask him how much your mutual fund will benefit from the MSFT – YHOO tie up.

I’ve spent some time around money types, and I can tell you making money is akin to blood in the water for sharks.

I’ve also been acquired and done the acquiring. Regardless of being the buyer or being the bought, tie ups are tricky. The larger the stakes, the trickier the tie ups become. When the tie up is designed to halt the Google juggernaut, the calculus of time – cost is hard.

Please recall that I’m not saying a Microsoft – Yahoo tie up cannot stop Google. I am saying that making the tie up work will be difficult.

Don’t agree? That’s okay. Use the comments to set me straight. I’m willing to listen and learn. Just don’t overlook my core points; namely, business models, cultures, and technologies. One final thought: don’t factor out the Google (NSDQ:GOOG).

Stephen Arnold, February 4, 2008
