The Microsoft Yahoo Fiasco: Impact on SharePoint and Web Search

May 5, 2008

You can’t look at a Web log without running into dozens of postings about Microsoft’s adventure with Yahoo. You can grind through the received wisdom on Techmeme River, a wonderful as-it-happened service. In this Web log posting, I want to recap some of my views about this remarkable digital charge at a windmill. On this cheery Monday in rural Kentucky, I can see a modern Don Quixote, who looks quite a bit like Steve Ballmer, thundering down another digital hollow.

What’s the impact on SharePoint search?

Zip. Nada. None. SharePoint search is not one thing. Read my essay about MOSS and MSS. They add up to a MESS. I’m still waiting for the well-dressed but enraged Fast Search PR wizard to shake a pointed lance at me for that opinion. Fast Search is sufficiently complex and SharePoint sufficiently Microsoftian in its design to make quick movement in the digital swamp all but impossible.

A T Ball player can swing at the ball until he or she gets a hit, ideally (for the parents) a home run. Microsoft, like the T Ball player in the illustration, will keep swinging for an online hit until the ball soars from the park, earning a home run and the adulation of the team.

Will Fast Search & Transfer get more attention?

Nope. Fast Search is what it is. I have commented on the long slog this acquisition represents elsewhere. An early January 2008 post provides a glimpse of the complexity that is ESP (that’s enterprise search platform, not extrasensory perception). A more recent discussion, posted on April 26, 2008, talks about the “perfect storm” of Norwegian management expertise, Microsoft’s famed product manager institution, and various technical currents. These posts caused Fast Search’s ever-infallible PR gurus to try to cook the Beyond Search goose. The goose, a nasty bird indeed, side-stepped the charging wunderkind and his hatchet.

Will Microsoft use the Fast Search Web indexing system for Live.com search?

Now that’s a good question. But it misses the point of the “perfect storm” analysis. To rip and replace the Live.com search requires some political horse trading within Microsoft and across the research and product units. Fast Search is arguably a better Web indexing system, but it was not invented at Microsoft, and I think that may present a modest hurdle for the Norwegian management wizards.

Read more

A Word’s Meaning Expanded: Microsoft’s Been Googled

May 4, 2008

It’s a Sunday morning in rural Kentucky. The animals have been fed. Mammon’s satisfied with the Kentucky Derby: victory and tragedy.

In the post-race excitement in Harrod’s Creek, I pondered the one-sided flood of postings on Techmeme.com and Megite.com. The theme was the collapse of Microsoft’s plan to thwart Google via a purchase of Yahoo. I’m no business wizard. The entire deal baffled me, but I found one aspect interesting.

Google is the most recognized brand in the world. The word “google” is both the name of a company and a synonym for research. As a verb, it’s a very handy way to tell someone how to find an answer; for example, a person tells another, “Just google that company”.

But the meaning of the word “google” has another dimension. Permit me to explain.

As the Microsoft-set deadline ticked toward zero hour, Yahooligans tried to find a way to thwart Microsoft’s intentions. Yahoo announced a “test” with Google for ad sales. Pundits picked up the idea, expanded it, and spiced it with legal shamanism. Yahoo’s executives hinted that working with Google would be interesting.

Google, on the other hand, maintained the Googley silence that makes competitors uncertain of Google’s intentions, Wall Street analysts crazy from hints and lava lamps, and insiders chuckle while chugging Odwalla smoothies.

However, behind the scenes Google and Yahoo decided to cooperate to an as-yet unknown degree in advertising sales.

In the 11th hour meeting in Redmond, Washington, Yahoo mentioned the “g” word. Microsoft’s appetite was spoiled. The meaning of the word “google” has been dilated.

Allow me to illustrate a unary version of this expansion: Yahoo “googled” Microsoft. The meaning derives from the verb “google”, which in this context means to derail Microsoft’s ambitions with an un-Machiavellian ploy: an advertising deal.

Thus, “Microsoft’s been googled” means that “Microsoft has been given the shaft” or “Microsoft has been thwarted” or “Microsoft has been hosed”.

Synonyms for “google” in this new meaning are screw, befoul, muck up, and toy with.

By extension, we can craft this statement: Google googled Microsoft. In this usage, Google (the company) managed in Googley ways to foul up the Yahoo acquisition. Colloquially, this becomes, “Dudes, Google got you again”.

Stephen Arnold, May 4, 2008

Poking around Google Scholar Service

May 3, 2008

In May 2005, I gave a short talk at Alan Brody’s iBreakfast program. An irrepressible New Yorker, Mr. Brody invites individuals to address a hand-picked audience of movers and shakers who work in Manhattan. I reported to the venue, zoomed through a look at Google’s then-novel index of scholarly information, and sat down.

Although I was asked to address the group again in 2006 and 2007, my talk was a flop. The movers and shakers were hungry for information related to search engine optimization. SEO, as the practice is called, specializes in tips and tricks to spoof Google into putting a Web site on the first page of Google results for a query, ideally in the top spot. Research and much experimentation have revealed that if a Web site isn’t on the first page of a Google results list, that Web site is a loser–at least in terms of generating traffic and, one hopes, sales.

I want to invest a few minutes taking another look at the information I discussed in 2005; that is, I want to explore Google Scholar. If you are looking for SEO information, stop reading now. And with most Americans losing interest in books and scholarly journals, you may be wasting your time with this essay anyway.

Google Scholar: The Unofficial View of This Google Service

Google wants to index the world’s information. Scholarly publications are a small, yet intellectually significant, portion of the world’s information. Scholarly journals are expensive and getting more costly with each passing day. Furthermore, some university libraries don’t have the budgets to keep pace with the significant books and journals that continue to flow from publishers, university presses, and specialized not-for-profit outfits like the American Chemical Society. Google decided that indexing scholarly literature was a good idea. Google explains the service in this way:

Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research.

Google offers libraries a way to make their resources available. You can read about this feature here. Publishers, with whom Google has maintained a wide range of relationships, can read about Google’s policies for this service here. My view of Google’s efforts to work with publishers is quite positive. Google is better at math than it is at donning a suit and tie and kowtowing to the mavens in Manhattan, however. Not surprisingly, Google and some publishers find one another difficult to understand. Google prefers an equation and a lemma; some publishers prefer a big vocabulary and a scotch.

What a Query Generates

Some teenager at one of the sophisticated research firms in Manhattan determined that Google users are more upscale than Yahoo users. I’m assuming that you have a college education and have undergone the pain of writing a research paper for an accredited university. A mail order BA, MS, or PhD does not count; if that describes your credential, stop reading this essay now.

The idea is that you select a topic from a short list of those provided by your teacher (often a graduate student or a professor with an expertise in Babylonian wheat yield or its equivalent). You trundle off to the dorm or library, and you run a query on the library’s research system. If your institution’s library has the funds, you may get access to Thomson Reuters’ databases branded as Dialog or the equivalent offerings from outfits such as LexisNexis (a unit of Reed Elsevier) or Ebsco Electronic Publishing (a unit of the privately held E.B. Stephens Company).

Google works with these organizations, but the details of the arrangements are closely guarded secrets. No one at the giant commercial content aggregators will say what its particular relationship with Google embraces. Google–per its standard Googley policy–doesn’t say much of anything, but its non-messages are delivered with great good cheer by its chipper employees.

So, let’s run a query. The ones that work quite well are those concerned with math, physics, and genetics. Babylonian wheat yields, I wish to note, are not a core interest area of the Googlers running this service.

Here’s my query today, May 3, 2008: kolmogorov theorem. If you don’t know what this canny math whiz figured out, don’t fret. For my purpose, I want to draw your attention to the results shown in the screen shot below:

[Screen shot: Kolmogorov Theorem results in Google Scholar]

Navigate to http://scholar.google.com and enter the bound phrase Kolmogorov Theorem.

As I write this, I am sitting with a person who worked for Gene Garfield, the inventor of citation analysis. He was quite impressed that Google generates a hot link to other scholarly articles in the Google system that have cited a particular paper. You can access these by clicking the link. The screen shot below shows the result screen displayed by clicking on “Representation Properties of Networks”, the first item in the result list above. You can locate the citation link by looking for a phrase after the snippet that begins “Cited by…”

My companion, Mr. Collier, recalled that Dr. Garfield, a former Bronx cab driver with two PhDs, believed that probability played a major role in determining the significance of journal articles. If a particular article were cited by reputable organizations and sources, there was a strong probability that the article was important. To sum up, citations that point to an article are votes. Dr. Garfield came up with the idea, and Messrs. Brin and Page were attentive to this insight. Mr. Page acknowledged Dr. Garfield’s idea in the PageRank patent document.
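
To make the “citations as votes” idea concrete, here is a small sketch in Python. It is my own illustration, not Dr. Garfield’s method and not Google’s production algorithm: it counts inbound citations as votes and then applies the generic PageRank-style refinement in which a vote from a heavily cited paper carries more weight. The paper identifiers and the citation graph are invented.

```python
# A toy citation graph: each paper maps to the papers it cites. All identifiers are invented.
citations = {
    "kolmogorov_1957": [],
    "network_props_1989": ["kolmogorov_1957"],
    "approx_theory_1993": ["kolmogorov_1957", "network_props_1989"],
    "survey_2001": ["kolmogorov_1957", "network_props_1989", "approx_theory_1993"],
}

# Citation analysis in its simplest form: every inbound citation is one vote.
votes = {paper: 0 for paper in citations}
for paper, cited in citations.items():
    for target in cited:
        votes[target] += 1

# PageRank-flavored refinement: a vote from a well-cited paper counts for more.
# This is a bare-bones power iteration, not Google's production algorithm.
damping = 0.85
scores = {paper: 1.0 / len(citations) for paper in citations}
for _ in range(20):  # a handful of passes is plenty for a graph this small
    new_scores = {}
    for paper in citations:
        inbound = sum(
            scores[src] / len(cited)
            for src, cited in citations.items()
            if paper in cited
        )
        new_scores[paper] = (1 - damping) / len(citations) + damping * inbound
    scores = new_scores

for paper in sorted(scores, key=scores.get, reverse=True):
    print(f"{paper}: {votes[paper]} votes, weighted score {scores[paper]:.3f}")
```

Run on this toy graph, the raw vote count and the weighted score agree on the most cited paper, which is the intuition shared by citation analysis and PageRank.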

Read more

Real-Time Analysis: Truviso

May 3, 2008

Truviso’s PR engine pumped some output into my RSS reader this morning. The company, according to its Web site:

analyzes massive volumes of dynamic information–providing continuous visibility and automated reaction to any event, opportunity, or trend in real-time.

I was unfamiliar with the company, founded in 2005 and based in Foster City, California, a stone’s throw from San Mateo, 101, and the Third Avenue donut shop.

The company is in the low-latency data analysis business. What this means is that fast-moving data, such as financial price feeds, must be processed as the data arrive rather than stored and queried later. Truviso asserts that its technology supports thousands of concurrent queries on a single server. The company’s approach is to run its system in a distributed fashion across thousands of applications, databases, and other systems in the network.
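
Truviso’s actual query language and API are not described in the material I saw, so the snippet below is only a generic sketch of the kind of continuous, windowed computation a low-latency engine performs: each price tick updates a running aggregate the moment it arrives, with no re-scan of stored history. The class, the window size, and the price feed are all invented for illustration.

```python
from collections import deque

class SlidingAverage:
    """Keep a rolling average over the most recent `window` price ticks."""

    def __init__(self, window: int):
        self.window = window
        self.ticks = deque()
        self.total = 0.0

    def update(self, price: float) -> float:
        # Incremental update: constant work per event, no re-scan of history.
        self.ticks.append(price)
        self.total += price
        if len(self.ticks) > self.window:
            self.total -= self.ticks.popleft()
        return self.total / len(self.ticks)

# A hypothetical feed of price events for one symbol.
feed = [101.2, 101.4, 101.1, 102.0, 101.8, 101.9]
rolling = SlidingAverage(window=3)
for price in feed:
    print(f"tick={price:.1f} rolling_avg={rolling.update(price):.2f}")
```

The point of the pattern is that the cost per event stays constant no matter how much history has streamed past, which is what makes thousands of concurrent standing queries on one server plausible.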

Truviso appears to be positioned as a business intelligence function. If you want to look at another approach to low-latency content processing, a visit to the Exegy Web site may be helpful. Exegy is profiled in my new study, Beyond Search, and has been identified as a content processing vendor to watch.

Stephen Arnold, May 4, 2008

Mobile Search: What Users Now Do

May 3, 2008

I reported on the update to Sergey Brin’s voice search patent earlier today. ClickZ (May 2, 2008) provided a bit of color on user search behavior on mobile devices. You can read the item, for a short time at least, on the ClickZ site. ClickZ’s item is derived from Nielsen Mobile and Nielsen Online data in a report called “Total Web”. The summary of the data is here.

The first point I noted is that high-traffic Web sites benefit from mobile users. ClickZ’s “number” is a 13 percent increase in traffic. The absolute value is less important than the uptick. More mobile users translates into more traffic. That’s a good thing.

The second point is that mobile users have specific content preferences when they access the Web from a handset. I found these data somewhat surprising, but upon reflection, I think the ClickZ analysis makes sense. The five services used most frequently by mobile users are:

  1. Weather
  2. Entertainment
  3. Games
  4. Music
  5. Email.

The first three–weather, entertainment, and games–account for usage bumps of more than 20 percent. Music and email pump up usage by 15 percent and 11 percent respectively. Shopping on a mobile device is almost a non-starter.

Search returns a mere two percent increase in traffic. The questions these data raise, if we assume them to be close enough for horseshoes, are [a] What’s the impact of voice search on mobile search? and [b] If voice search doesn’t goose usage from its miserable position, what happens to business models predicated on strong mobile advertising? It’s possible that voice will not improve search. After all, who wants to browse results on a tiny display? Voice may open new usage opportunities. Then the challenge becomes the one that has long plagued online service providers–generating money from users who don’t want to pay for information unless it’s of the “must have” variety.

Stephen Arnold, May 3, 2008

FAQ: The Google Legacy and Google Version 2.0

May 2, 2008

Editor’s Note: In the last few months, we have received a number of inquiries about Infonortics’ two Google studies, both written by Stephen E. Arnold, a well-known consultant working in online search, commercial databases, and related disciplines. More information about his background is on his Web site and on his Web log. This FAQ contains answers to questions we receive about The Google Legacy, published in mid-2005, and Google Version 2.0, published in the autumn of 2007.

Do I need both The Google Legacy and Google Version 2.0?

The Google Legacy provides a still-valid description of Google’s infrastructure, explanations of its file system (GFS), its Bigtable data management system (now partly accessible via Google App Engine), and other core technical features of what Mr Arnold calls “the Googleplex”; that is, Google’s server, operating system, and software environment.

Google Version 2.0 focuses on more than 18 important Google patent applications and Google patents. Mr Arnold’s approach in Google Version 2.0 is to explain specific features and functions that the Googleplex described in The Google Legacy supports. There is perhaps 5 to 10 percent overlap across the two volumes and their more than 400 pages of text. More significantly, Google Version 2.0 extracts from Google’s investment in intellectual property, as manifested in its patent documents, more operational detail about specific Google enabling sub systems. For example, in The Google Legacy, you learn about Bigtable. In Google Version 2.0 you learn how the programmable search engine uses Bigtable to house and manipulate context metadata about users, information, and machine processes.
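
To make that last point less abstract, here is a toy model in Python of the row / column-family / timestamped-cell layout that Bigtable popularized. It is my own illustration of how context metadata about a user might be keyed and versioned; it is not Google’s schema, and all the row keys, column families, and values are invented.

```python
import time
from collections import defaultdict

# A Bigtable-style map: row key -> column family -> column -> list of (timestamp, value).
# Real Bigtable keeps multiple timestamped versions per cell; nested dicts mimic that here.
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def put(row: str, family: str, column: str, value: str) -> None:
    table[row][family][column].append((time.time(), value))

def latest(row: str, family: str, column: str):
    versions = table[row][family][column]
    return max(versions)[1] if versions else None

# Invented context metadata for an invented user.
put("user#1138", "context", "last_query", "kolmogorov theorem")
put("user#1138", "context", "locale", "en-US")
put("user#1138", "profile", "interest", "numerical analysis")

print(latest("user#1138", "context", "last_query"))  # -> kolmogorov theorem
```

The nested structure is the design choice that matters: one row key can accumulate arbitrary, timestamped context columns without a schema change, which suits the kind of user and process metadata the study describes.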

You can read one book and gain useful insights into Google and its functioning as an application engine. If you read both, you will have a more fine-grained understanding of what Google’s infrastructure makes possible.

What is the focus of Google Version 2.0?

After Google’s initial public offering, the company’s flow of patent applications increased. Since Google became a publicly traded company, the flow of patent documents has risen each year. Mr Arnold had been collecting open source documents about Google. After completing The Google Legacy, he began analysing these open source documents using different software tools. The results of these analytic passes generated data about what Google was “inventing”. When he looked at Google’s flow of beta products and the firm’s research and development investments, he was able to correlate the flow of patent documents and their subjects with Google betas, acquisitions, and investments. The results of those analyses are the foundation upon which Google Version 2.0 rests. He broke new ground in Google Version 2.0 in two ways: [a] he used text mining to surface information about Google’s technical activities, and [b] he identified “keystone” inventions that make it possible for Google to expand its advertising revenue and enter new markets.
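
Mr Arnold does not publish his toolchain, but the general approach described here, extracting terms from patent documents and lining the term trends up against the timing of betas and acquisitions, can be sketched in a few lines of Python. Everything in the sketch, from the sample patent titles to the keyword list, is hypothetical.

```python
import re
from collections import Counter

# Invented patent titles grouped by filing year; stand-ins for the real open source documents.
patents = {
    2005: ["programmable search engine with context metadata",
           "advertising selection based on user context"],
    2006: ["voice interface for a search engine",
           "distributed storage of context profiles"],
}

watch_terms = {"context", "voice", "advertising"}

# Count watched terms per year; a rising count flags an investment theme whose timing
# can then be compared with the appearance of betas and acquisitions.
for year in sorted(patents):
    counts = Counter(
        word
        for title in patents[year]
        for word in re.findall(r"[a-z]+", title.lower())
        if word in watch_terms
    )
    print(year, dict(counts))
```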

Read more

Brin Keeps on Inventing: Voice Interface for a Search Engine

May 2, 2008

On April 11, 2008, the USPTO issued US 7,366,668 B1, “Voice Interface for a Search Engine”. The patent is a continuation of US 7,027,987, filed in 2001 and granted in 2006. Among the inventors is Sergey Brin, one of Google’s founders.

There are some differences between the two patent documents. First, Figure 5 has been modified to explicitly identify inputs and interactions for:

  • A language model
  • A phonetic dictionary
  • An acoustic model
  • Query constraint parameters
  • Query term weights.

The number of claims has been increased, and there are wording changes that, based on my reading of the documents, range from wordsmithing (adding “Boolean” methods to claim seven) to major surgery (expanding claim 12 to detail the voice system). You can download copies of the “old” and “new” versions of this invention from the USPTO. Tip: use this system in off-peak hours. Like many US government Web sites, the infrastructure is often unable to cope with traffic.
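
The components named in Figure 5 line up with the textbook speech-to-query pipeline: an acoustic model proposes phoneme sequences, a phonetic dictionary maps them to candidate words, a language model re-scores the candidates, and the surviving terms become a weighted query. The Python sketch below strings those pieces together conceptually. It is a generic illustration under my own simplifying assumptions, not the system claimed in the patent; every model here is a stub and every value is invented.

```python
# Stub components standing in for the modules named in Figure 5. All data is invented.
PHONETIC_DICTIONARY = {"K OW L M AH G AO R AA F": "kolmogorov",
                       "TH IH R AH M": "theorem"}
LANGUAGE_MODEL = {"kolmogorov": 0.9, "theorem": 0.8, "kilimanjaro": 0.1}

def acoustic_model(audio: bytes):
    """Pretend to decode audio into phoneme strings with confidence scores."""
    return [("K OW L M AH G AO R AA F", 0.7), ("TH IH R AH M", 0.6)]

def build_query(audio: bytes, query_term_weight: float = 1.0):
    terms = []
    for phonemes, acoustic_score in acoustic_model(audio):
        word = PHONETIC_DICTIONARY.get(phonemes)
        if word is None:
            continue  # no dictionary entry, drop the hypothesis
        # Combine acoustic and language-model evidence into a per-term weight.
        weight = query_term_weight * acoustic_score * LANGUAGE_MODEL.get(word, 0.01)
        terms.append((word, round(weight, 3)))
    return terms

print(build_query(audio=b"..."))  # -> [('kolmogorov', 0.63), ('theorem', 0.48)]
```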

Three observations are warranted:

First, I recall AT&T’s and Verizon’s telling me that their analysts had a good sense of what Google was doing in the telecommunications market. I also recall that Google’s thrust and feint strategy generated some angst during the recent spectrum auction. As I recall, Verizon has indicated that it would “open” its system to some degree. Since this voice search invention dates from 2001, it’s reasonable to assume that management at AT&T, Verizon, and other telecommunications companies have been monitoring Google’s telco inventions over the last seven years.

Second, search on mobile devices is different from search on notebook computers or other devices with larger keyboards. The obvious way to make search work on a mobile device is to allow a person to talk to the device. Google’s been grinding away at this problem for a number of years, and it is possible that Google will move more aggressively on this front. An “update” is nothing new in software; an update in a core technical invention like voice search is somewhat more significant.

Finally, the changes to the system and method for voice search seem to make more explicit Google’s willingness to expose some of its computational intelligence techniques. Since 2005, the company’s inventions have included algorithms that work up and down to arrive at a “good enough” value. Other inventions have described an inside-outside method such as the one disclosed in US 7,366,668 B1. If you are a fan of smart software, this voice-search invention is worth some of your time.
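
For readers unfamiliar with the idea, here is a generic example of a “good enough” computation; it is my own illustration of the pattern, not the method in US 7,366,668 B1. An iterative estimate moves up and down toward a target and stops as soon as the result is within a stated tolerance rather than exact.

```python
def good_enough_sqrt(x: float, tolerance: float = 1e-3) -> float:
    """Newton's method, stopped at 'good enough' rather than at machine precision."""
    estimate = x / 2.0 or 1.0
    while abs(estimate * estimate - x) > tolerance:
        # Each pass nudges the estimate up or down toward the target value.
        estimate = 0.5 * (estimate + x / estimate)
    return estimate

print(good_enough_sqrt(2.0))  # roughly 1.414; accurate only to the stated tolerance
```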

My research for Google Version 2.0 documented that when Google fiddles with an invention, the company is not performing random, pointless legal work. Google, despite the purple bean bags and lava lamps, is far from casual in its approach to engineering. I almost deleted this post because US telecommunication companies are better trackers than James Fenimore Cooper’s matchless Chingachgook when it comes to things that are Googley.

Stephen Arnold, May 3, 2008

IBM and Google: Replay of IBM and Microsoft?

May 2, 2008

Dan Farber, an outstanding journalist, reported “The IBM-Google Connection”. You need to read this story yourself before it becomes harder to find on the sprawling CNet / News.com Web site. The story describes a burgeoning relationship between the $16 billion Internet advertising company and the $100 billion Armonk, New York, company.

At an IBM Business Partner Leadership Conference in Los Angeles, California, on May 1, 2008, Sam Palmisano (IBM CEO) “chatted” with Eric Schmidt (Google CEO). According to Mr. Farber, the two chief executives touched upon these topics:

  • A joint research project in cloud computing; that’s delivering services and information somewhere on the network, not from software installed on a PC under your desk or from a server room down the hall
  • IBM’s software division and business partners are integrating Google applications and widgets into custom software solutions based on IBM’s “development framework”.

Mr. Farber, with customary acumen, snagged some sound bites; for example: “IBM is one of the key planks of our strategy–otherwise we couldn’t reach enterprise customers,” said Mr. Schmidt.

And, “It is the first time we have taken something from the consumer arena and applied it to the enterprise,” said Mr. Palmisano.

Mr. Farber identifies other areas in which IBM and Google are on the same page, are in sync, and sing from the same hymnal.

A source in Washington, DC, provided me with a copy of a letter sent by IBM to a quasi-government entity in February 2008, in which IBM opines that it understands better than most what Google’s strategy is. The letter responds to a series of communications to IBM about Google’s ability to morph from friend to competitor with agility. The author of the letter is a senior executive at IBM, and the recipient is a former US government official, now working as a contractor to a branch of the US Federal government. I quote:

As a technology and services provider, IBM has relationships with Google that provide us with additional insights into their business. We have studied their technology and actions and, as a result, do not agree with the severity of the threat assessment you [the Washington, DC-based author of earlier communications to IBM] document in your letters. We do not plan to take any additional action at this time.

IBM’s track record with up-and-coming companies has an interesting entry: the deal with Microsoft for an operating system for the personal computer. If Google is the operating system for the cloud, has IBM found a way to remediate its interesting decision in the early 1980s with this Google relationship?

IBM thinks so. Google talks but provides little more than Googley generalizations. Microsoft–the subject of the discourse–isn’t part of the conversation. My hunch is that IBM is supremely confident in its ability to deal with Google. Executive hubris is a constant in some large companies.

I can imagine IBM’s top brass saying, “We have these Google guys right where we want them.”

Stephen Arnold, May 2, 2008

Update: CIO has more on IBM’s plans here. A happy quack to Mr. Arenstein for this link.

Searchenstein: Pensée d’escalier

May 1, 2008

At the Boston Search Engine Meeting, I spoke with a certified search wizard-ette. As you know, my legal eagle discourages me from proper noun extraction in my Web log essays. This means I can’t name the person, nor can I provide you with the name of her employer. You will have to conjure a faceless wizard-ette from your imagination. But she’s real, very real.

Set up: the wizard-ette wanted to ask me about Lucene as an enterprise search system. But that was a nerd gambit. The real question was, “Will I be able to graft an add-on semantic processing or text mining system on top of Lucene and make the hybrid work?”

The answer is, “Yes but”. Most search and content processing systems are monsters. Some are tame; others are fierce. Only a handful of enterprise search systems have been engineered to be homogeneous.

I knew this wizard-ette wasn’t enthralled with a “yes but”. She wanted a definitive, simple answer. I stumbled and fumbled. Off she drifted. This short essay, then, contains my belated pensée d’escalier.

What Is a Searchenstein?

A searchenstein is a content processing or information access system that contains a great many separate pieces. These different systems, functions, and sub systems are held together with scripts; that is, digital glue or what the code jockeys call middleware. The word middleware sounds more patrician than scripts. (In my experience, a big part of the search and retrieval business reduces to wordsmithing.)

A searchenstein, then, is a search and content processing system cobbled together from different parts, and there are several degrees of searchensteinism. In one variant, there’s a core system built to a strict engineering plan and then swaddled in bastard code. Instead of working to the original engineering plan, the MBAs running the company take the easier, cheaper, and faster path. Systems from the Big Three of enterprise search are made up of different parts, often from sources that have little knowledge of or interest in the system onto which the extras will be bolted. Other vendors have an engineering plan, and the third-party components are more tastefully integrated. This is the difference between a car customization by a cash-strapped teen and the work of Los Angeles aftermarket specialists who build specialized automobiles for the super rich.
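
Here is a toy version, in Python, of the glue-code pattern just described, under my own assumptions: three notionally independent components (a crawler, a bolted-on third-party semantic tagger, and an indexer) are chained by a thin middleware script, and a failure anywhere surfaces as the whole “system” misbehaving. All component and function names are invented for illustration.

```python
# Three stand-in components that, in a real searchenstein, would be separate products.
def crawler(urls):
    return [{"url": url, "text": f"document fetched from {url}"} for url in urls]

def third_party_tagger(doc):
    # The bolted-on semantic module: different vendor, different failure modes.
    doc["entities"] = [word for word in doc["text"].split() if word.istitle()]
    return doc

def indexer(docs):
    return {doc["url"]: doc for doc in docs}

def middleware_pipeline(urls):
    """The 'digital glue': each stage only loosely knows about the others."""
    tagged = []
    for doc in crawler(urls):
        try:
            tagged.append(third_party_tagger(doc))
        except Exception as err:
            # In practice this is where the finger pointing starts: is the fault
            # in the tagger, in the glue, or in the upstream crawler's output?
            print(f"tagger failed on {doc['url']}: {err}")
    return indexer(tagged)

print(middleware_pipeline(["http://example.com/a", "http://example.com/b"]))
```

Nothing in the glue itself is hard; the trouble described above starts when each stage comes from a different vendor and no one owns the pipeline end to end.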

[Illustration: a searchenstein]

This illustration shows the body parts of a searchenstein. In this type of system, it’s easy to get lost in the finger pointing when a problem occurs. Not only are the dependencies tough to figure out, but it’s also almost impossible to get one’s hands on the single throat to choke.

Another variant is to use many different components from the moment the company gets into the search and content processing business. The complexities of the system are carefully hidden, often in a “black box” or beneath a zippy interface. You can’t fiddle with the innards of the “black box.” The reason, according to the vendor, may be to protect intellectual property. Another reason is that the “black box” is easily destabilized by tinkering.

Read more

BI: The Cat’s Pajamas

May 1, 2008

ITBusiness.ca has a useful discussion of the market for business intelligence. “BI”, as the cognoscenti prefer it, is an umbrella term with fuzzy edges. Business intelligence is the corporate version of military or “real” intelligence; that is, obtaining and analyzing information in order to gain an advantage. There’s a Society of Competitive Intelligence Professionals that promulgates guidelines for the conduct of “intelligence” by organizations. Cognos, now part of IBM, stressed its software’s usefulness for BI. The same leitmotif has been riffed on by some search and content processing vendors. But the music of money clattering into the sales tills has been increasing. Now, many vendors, from content management to enterprise customer support systems, are chanting, “BI, BI, BI”.

I urge you to read the ITBusiness.ca article. Vawn Himmelsbach, the author, provides a useful case study about BI’s payoff for a tire company. There’s a good discussion of BI dissolving into other, presumably more easily understood, enterprise applications. Even the nemesis of data aggregation gets mentioned, a rarity in most cheerleading about “intelligence”, whether SCIP-like or the somewhat more free-form intelligence practiced by government entities. If you are looking for some ideas to help you sell “intelligence”, the article includes some useful pointers. My hunch is that consultants will find this statement particularly bracing:

For the channel, the opportunity lies in professional services, says Rowe. If you look at the professional services to license ratio, it’s usually a one-to-one ratio up to a three-to-one ratio, meaning if they sell a $50,000 BI license, they can get another $50,000 to $150,000 in professional services.

For more information about business intelligence, check out the white papers on the SAS Institute’s Web site. You can ignore the one I wrote, but the others are quite good.

Stephen Arnold, May 1, 2008
