Microsoft’s Web Search Strategy Revealed: The Scoble Goldberg Interview

June 16, 2008

Online video does not match my mode of learning. Robert Scoble, a laurel leaf wearer in the new world of video and text Web logs, conducted an interview with Brad Goldberg.

The interview is part of the Fast Company videos, and it is available here. The interview is remarkable, and I urge you to spend 31 minutes and listen to Brad Goldberg, General Manager of Microsoft Search Business Group.

The interview reveals useful information about the timeline for Microsoft to capture market share from Google and Microsoft’s ideas for differentiating itself from Google in Web search.

Surprisingly, there were no references that I could pick up to enterprise search, nor was there any indication that Mr. Goldberg was aware of the Fast Search & Transfer Web search technology, which was quite good. As you may know, Fast Search withdrew from Web search in 2003, selling its AllTheWeb.com Web index to Overture. Yahoo then gobbled Overture and still uses bits and pieces of the Fast Search technology. The “auto suggest” feature is still available from Yahoo’s AllTheWeb.com site. My tests suggest that today’s AllTheWeb.com uses the Yahoo Search index built by the Slurp crawler and the Fast Search technology for some of the bells and whistles on the site. The news search function is actually quite useful. If you are not familiar with it, you can try it here.

During the interview, Mr. Goldberg uses some sample queries to illustrate his claims about Live.com’s search performance, precision, and recall. I ran the “Paris” query on each of these systems, and I ran comparative queries on this Web log as well. After the interview, I took a look at the 2005 analysis of mainstream Web search systems here so I could gauge how much change has taken place in the last three years. Quick impression: not much. You may want to perform similar as-you-listen tests. It is easy to see which search system responds most quickly, how the search results differ, and which features each system makes available.
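For readers who prefer to script such tests rather than eyeball them, here is a minimal Python sketch of the timing side of the exercise. The result-page URL patterns below are assumptions for illustration only and may not match what the engines actually accept; swap in whatever works in your own browser.

```python
import time
import urllib.parse
import urllib.request

# Illustrative, hand-entered result-page URL patterns. These query-string
# formats are assumptions and may not reflect the engines' real parameters.
ENGINES = {
    "google": "http://www.google.com/search?q={q}",
    "live": "http://search.live.com/results.aspx?q={q}",
    "alltheweb": "http://www.alltheweb.com/search?q={q}",
}

def time_query(name, url_template, query):
    """Fetch one results page and report elapsed wall-clock time and size."""
    url = url_template.format(q=urllib.parse.quote_plus(query))
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    start = time.time()
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    elapsed = time.time() - start
    print(f"{name:12s} {elapsed:6.2f}s  {len(body):8d} bytes")

if __name__ == "__main__":
    for engine, template in ENGINES.items():
        time_query(engine, template, "real estate baltimore maryland")
```

The sketch only measures how quickly a results page comes back; judging how the results differ still requires looking at them.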

Three points in Mr. Goldberg’s remarks stuck in my mind. I want to mention each of these and then offer a few observations. Judging from the edgy comments to some of my essays, I want you to know that you may not agree with me. That’s okay with me. Please, use the comments section to set me straight. Providing some facts to go along with your pushback is helpful to me.

Key Points for Me

1. Parity, or Microsoft’s Relevancy Is as Good as Google’s

Mr. Goldberg asserted that the major search services were at parity in terms of relevance and coverage. I found this notion somewhat difficult to comprehend. The data about Web search market share undermine any argument about parity, which, as I understand the word, means “equality” or “equivalence.” I have had difficulty interpreting comments by whiz kids before, so I may be off base. My reading is that Google continues to gain market share at the expense of both Microsoft and Yahoo. The disparity is significant because Google, according to the data mavens, accounts for 60 percent or more of user queries in the US. In Europe, Google’s share is higher still. US search systems do not hold commanding leads in China, Korea, and other Eastern markets.

If parity means visual appearance, then yes, Microsoft is looking more like Google. Here is the result of one of my test queries: “real estate baltimore maryland”.

[Screenshots: Google and Live Search results for the “real estate baltimore maryland” query]

On the surface, these look alike. Closer inspection reveals that Google includes a canned form so I can narrow my results by location and property type. Google eliminates a step in looking for real estate in Baltimore. Microsoft’s result does not offer this feature, preferring to show “related searches”. I like the Google approach. I don’t make much use of machine-generated related queries; I have specialized tools to discern relationships in result sets.

If parity refers to pages indexed, I am also baffled. My estimates, based on sample queries, data provided by the companies, and the hit counts reported for my sample queries, suggest that Google has upwards of 32 billion pages in its index. Microsoft’s index is larger than the eight billion pages I estimated in early 2007, but it is smaller than the 20 billion figure Mr. Goldberg asserted. You can see the disparity yourself if you navigate to USA.gov, which uses the Microsoft index plus some Vivisimo specialized indexing, and run the query “ECCS”, a specialized term in nuclear power generation. Then run the same query on Google’s government index here. I preferred Google’s results. Neither Microsoft’s nor Google’s results were comprehensive because neither service spiders the entire array of publicly available Web sites at the Department of Energy. But you can easily run your own queries, and if you find an area where Microsoft’s Live.com is deeper than Google, please let me know. Send along your query terms too. Note: Yahoo’s and AllTheWeb.com’s indexes fall in the middle ground between Google’s and Microsoft’s.
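For what it is worth, the kind of back-of-envelope index-size estimate I describe above reduces to a few lines of arithmetic: compare the hit counts two engines report for the same benchmark queries, then scale a baseline index-size figure by the typical ratio. The numbers in this Python sketch are made-up placeholders, not measurements, and the method is only as good as the hit counts the engines choose to report.

```python
from statistics import median

# Assumed size of the reference engine's index (placeholder figure).
baseline_index_size = 20_000_000_000

# Reported hit counts: {query: (reference_engine_hits, other_engine_hits)}.
# All counts below are illustrative placeholders, not real observations.
reported_hits = {
    "ECCS": (120_000, 45_000),
    "real estate baltimore maryland": (2_400_000, 900_000),
    "nuclear power generation": (6_100_000, 2_300_000),
}

ratios = [other / reference for reference, other in reported_hits.values()]
estimate = baseline_index_size * median(ratios)
print(f"median hit-count ratio: {median(ratios):.2f}")
print(f"rough index-size estimate: {estimate / 1e9:.1f} billion pages")
```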

To sum up, I can’t buy the parity assertion either on market share or size of the index.

2. Laundry Lists and Innovation

Second, Mr. Goldberg suggested that Google has not changed its results display in a decade. In fact, if I understood him correctly, he suggested that Google’s sticking with the laundry list approach to search results was stifling innovation in search. Here are the results I saw today (June 16, 2008) when I ran the “Paris” query referenced in the interview.

[Screenshots: Google and Live Search results for the “Paris” query]

The visual similarity is evident. The laundry list is alive and well on Google’s and Live.com’s result pages. The key differentiator is the inclusion of related searches in the Live.com results list. Google’s map, however, seems more useful to me because related searches appear above the map, and the inclusion of an address box is a nice feature for me. I get lost in Paris, maybe intentionally, with amazing frequency. Paris Hilton has fallen from favor at Google. She no longer appears as the top result.

My research into Google has revealed a wide range of interfaces. These range from the “ig” portal approach that you can experience here to the dossier format that I included in my keynote at the Buying and Selling eContent conference in April. Google also has distinct interfaces for health and air schedule queries. You can see these yourself by navigating to Google and entering these queries: [a] lga sfo schedule and [b] back pain. Notice that two different result lists appear. Google has a fancy name for this embedded search function. I am pleased to have it save me a few clicks.

Yahoo and AllTheWeb.com continue to use results displays that are quite similar to Google’s. In fact, when I ran the “Paris” query Mr. Goldberg gave as an example, Microsoft’s Live search results were remarkably similar to Google’s. If you run this query on alternative systems such as Exalead.com, you see a similar approach to results. Genuinely different result displays can be found on Cluuz.com or Silobreaker.com. The mainstream search engines copy Google.

It follows, then, that Google does experiment with different results displays. Google reviews the user reaction to these alternatives and then goes back to its laundry list approach. Why? Obviously the laundry list works. If it did not, Google’s market share would be decreasing relative to competitors with snazzier interfaces. Google’s market share is growing, so suggesting that the laundry list does not work very well, or that it stifles innovation, strikes me as wacky.

Also, Microsoft emphasizes user experience, which is Apple territory. User experience sounds great, but it can be less filling than a solid blend of functionality, plumbing, and useful features. To me, the user experience is not a flashy interface. It is driven by a search system’s ability to:

  • Index content that is new or changed quickly
  • Display relevant results on the first page of the hit list so I don’t have to press the page down key to see more results (a rough way to check this appears in the sketch after this list)
  • Introduce new features without introducing latency into the system; for example, display maps for city queries, hot links to airline flights, and clustered results for medical searches or a query such as “real estate Baltimore Maryland”
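Here is the rough first-page check promised in the second bullet: a tiny Python sketch that computes precision at ten and top-ten overlap from hand-entered result lists and hand-made relevance judgments. Every list in it is a hypothetical placeholder; the point is the arithmetic, not the data.

```python
# Compare two engines' first pages against a manual list of relevant URLs.
# All three lists are placeholders you would fill in from real queries.

def precision_at_10(results, judged_relevant):
    """Fraction of the top-ten results that a human judged relevant."""
    top = results[:10]
    return sum(1 for url in top if url in judged_relevant) / len(top)

engine_a = [f"http://example.com/a{i}" for i in range(10)]   # placeholder results
engine_b = [f"http://example.com/b{i}" for i in range(10)]   # placeholder results
relevant = set(engine_a[:6]) | set(engine_b[:3])             # placeholder judgments

print("precision@10, engine A:", precision_at_10(engine_a, relevant))
print("precision@10, engine B:", precision_at_10(engine_b, relevant))
print("top-10 overlap:", len(set(engine_a[:10]) & set(engine_b[:10])))
```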

3. Plumbing

This brings us to plumbing, or, in the data center world, “iron”: how to build data centers without going broke, suffering outages (poor Amazon!), or spending more money than Google to deliver comparable performance.

Mr. Goldberg asserts that Microsoft knows how to scale, is in search for the long haul, and has the resources to reverse the dominance of Google. Maybe. I know that Google has its plumbing in place, continues to upgrade it, and keeps a close watch on costs, notably the ones that helped sink AltaVista.com, now owned and operated by Yahoo.

Microsoft has yet to demonstrate that it can indeed scale. One source at Microsoft told me that his access to computing resources was strictly limited because certain resources were scarce. I see latency when I use Microsoft’s Live.com image search and scroll down the “infinite page”. The latency is severe enough to make me abandon the query and rerun it on another vendor’s image search system.

In my poking around the inscrutable Google, I have not heard from my Google sources that developers face resource restrictions. I have heard that Microsoft relies on Akamai for caching, and for some queries, Live.com is faster at displaying results than Google. Microsoft also acquired Savvis in order to get “edge expertise” to find ways to accelerate response time to queries. Caching is standard in the search game, but I saw a diagram that showed the Microsoft online architecture as a four-layer setup. Caching seemed to be used to reduce demand on the Microsoft server and database systems, which are not sufficiently scalable to deliver the response time needed to compete with Google. Has Microsoft cloned the functionality of the Google File System, the Bigtable data management system, and the libraries that take some of the burden off programmers who create massively parallel applications on the Google infrastructure? I have yet to be convinced that Microsoft’s data centers can handle the scale at which Google operates. If you have data to clarify my thinking, please use the comments section of this Web log to share the information.
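To make the caching point concrete, here is a minimal Python sketch of the general pattern: a small time-to-live cache sits in front of a stubbed, slower backend so repeated popular queries never reach the expensive tier. It is an illustration of the idea, not a description of Microsoft’s or Google’s actual layers.

```python
import time

class QueryCache:
    """A tiny in-memory cache with a time-to-live per entry."""
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # query -> (timestamp, results)

    def get(self, query):
        entry = self.store.get(query)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]            # cache hit: skip the backend entirely
        return None

    def put(self, query, results):
        self.store[query] = (time.time(), results)

def backend_search(query):
    """Stand-in for the expensive index/database tier."""
    time.sleep(0.5)                    # simulate backend latency
    return [f"result for {query} #{i}" for i in range(3)]

cache = QueryCache()

def search(query):
    cached = cache.get(query)
    if cached is not None:
        return cached
    results = backend_search(query)
    cache.put(query, results)
    return results

print(search("paris"))   # slow: goes to the backend
print(search("paris"))   # fast: served from the cache
```

The economics follow directly: every hit served from the cache is backend capacity that does not have to be built, which is why caching shows up in every large search architecture.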

Observations

I have been upfront about my uncertainty with regard to the speed, economy, and flexibility of the Microsoft Web search infrastructure. Google’s market share makes it tough for me to accept the parity assertion. I see disparity. And I have some problems with the idea that Google is killing innovation. For the last five years, Microsoft has been duplicating Google, not leapfrogging Google. The timing of Microsoft’s search innovations is simple to work out: Google does something, and Microsoft responds. So far that strategy has only allowed Google to increase its share of the Web search market. Microsoft has to find a way to get ahead of Google, not imitate Google with an infrastructure that is less efficient and possibly more expensive to operate.

Several other observations are warranted:

  1. I cannot understand why the Live.com search team does not pay attention to the Fast Search & Transfer Web indexing technology. The Overture deal did not transfer the Fast Search intellectual property for search. Fast Search’s Web indexing and search work. Compare results on AllTheWeb.com and Live.com search. The Fast Search relevance seems as good as, if not better than, that of Live.com. I like the Livesearch feature, which reduces typographic errors for many users.
  2. Google operates from a reasonably coherent code base. The company has code inconsistencies and its share of technical glitches. My research suggests that Google’s different “flavors” of search come from the same can of digital tuna. Google’s technology chefs can cook up different digital services, but the same tuna is in these different entrées. Microsoft, based on my understanding of the company’s technology, lacks the “same can of tuna” approach in search. There is search in SQL Server. There are two types of SharePoint search, MOSS and ESS. There is Live.com search. There is search in Dynamics CRM, and so on. The complexity and cost of so many different search technologies are out of step with Google’s approach, which I admit is not perfect, but Google is the company with the market share and the tailwind. (A toy sketch after this list illustrates the single-code-base idea.)
  3. The notion of habitual behavior in online search is not well understood. There has been some chatter about the ease with which a person can jump to a different search engine with a click. My view is that once an online behavior is set, many users resist change. Microsoft’s grip on browser software has not been sufficiently powerful to prevent Google’s dominance of the market. Now Mr. Goldberg wants me to believe that he can turn around a decade of losing teams. Resorting to the ploy of paying users to run queries on the Live.com system tells me that Microsoft is running out of options in search.
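To illustrate the “same can of tuna” point in item two, here is a toy Python sketch: several search flavors implemented as thin wrappers over one shared core rather than as separate engines per product. It is purely hypothetical and does not mirror any actual Google or Microsoft code.

```python
class SearchCore:
    """One index and one matching function, reused by every front end."""
    def __init__(self, documents):
        self.documents = documents  # list of (text, vertical) pairs

    def query(self, terms, vertical=None):
        return [text for text, v in self.documents
                if all(t.lower() in text.lower() for t in terms)
                and (vertical is None or v == vertical)]

class WebSearch:
    """Thin wrapper: general Web flavor."""
    def __init__(self, core): self.core = core
    def run(self, terms): return self.core.query(terms)

class HealthSearch:
    """Thin wrapper: health flavor, same core underneath."""
    def __init__(self, core): self.core = core
    def run(self, terms): return self.core.query(terms, vertical="health")

core = SearchCore([
    ("back pain exercises", "health"),
    ("paris travel guide", "web"),
])
print(WebSearch(core).run(["paris"]))
print(HealthSearch(core).run(["back", "pain"]))
```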

I am neither for nor against Google or Microsoft. I found the interview with Brad Goldberg interesting because it helped me assess Microsoft’s present thinking about Web search. Let me know your thoughts.

Stephen Arnold, June 18, 2008

Comments

2 Responses to “Microsoft’s Web Search Strategy Revealed: The Scoble Goldberg Interview”

  1. Lewis Shepherd on June 17th, 2008 9:18 am

    Great article, and great interview (in a great series), thanks for posting. By the way, my impression on FAST is that Microsoft has had to wait patiently until the legal approvals were all final before beginning any true integration work, but that a great amount of planning has gone on internally.

    But a couple of minor quibbles:

    “Mr. Goldberg asserted that the major search services were at parity in terms of relevance and coverage” – but then you measure parity by market-size and index size. What about relevance? Admittedly that requires more of a technical review but you could take a shot at an impressionistic “user-experience” assessment of relevance. In my experience they’re equivalent, with a slight edge to Live because of Google’s encroaching ads.

    On coverage (index-size), why did you go to a third-party site (gov) as a proxy for Live? Why not just try this: search on Google and on Live.com for the word “sex” … one of the more prevalent single-term queries. Or try the term “free” another very prevalent term, or the word “web”. Depending on the query, Live seems to hold its own, indicating their index size is certainly closer to Google than you portray.

    thanks again!

  2. Ross on June 17th, 2008 9:19 am

    You are a little bit off here in one of the things you said. Microsoft didn’t acquire SAVVIS but they simply bought outright data centers that it was leasing from SAVVIS. They are still two very separate companies.

    Ross
    http://www.hostdisciple.com
