Microsoft Powerset: Is There a Role for Amazon?

June 27, 2008

On May 10, 2008, I offered some thoughts about Microsoft’s alleged interest in Powerset. You can find this bit of goose quacking here.

In case you missed the flurry of articles, essays, and opinion pieces, more rumors of a Microsoft Powerset tie up are in the wind. Matt Marshall ignited this story with his write up “Microsoft to Buy Semantic Search Engine Powerset for $100 Million Plus”. You must read this here. The most interesting statement in the essay is:

Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion.

My research for Bear Stearns last year revealed that Google has more than “some specialists” working on semantic issues. Alas, that document, “Google’s Semantic Web: the Radical Change Coming to Search and the Profound Implications to Yahoo! & Microsoft,” is no longer easily available. There is some information about the work of Dr. Ramanathan Guha in my Google Version 2.0 study, but the publisher insists on charging people for the analysis of Dr. Guha’s five patent applications. Each of these comes at pieces of the semantic puzzle in quite innovative ways. If Dr. Guha’s name does not ring a bell, he worked on the documents that set forth the so-called Semantic Web.

So Google is, according to this statement by Mr. Marshall, not too keen on Powerset-style semantics. I agree, and I will get to the reasons in the Observations section of this essay.

The story triggered a wave of comments. You can find very useful link trails at Techmeme.com and Megite.com. The one essay you will want to read is Michael Arrington’s “Microsoft to Buy Powerset? Not Just Yet.” By the time you read this belated write up, there will be more information available. I enjoy Mr. Arrington’s writing, and his point about the Powerset user interface is dead accurate. We must remember that users are creatures of habit, and the user community seems to like typing a couple of words, hitting the enter key, and accepting the first three or four Google results as pretty darn good.


Semantic technology is very important. Martin White and I are working on a new study, and at this point it appears that semantic technology is something that belongs out of sight. Semantic technology can improve the results, but like my late grandmother’s girdle and garters, the direct experience is appropriate only for a select few. Semantic technology seems to share some similarities with this type of best-left-unseen experience from my childhood.

An Amazon Connection?

My interest in a Microsoft Powerset deal pivots around some information that I believe to have a kernel of truth buried in it. Earlier this year, I learned that Microsoft had a keen interest in Amazon’s database technology. Actually, the interest was not in the Oracle database that sits, like a black widow spider in the center of a web, but in the wrapper that Amazon allegedly used to prevent direct access to the Oracle tables from creating some technical problems.

Amazon had ventured into new territory, tapping graduate students from the Netherlands, open source, specialist vendors, and internal Amazon wizards to build its present infrastructure. Amazon has apparently succeeded in creating a Google-like infrastructure at a fraction of the cost of Google’s own infrastructure. Amazon also has fewer engineers and more commercial sense than Google.

In the last 18 months, Amazon has pushed into cloud computing and Amazon Web services, and it has been jump starting a wide range of start ups needful of a sugar daddy. I recently wrote about Zoomii.com, one innovator surfing on the Amazon Web services “wave”. You can read that essay here.

Microsoft needs a NASCAR engine for its online business. Microsoft is building data centers. But compared to Amazon and Google, Microsoft’s data centers are a couple of steps behind, based on my research work.

At one meeting in Seattle, I heard that Microsoft was “quite involved” with Amazon. When I probed the speaker for details, the engineer quickly changed the subject.

Powerset–if my sources are correct (which I often doubt)–is using Amazon Web services for some of its processing. If true, we have an interesting possibility that Microsoft may be pulled into an even closer relationship with Amazon.

I am one of the people who thought that Microsoft would be better able to compete in the post-Google world if Microsoft bought Amazon. Now let me get to my thinking, and, as always, I invite comments. First, Microsoft would gain Amazon’s revenue and technical know how. Arguably these assets could provide a useful platform for a larger presence in the online world.

Second, Microsoft gains the cloud-based infrastructure that Amazon has up and running. From my point of view, this approach makes more sense than trying to whip Windows Server and SQL Server into shape. The Live.com services could run on Amazon or, alternatively, the whopping big Microsoft data centers could be used to provide more infrastructure for Amazon. An added benefit is that Microsoft’s engineers, despite the company’s spotty reputation for engineering, seem to me to be more disciplined than Amazon’s engineers. I have heard that Amazon pivots on teams that can be fed with a pizza. While good for the lone ranger programmers, the resulting code can be tough to troubleshoot. Each team can do what it needs to do to resolve a problem. The approach may be cheaper in the short run but, in my opinion, may create the risk of a cost time bomb. A problem can be tough to troubleshoot and then fix. Every minute of downtime translates to a loss in credibility or revenue.

Third, the Powerset technology is going to require some robust computing infrastructure to scale, refresh indexes, and serve “answers” (not result lists) to users. I have seen the Wikipedia demonstration, but indexing Wikipedia is not indexing the Web. I am not knocking Powerset in particular because, in my experience, any of this next-generation semantic technology can be a resource piggy. In the late 1990s, I had some experience with a semantic system that required $20 million in servers to support four simultaneous users. Don’t believe it, do you? Well, scaling issues may be why semantic technology has not set the world on fire. Inxight Software, a Xerox PARC content processing company, ended up as a unit of SAP. Inxight’s technology was nifty, but semantic processes are–shall we say–challenging in many ways.

Indexing 30 billion plus static Web sites and an equal or larger number of dynamic Web sites with Powerset technology will need some serious computing resources. Refreshing the index won’t be trivial either. Oh, there is query processing to consider as well. So, Microsoft may find itself looking for ready-to-run infrastructure to scale Powerset. Building data centers is expensive and slow work. Branded server gear takes weeks, sometimes months, to arrive after an order is placed. Scaling has to be economical and quickly available.

Finally, Google is becoming more proactive, and I think moves by Microsoft will be countered in two ways: [a] more transparency from Google executives who say, “Look, we are really advanced and the competitors are less advanced”; and [b] direct innovations such as the ramp up in advertising, developer activities, and search features like “ig” or individualized Google.

If the Microsoft Powerset deal goes through, Google will accelerate the release of its semantic features. My research suggests that Google can deliver a lot more semantic bang than it now offers. Keep in mind that Google’s approach is to keep semantic technology as a function that supports new services and features. Once Googzilla begins to lumber forward, Microsoft will have to kick Powerset into high gear; otherwise, Google will continue to move ahead, widening its lead over Microsoft in the Internet world.

Please, keep in mind that I am sharing thoughts from my mined out domicile in one of Kentucky’s many hollows. I am not suggesting that anyone other than my mongrel dog shares my views.

Observations

Let me wrap up this essay with these observations:

  1. Microsoft is going to buy companies, and when it does, Microsoft must deal with integration, scaling, and culture. My hunch is that scaling data centers running Windows Server is going to be expensive and complicated. I don’t have an opinion about integrating non-Microsoft technology with Microsoft technology until I see more of the Microsoft Fast Search product line. Culture, I don’t know anything about that either. Two search factors–cost and complexity–mean that Microsoft has to outspend Google to draw even with Google and then find ways to leapfrog Google.
  2. Powerset has an indexing approach that can, like Hakia’s, be a resource-hungry application. Powerset is going to need servers, bandwidth, and money, lots of money, to deliver on its semantic approach to search.
  3. Amazon, not Yahoo, may be a more attractive acquisition target at this time. Amazon has engineers close to the Microsoft campus. Amazon has plumbing that delivers Web services. Amazon has its fingers on the pulse of a number of interesting cloud-based companies. Amazon has developers, customers, and a growing business. Microsoft could find itself needful of a Powerset plus Amazon combo.

I will continue to think about the implications of an alleged Microsoft Powerset tie up.

Stephen Arnold, June 27, 2008

Comments

One Response to “Microsoft Powerset: Is There a Role for Amazon?”

  1. Jonathan on April 28th, 2009 3:55 am

    Thanks for this post. I have been doing some research about Powerset lately, and the informations you give are among the best on how to put them in context with semantic search generally.
    Today, almost a year after the acquisition, would you have the same thoughts?
