Big Data, Big Implications for Microsoft

July 17, 2009

In March 2009, my Overflight service picked up a brief post on the Google Research Web log titled “The Unreasonable Effectiveness of Data.” The post pointed to an article of the same name that three Google wizards wrote for the IEEE Intelligent Systems journal. You may be able to download a copy from this link.

On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.

When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys; they are individuals who are having an impact on Google’s leapfrog technologies. There is a lot of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:

Alon Halevy, former Bell Labs researcher and the thinker answering, to some degree, the question, “What’s after relational databases?”
Peter Norvig, the fellow who wrote the standard textbook on computational intelligence and smart software.
Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor.

So what do these three Googlers offer in their five-page “expert opinion” essay?

First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.

Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, costs for getting software and systems smarter will not spiral out of control.

Third, the Semantic Web is a non-starter, so another method, semantic interpretation, may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.

Conclusion: dataspaces.

See, Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge-centric operations. What are the implications for Microsoft? Simple. In my opinion, the big data approach is not used in the Powerset method applied to Bing. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.

Stephen Arnold, July 17, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Google, Microsoft, News, Online (general), Search, Semantic, Text analytics, Text processing


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.