Big Data, Big Implications for Microsoft
July 17, 2009
In March 2009, my Overflight service picked up a brief post in the Google Research Web log called “The Unreasonable Effectiveness of Data.” The item mentioned that three Google wizards had written an article of the same name for the IEEE Intelligent Systems journal. You may be able to download a copy from this link.
On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.
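To make the "odds" idea concrete, here is a minimal Python sketch. It is not Google's method and it is not taken from the IEEE essay. The toy corpus, the function names bigram_counts and odds_in_bounds, and the add-one smoothing are all assumptions made for illustration. The point is only that simple counting over data yields a probability that one word order is "in bounds" compared with another.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def odds_in_bounds(counts, candidate, alternative):
    """Estimate the probability that `candidate` is the better of two
    word pairs, using add-one smoothing so that a pair never seen in
    the data still gets a small, nonzero share."""
    a = counts[candidate] + 1
    b = counts[alternative] + 1
    return a / (a + b)


# Hypothetical toy corpus; a real system would use vastly more text.
corpus = [
    "big data beats clever rules",
    "big data makes software smart",
    "big data is useful",
    "data big is rare",
]

counts = bigram_counts(corpus)
p = odds_in_bounds(counts, ("big", "data"), ("data", "big"))
print(f"Estimated odds that 'big data' is the in-bounds order: {p:.2f}")
```

With four sentences the estimate is crude. The big data argument is that the same counting, run over enormous amounts of text, gets close to what a human judge would say most of the time.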
When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys; these are individuals who are having an impact on Google’s leapfrog technologies. There’s lots of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:
- Alon Halevy, former Bell Labs researcher and the thinker answering to some degree the question, “What’s after relational databases?”
- Peter Norvig, the fellow who co-wrote the standard textbook on computational intelligence and smart software
- Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor
So what do these three Googlers offer in their five page “expert opinion” essay?
First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.
Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, because the data do the work that hand-crafted rules would otherwise have to do, the costs of making software and systems smarter will not spiral out of control.
Third, the Semantic Web is a non starter so another method – semantic interpretation – may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.
Conclusion: dataspaces.
See, Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge centric operations. What are the implications for Microsoft? Simple. As I see it, the Powerset method applied to Bing does not use the big data approach. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.
Stephen Arnold, July 17, 2009