Microsoft and Big Data
October 21, 2009
Short honk: A happy quack to the reader who sent me information about a free book from Microsoft. The book is The Fourth Paradigm: Data-Intensive Scientific Discovery. I downloaded the file and whipped through the 270-page volume edited by Tony Hey, Stewart Tansley, and Kristin Tolle. The book is a collection of essays by Microsoft professionals and experts. The book is dedicated to Jim Gray, who joined Microsoft from Digital Equipment. I cited some of Jim Gray’s work in my three-part series about Microsoft’s data center architecture. The essays are grouped by discipline; for example, health and medicine, number crunching for scientific research, and communication.
If this book were offered for sale, I would buy it. I think it is a useful look at big data.
I made several notes on a placemat in the restaurant in which the goslings and I ate today. I offer these as my opinion, so you may disagree:
- The DEC influence struck me as easy to spot. In fact, some of the lingo and the examples reminded me of Google’s technical papers. Google had a strong DEC flavor in its early days. I wonder if this influence was intentional, accidental, or an indication that the work of those behind Alta Vista has reached outside the world of search and retrieval.
- The big data theme is one reason why architectures that don’t scale are likely to disappoint users and stakeholders. As I scanned the pages, I wondered if the Microsoft technology is going to be affordable and sufficiently flexible to handle the wide range of applications the Microsoft researchers and industry experts reference in their essays.
- The notion of big data and computational pervasiveness in many disparate fields triggered in my mind questions about “smart” software. Some of the work referenced cannot be performed by traditional methods. The need for “intelligent agents” became evident to me. I don’t think of Microsoft’s technology as being focused on making smart software. Today I had to move a Word 2007 document to my Mac in order to get a 2007 Word feature to work.
If I were back in grade school and writing a book report, I would label the work “Googley”. If my teacher were a Microsoft employee, I would probably get an F. Writing about big data is not the same as manipulating big data at scale. Microsoft wants to make clear that it is in the big data game. I have an open mind.
Stephen Arnold, October 21, 2009