Microsoft.com in 1999

July 12, 2008

In my previous essay about Jim Gray’s Three Talks in 1999, I mentioned that he and his team had done an excellent job of summarizing trends in data center design, online infrastructure options, and the cost of power and name-brand hardware. If you have not read that essay, I invite you to take a look at it here. You may want to download the PowerPoint here. The document does not carry a copyright mark, but I am reluctant to post it for my readers. Please, keep in mind that Microsoft can remove this document at any time. One of the baseline papers referenced in this 1999 Three Talks document is no longer available, and I have a resource working on tracking it down now.

I invite you to look at this diagram. I apologize for the poor quality of the graphic; I am using an image from Mr. Gray’s 1999 presentation which has been compressed by the WordPress program. I will make some high-level observations, and you will be able to download the 1999 PowerPoint and examine the image in that document.

[Gray diagram, 1998: Microsoft.com architecture]

I want to keep the engineering jargon to a minimum. Half of my two to four Web log regulars are MBAs, and I have been asked to clarify or expand on a number of technical concepts. I will not provide that “deep dive” in my public Web log. Information of that type appears in my for-fee studies. If this offends you, please, stop reading. I have to make a decision about what is placed on the Web log as general information and what goes in the studies that pay for the blood-sucking leeches who assist me in my research.

The Diagram: High-Level Observations

The setup of Microsoft.com in 1999 (if Mr. Gray’s diagram is accurate) shows islands of several types. First, there are discrete data centers; for example, the European Data Center, the Japan Data Center, and Building 11. Each of these appears to be a microcosm of the larger setup used in North America. The European and Japan Data Centers are identical in the schematic. I took this to mean that Microsoft had a “cookie cutter” model. This is a good approach, and it is one used by many online services today. Instead of coming up with a new design for each data center, a standard plan is followed. Japan is connected to the Internet with a high-speed OC3 line. The European Data Center connection is identified as Ethernet. When you print out Mr. Gray’s Three Talks presentation, you will see that details of the hardware and the cost of the hardware are provided. For example, in the Japan Data Center, the SQL Server cluster uses two servers with an average cost of $80,000. I know this number seems high, but Microsoft was using brand-name equipment, a practice which the material I have reviewed suggests continues in 2008.

Second, there is a big island: a cluster of machines that provides database services. For example, there are “Live SQL Servers” with an average cost of $83,000, SQL Consolidators at a cost of $83,000, and a feeder local area network to hook these two SQL Server components together. I interpret this approach as a pragmatic way to reduce latency when hitting the SQL Server data stores for reads and to reduce the bottlenecks that can occur on writes. Appreciate that in 1999, SQL Server lacked many of the features in the forthcoming SQL Server update. Database access is a continuing problem even today. In my opinion, relational database management systems (RDBMS) are not well suited to handle the spikes that accompany online access. Furthermore, there is no provision I can see in this schematic for distributing database reads across data centers. We will return to the implications of this approach in a moment.
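For my MBA readers, here is a minimal sketch in Python of how I read the read/write split in the diagram. The server names and the routing rule are my own invention for illustration; they are not Microsoft’s code or configuration, and the real 1999 plumbing was certainly more involved.

```python
import random

# Hypothetical names standing in for the two SQL Server islands in the diagram.
LIVE_SQL_SERVERS = ["live-sql-01", "live-sql-02", "live-sql-03"]   # answer page reads
SQL_CONSOLIDATORS = ["consolidator-01", "consolidator-02"]         # absorb writes, then feed the live servers

def route_query(sql):
    """Send reads to a live server and writes to a consolidator (my simplification)."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(LIVE_SQL_SERVERS)
    return random.choice(SQL_CONSOLIDATORS)

print(route_query("SELECT * FROM downloads"))      # lands on one of the live servers
print(route_query("INSERT INTO hits VALUES (1)"))  # lands on a consolidator
```

The point of the toy example is the limitation I flagged above: every read lands on the same island of live servers, no matter where in the world the request originated.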

Third, notice that there are separate clusters of servers in an even bigger island, probably a big data center. Each cluster performs a specific function. For example, there is a search cluster identified as “search.microsoft.com” and an ActiveX cluster identified as “activex.microsoft.com”. Presumably in a major data center, or possibly two data centers connected by a high-speed line in North America, the servers are hard wired to perform specific functions. The connections among the servers in the data centers use what was, in 1999 dollars, a very sophisticated and expensive fiber ring, or more precisely Fiber Distributed Data Interface. (FDDI is a 100 Mbps fiber optic LAN. It is an ANSI standard. It accommodates redundancy.) Microsoft’s own definition here says:

[The acronym] stands for Fiber Distributed Data Interface, a high-speed (100 Mbps) networking technology based on fiber optic cable, token passing, and a ring topology.

To me, the setup is pragmatic, but it amounts to putting everything in one place, maybe two. In 1999, demand was obviously lower than it is today. With servers under one roof, administration was simplified. In the absence of automated server management systems, technicians and engineers had to perform many tasks by walking up to a rack, pulling out the keyboard, and interacting directly with the servers.

Finally (there are many other points that can be explored, of course), note that one FDDI ring connects at a primary node (“node” is not a good word, but the diagram shows the FDDI rings joined in this type of setup) to a secondary FDDI ring. Some services, such as home.microsoft.com and support.microsoft.com, are mirrored. Others, such as premium.microsoft.com and “ftp://ftp.microsoft.com”, are not.
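To make the mirroring point concrete, here is a toy Python illustration. The service names come from Mr. Gray’s schematic; which ring hosts which service, and the failover rule, are my own guesses for illustration only.

```python
# My guess at which services have a second copy; only the service names are from the diagram.
MIRRORS = {
    "home.microsoft.com":    ["primary FDDI ring", "secondary FDDI ring"],  # mirrored
    "support.microsoft.com": ["primary FDDI ring", "secondary FDDI ring"],  # mirrored
    "premium.microsoft.com": ["primary FDDI ring"],                         # single copy
    "ftp.microsoft.com":     ["primary FDDI ring"],                         # single copy
}

def serve(service, failed_ring):
    """Return a ring that can still answer for a service, or report an outage."""
    survivors = [ring for ring in MIRRORS[service] if ring != failed_ring]
    return survivors[0] if survivors else service + " is unavailable"

print(serve("home.microsoft.com", "primary FDDI ring"))     # secondary FDDI ring
print(serve("premium.microsoft.com", "primary FDDI ring"))  # premium.microsoft.com is unavailable
```

If the primary ring has a problem, the mirrored services keep answering; the unmirrored “premium” and FTP services do not. That is one of the downsides I list below.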

My Opinion

This setup, or more accurately “data center architecture,” is much more sophisticated than what Chris Kitze and I used for The Point (Top 5%) of the Internet. We started that service in 1993 and sold it in 1996. In the period from 1996 to 1999, data center engineering for Internet access advanced rapidly. Compared to The Point (Top 5%) of the Internet in 1999, Microsoft’s approach was extremely sophisticated. Gigabit switches, OC3 data lines, and servers that started at $29,000 were top-notch solutions compared to Lycos’ infrastructure.

In a word, Microsoft spent money to get the best equipment and provisioning available. Our puny Sun 10s and 20s were expensive to lease and maintain. Microsoft used Windows systems and software, which almost certainly made some tasks easier to perform.

The upside of the approach in 1999 was:

  • Pragmatic design
  • High quality hardware to minimize downtime from inferior components
  • Redundancy for certain core functions
  • High-speed interconnects
  • Segregation of the public Internet from Microsoft’s internal network
  • Cookie cutter design for geographically dispersed data centers.

The downside of the approach was:

  • Expensive to build because top-notch hardware and services were used
  • A failure could result in the loss of some services; for example, “premium” services
  • Isolating and engineering around SQL Server to optimize access to data tables
  • A consolidated data center with certain servers specifically configured to provide specific services. This principle allowed expansion by Microsoft’s “scale up and scale out” technique. “Up” means adding hardware dedicated to a specific function to reduce latency and bottlenecks in that function. “Out” means building another data center and putting more hardware online using the “cookie cutter” design model. Load balancing could then distribute demand to available resources, as the sketch after this list suggests.
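Here is a hedged Python sketch of the “scale up and scale out” idea as I understand it. The cluster names, the extra data center, and the round-robin balancer are invented stand-ins, not Microsoft’s actual mechanism.

```python
from itertools import cycle

# "Scale up": add another machine to a function-specific cluster.
search_cluster = ["search-01", "search-02"]
search_cluster.append("search-03")          # more hardware for one function

# "Scale out": stamp out another cookie-cutter data center and let a
# load balancer spread demand across whatever is online.
data_centers = ["north-america", "europe", "japan"]
data_centers.append("north-america-2")      # hypothetical new facility

balancer = cycle(data_centers)              # naive round-robin stand-in for the load balancer
for request_id in range(4):
    print("request", request_id, "->", next(balancer))
```

The pattern is simple: more machines per function inside a facility, more cookie-cutter facilities overall, and a balancer to spread the demand. The catch is that every increment means buying more top-drawer hardware.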

What’s Interesting: Google in 1999

Let me pick up a theme in my earlier post on this topic. The link is here if you wish to revisit that essay. Microsoft’s engineers were designing data centers at almost the same time that Google’s engineers were working on the same problem. Microsoft and Google had engineers with experience working at Digital Equipment. Microsoft had researched Google’s approach, probably by asking about BackRub and talking with Stanford people about design principles used by Stanford graduate students. In fact, Jim Gray’s Three Talks’ PowerPoint does a good job of explaining most of what Google was doing.

Google’s engineers seem to have decided to go in a very different direction from the get go. In fact, after looking over this diagram and comparing it to the information I have gathered about Google’s architecture, it seems to me that decisions made in 1998 by both companies sealed their fates in the online world.

Hindsight is easier than trying to figure out what is going on at the moment decisions are made. But hindsight in this case is particularly useful. We can see why Google has raced ahead in search and other online services while Microsoft has struggled to “catch up”. My hypothesis is that because of the decisions made in 1998 and 1999, Google embarked on an approach to data center architecture and infrastructure that allowed incremental improvement at lower cost. Google’s engineering approach delivered comparatively better performance for less money. Furthermore, Google’s work on making the Linux operating system do more of the administrative work contributed to Google’s price-performance advantage. I will pick up this idea in another essay.

For now, let me highlight a few differences in this table:

Microsoft | Google | Comment
Top-drawer hardware | Commodity hardware | Long-term payoff for Google
Isolate SQL Server for performance | Solve the problem with non-RDBMS technology | Eliminates bottlenecks and the need for exotic engineering workarounds
Scale up and out | Massively parallel, distributed approach to scale | Lower-cost approach for Google
Traditional approach to redundancy | Self-adjusting approach to failure; many copies of data | Google shifts from moving load to redundant systems to simply using another copy of the data
Humans perform many administrative operations | Software handles administrative operations | Fewer engineers needed under Google’s approach; therefore, lower costs
Improve performance by adding hardware | Improve performance by optimizing via software and math tricks | Google’s performance benefit per hardware dollar is higher than Microsoft’s
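The “many copies of data” row deserves one concrete illustration. This Python fragment is my simplification of what I understand Google’s approach to be; the replica names and the retry loop are mine, not Google’s code.

```python
import random

REPLICAS = ["copy-a", "copy-b", "copy-c"]   # the same chunk of data on three cheap machines

def read_chunk(chunk_id, failed):
    """Try the copies in random order; a dead machine is simply skipped."""
    for replica in random.sample(REPLICAS, len(REPLICAS)):
        if replica not in failed:
            return chunk_id + " served from " + replica
    raise RuntimeError("all copies unavailable")

# Two machines can be dead and the read still succeeds: no technician,
# no hardware swap, no shift of load to a dedicated hot standby.
print(read_chunk("index-shard-42", failed={"copy-a", "copy-b"}))
```

Contrast this with swapping out a failed $83,000 SQL Server box. In the copy-based model, a dead machine is simply ignored and the work moves to another copy of the data, which is what I mean by “self-adjusting” in the table.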

Remember, this is my personal interpretation of the information I have. I am not asserting that either Microsoft or Google will agree with my analysis. This is my personal Web log.

So What?

In my opinion, the factors identified in this table have determined the fate of both Microsoft and Google. Microsoft’s approach requires more money to deliver comparable performance. For a company of Microsoft’s size in 1999, the design decisions were the right ones, based on the engineering best practices of the day. Google, on the other hand, took the best ideas from research computing, mixed in lessons learned by the Digital Equipment engineers, and applied the logic of solving puzzles in high-performance computing.

Microsoft is, therefore, launched on a route that requires the company to spend a great deal of money on its data centers at the outset of the competition between the two companies. Microsoft also has to deal with the fact that, to match Google’s performance, it has to either retool its software or buy more hardware to match the speed of Google’s system. And Microsoft has to have more engineers, because in 1999 automating data center operations, replacing failed components (Google simply ignored routine hardware failures because its system used another device), and configuring systems (Google’s version of Linux includes operations to speed deployment of additional servers) were tasks handled by people, not software.

In short, Microsoft’s decisions made it a digital Lexus. Google’s engineering approach made it a hot rod. The Lexus has one advantage. It’s a first-class ride. When it breaks, the Lexus can be repaired, usually at a premium price. To make the Lexus go faster requires a major overhaul. Google, on the other hand, built a hot rod and continued to tweak and improve the basic design over a decade. Now it’s cheaper to goose the hot rod than the Lexus. The hot rod has a big lead, built up over 10 years of speeding without major problems. The Lexus cannot catch up unless the driver abandons the automobile and gets a jet-powered helicopter.

In engineering terms, this means that Microsoft has to find a way to fly over Google, land ahead of Google, and accelerate away. Let Google eat dust and try to find a way to turn its hot rod into a helicopter.

Unless Google crashes and destroys its hot rod, Microsoft cannot catch up because of engineering decisions made in 1999 that have acted like a speed governor. The problem is sufficiently challenging at this time that not even Microsoft’s billions can buy an off-the-shelf solution. Something radically new is needed to stop Google.

Microsoft can hope lawyers do the job. Maybe a depression will break the Google business model. But in a technical race, Microsoft is in a very tough position.

Agree? Disagree? Help me refine or revise my thinking on this important topic.

Stephen Arnold, July 12, 2008
