SQL Server: Bringing the Plow Horse to the Race Track for the Derby

July 10, 2008

SQL Server has bought a lot of dog food in Harrod’s Creek. We got paid to figure out why SQL Server backup and replication crashed and burned. We got paid to make SQL Server go faster. We got paid to grunt through scripts to figure out why reports were off by one. Yep, we like that plow horse. It works like a champ for most business database needs. You can use Access as a front end. You can make some nice looking forms with Microsoft tools with some fiddling.

[SQL Server 2008 diagram]

This is a Microsoft diagram. The release date is August, maybe September 2008. More information is here.

But, when the old plow horse has to amble through petabytes of data, SQL Server is not the right animal for the job. In order to search gigabytes of normalized tables, you need to find a way to shortcut the process. One of my colleagues figured out a way to intercept writes, eject them, and build a shadow index that could be searched using some nifty methods. Left to its own devices, SQL Server would stroll through processes, not gallop.
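The post does not describe my colleague’s actual implementation, so here is a minimal, hypothetical Python sketch of the idea: every write to the primary store is intercepted and also feeds an inverted “shadow” index, so searches hit the index instead of scanning normalized tables. All class and method names are illustrative only.

```python
# Minimal sketch of a write-intercepting "shadow index".
# All names here are hypothetical; this is not the colleague's
# actual implementation described in the post.
from collections import defaultdict

class ShadowIndexedStore:
    """Wraps a primary row store; every write also updates an
    inverted index so searches avoid scanning the tables."""

    def __init__(self):
        self.rows = {}                   # primary store: id -> text
        self.index = defaultdict(set)    # shadow index: term -> row ids

    def write(self, row_id, text):
        # Intercept the write: store the row, then index its terms.
        self.rows[row_id] = text
        for term in text.lower().split():
            self.index[term].add(row_id)

    def search(self, term):
        # Hits the shadow index directly -- no table scan.
        return sorted(self.index.get(term.lower(), set()))

store = ShadowIndexedStore()
store.write(1, "plow horse pulls the plow")
store.write(2, "race horse wins the derby")
print(store.search("horse"))   # -> [1, 2]
print(store.search("derby"))   # -> [2]
```

The design choice is the same one a real system would make at scale: pay a small cost at write time so that reads become index lookups rather than table strolls.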

I spoke with a skeptic today. Her comments caused me to think about SQL Server in a critical way. Are these points valid? Let’s follow the plow horse idea and see if there’s hay in the stall.

Selected Features

Like she said to me, “A different data management animal is needed, right?”

Will SQL Server 2008 be that beast? Here’s what she told me about the most recent version of this data work horse:

  • An easier to use report builder. I thought the existing report tools were pretty spiffy. Guess I was wrong.
  • Table compression. A good thing but the search still takes some time. Codd databases have their place, but the doctor did not plan for petabyte-scale tables, chubby XML tables, and the other goodies that modern day 20-somethings expect databases to do.
  • More security controls. Microsoft engineers are likely to spark some interest from Oracle, a company known for making security a key part of its database systems.
  • Streamlined administrative controls. Good for a person on a salary. Probably a mixed blessing for SQL Server consultants.
  • Plumbing enhancements. We like partitioned table parallelism because it’s another option for whipping the plow horse.

These are significant changes, but the plow horse is still there, she asserted. She said, “You can comb the mane and tail. You can put liquid shoe polish on the hooves. You can even use a commercial hair conditioner to give the coat a just groomed look. But it is still a plow horse, designed to handle certain tasks quite well.”

Microsoft’s official information page is here. You can find useful links on MSDN. I had somewhat better luck using Google’s special purpose Microsoft index. Pick your poison.

Observations

If you are a Microsoft Certified Professional, you probably wonder why I am quoting her plow horse analogy. I think SQL Server 2008 is a vastly improved relational database. It handles mission critical applications in organizations of all sizes 24×7 with excellent reliability when properly set up and resourced. Stop with the plow horse.

Let’s shift to a different beast. No more horse analogies. I have a sneaking suspicion that the animal to challenge is Googzilla. The Web search and advertising company uses MySQL for routine RDBMS operations. But for the heavy lifting, Googzilla has jumped up a level. Technically, Google has performed a meta maneuver; that is, Google has looked at the problems of data scale, data transformation (a function that can consume as much as 30 percent of an IT department’s budget), and the need to find a way to do input/output and read/write operations without slowing operations to a tortoise-like pace.
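Google’s “meta maneuver” is usually associated with its map/shuffle/reduce style of data processing. As a rough illustration only — a toy single-process word count, nothing like Google’s actual distributed implementation — the pattern looks like this:

```python
# Toy single-process illustration of the map/shuffle/reduce pattern
# popularized for large-scale data processing; the real systems
# distribute these phases across thousands of machines.
from collections import defaultdict

def map_phase(documents):
    # Emit (key, value) pairs from each input record.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Group intermediate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in grouped.items()}

docs = ["plow horse", "race horse race"]
print(reduce_phase(shuffle(map_phase(docs))))
# -> {'plow': 1, 'horse': 2, 'race': 2}
```

The point of the pattern is that each phase parallelizes cleanly, which is exactly the property a petabyte-scale shop needs and a single relational engine struggles to offer.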

So, Microsoft is doing database; Google is doing data management, of which database operations are a subset, handled by MySQL and the odd Oracle installation.

What’s the difference?

In my experience, when you have to deal with large amounts of data, Dr. Codd’s invention is the wrong tool for the job. The idea of big static databases that have to be updated in real time is an expensive proposition, not to mention difficult. Sure, there are workarounds with exotic hardware and brittle engineering techniques. But when you are shoving petas, you don’t have the luxury of time. You certainly don’t have the money to buy cutting edge gizmos that require a permanent MIT engineer to baby sit the system. You want to rip through data as rapidly as possible yet have an “as needed” method of querying, slicing, dicing, and transforming.

That’s her concern, and I guess it is mine too, with regard to SQL Server 2008. The plow horse is going to be put in the Kentucky Derby, and it will probably finish the race, just too slow to win or keep the fans in their seats. The winners want to cash in their tickets and do other interesting things.

When it comes to next generation data manipulation systems, Googzilla may be the creature to set the pace for three reasons:

  1. Lower cost scaling
  2. Optimized for petabyte and larger data
  3. Distributed, massively parallel operation

Agree? Disagree? Let me know. Just have some cost data so I can get back to my informant.

Stephen Arnold, July 10, 2008

Comments

3 Responses to “SQL Server: Bringing the Plow Horse to the Race Track for the Derby”

  1. Charlie Hull on July 10th, 2008 8:32 am

    Relational databases are never going to be particularly good at some kinds of search, as they simply aren’t built for the job. I wrote an article about this recently:
    http://www.biosmagazine.co.uk/op.php?id=913
    Hope this is informative.

  2. Stephen E. Arnold on July 10th, 2008 10:08 am

    Charlie, thanks for posting. As I said, I like SQL Server, Access, even SharePoint. My concern is that as organizations get more digital data, the Codd model won’t scale without lots of money and effort. A new approach is needed. Otherwise an organization will have a plow horse (Codd RDBMS or variant) looking good but the wrong animal for the race ahead.

    Stephen Arnold, July 10, 2008

  3. Charlie Hull on July 10th, 2008 10:22 am

    We’ve found that often the best approach is a combination of the two: RDBMS for structured data and a full-text search engine for the rest. It’s pretty feasible to connect the two using SQL or something similar, and to set up triggers and stored procedures to make sure the full-text indexes are updated when new information appears.
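Charlie’s trigger-and-update scheme can be sketched in miniature. The names and structure below are hypothetical, not any particular product’s API: an insert “trigger” queues each new row, and a worker drains the queue into the full-text index so it stays in step with the relational tables.

```python
# Hypothetical sketch of the hybrid approach described above:
# structured data stays in the RDBMS stand-in, while an insert
# "trigger" pushes new rows onto a queue that a worker drains
# into the full-text index. Names are illustrative only.
from collections import defaultdict, deque

rdbms = {}                      # stands in for the relational store
pending = deque()               # rows awaiting full-text indexing
fulltext = defaultdict(set)     # term -> row ids

def insert_row(row_id, fields):
    rdbms[row_id] = fields      # normal relational insert
    pending.append(row_id)      # "trigger": queue the row for indexing

def index_pending():
    # The worker that keeps the full-text index current.
    while pending:
        row_id = pending.popleft()
        for value in rdbms[row_id].values():
            for term in str(value).lower().split():
                fulltext[term].add(row_id)

insert_row(1, {"title": "Derby results", "body": "the race was close"})
insert_row(2, {"title": "Stable report", "body": "plow horse healthy"})
index_pending()
print(sorted(fulltext["race"]))   # -> [1]
```

Structured queries go to the RDBMS; free-text queries go to the index; the queue decouples the two so a slow indexer never blocks an insert.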
