Cuil Your Jets: Takeoffs Are Easier than Landings
July 28, 2008
Digging through the rose petals is tough going. Cuil, it seems, has charmed those interested in Web search.
Balanced Comments Are Here
Among the more balanced commentaries are:
- Search maven Danny Sullivan here, who says, “Can any start-up search engine ‘be the next Google’? Many have wondered this, and today’s launch of Cuil (pronounced ‘cool’) may provide the best test case since Google itself overtook more established search engines.”
- Michael Arrington, TechCrunch here, says, “Cuil does a good job of guessing what we’ll want next and presents that in the top right widget. That means Cuil saves time for more research based queries.”
- David Utter, WebProNews here, says, “The real test for Cuil when it comes back will be how well it handles the niche queries people make all the time, expecting a solid result from very few words.”
Now, I don’t want to pull harder on this cool search stallion’s bit. I do want to offer several observations:
First, the size of an index doesn’t matter. If I am looking for the antidote to save a child’s life, the system need only return one result–the name of the antidote. The “size matters” problem surfaced decades ago when ABI/INFORM, a for-fee database with a typical annual index growth of about 50,000 new records, found itself challenged as “too small” by a company called Management Contents. Predicasts jumped on the bandwagon. The number of entries in an index does not correlate with satisfying a user’s query. Index size provides very useful data that can be used to enhance a search result, but size in and of itself does not translate to “good results”. For example, on Cuil, run the query beyond search. You will see this Web log’s logo mapped to another site. This means nothing to me, but it shows that one must look beyond the excitement of a new system and explore its rough edges.
Second, the key to consumer search engines is dealing with the average user, who types 2.3 terms per query. The test query spears on Cuil returns the expected britney spears hits. Enter the term britny, and you get very similar results, but the graphics rotate, plucking an image from one site and mashing it into the “hit”. Enter the query “brittany” and you get zero hits for Ms. Spears, superstar. The fuzzy spelling logic and the synonym expansion are not yet tailored for the average user, who, if I recall a comment made by Googler Jeff Dean several years ago, can spell Ms. Spears’s name more than 400 ways.
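To make the spelling point concrete, here is a minimal sketch of edit-distance matching, one building block (among many) that a fuzzy spelling module might use. The canonical form and the threshold are my own illustrations, not Cuil’s code.

```python
# Minimal sketch: Levenshtein edit distance as a fuzzy-spelling building
# block. The canonical spelling and the threshold are illustrative only.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j - 1] + cost,  # substitution
                            prev[j] + 1,         # deletion
                            curr[j - 1] + 1))    # insertion
        prev = curr
    return prev[-1]

CANONICAL = "britney"
for query in ("britney", "britny", "brittany"):
    d = edit_distance(query, CANONICAL)
    print(f"{query!r} -> distance {d}: {'match' if d <= 2 else 'no match'}")
```

Note that a naive distance threshold catches britny (distance 1) but misses brittany (distance 3), which is consistent with what I saw on Cuil. A production system layers query logs, phonetic hashing, and synonym data on top of a primitive like this one.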
Third, I turned on safe search and ran my “brittany” query. Here’s what I saw in the inset that allows me to search by category.
I like Playboy bunnies, and we have dozens of them hanging around the computer lab here in Harrods Creek. However, in some of the libraries in the Commonwealth of Kentucky, a safe search function that returns a hutch of Playboy bunnies can create some excitement.
Fourth, it is not clear to me which lessons from WebFountain, from Dr. Patterson’s Google patent documents, and from Mr. Monier’s AltaVista.com/eBay/Google experiences have or have not found their way into this service. Search is a pretty difficult challenge, as Microsoft’s struggles over the last 12 or 13 years attest. My hunch is that there are some facets of the intellectual property within Cuil that warrant a lawyer with a magnifying glass.
Net Net
I applaud the Cuil team for getting a service up and running. Powerset was slow out of the starting blocks and wrangled a pay day with a modest demo. Cuil, in a somewhat snappier way, launched a full service. Over the coming weeks and months, the issues of precision, recall, relevance, synonym expansion, and filters that surprise will be resolved.
I don’t want to suggest this is a Google killer for several reasons. First, I learned from a respected computer scientist that a Gmail address set up for a test and not released seemed to have been snagged in a Cuil crawl. Subsequent tests showed the offending email address was no longer in the index. My thought was that the distance between Cuil and Google might not be so great. Most of the Cuil team are Xooglers, and some share the Stanford computer science old school spirit. Therefore, I want to see exactly how close or far Cuil and Google are.
Second, the issue of using images from one site to illustrate a false drop on another site must be resolved. I don’t care, but some may. Here’s an example of this error for the query beyond search.
If this happens to a person more litigious than I, Cuil will be spending some of its remaining $33 million in venture funds to battle an aggrieved media giant. Google has learned how testy Viacom is over snippets of Beavis and Butt-head. Cuil may enjoy that experience as well.
To close, exercise Cuil. I will continue to monitor the service. I plan to reread Dr. Patterson’s Google patent documents this week as well. If you want to know what she invented when working for the GOOG, you can find an eight- or nine-page discussion of the inventions in Google Version 2.0. A general “drill down” notion is touched upon in these documents, in my opinion.
And, keep in mind, the premise of The Google Legacy is that Google will be with us for a long time. Cuil is just one example of the Google “legacy”; that is, Xooglers who build on Google’s approach to cloud-based computing services.
Stephen Arnold, July 28, 2008
Cool Discussion of Cuil
July 28, 2008
Xooglers Anna Patterson and Louth man Tom Costello (the husband-and-wife brains behind Xift, which sold to AltaVista.com, and Recall), Louis Monier (AltaVista.com top wizard), and Russell Power (worked on TeraGoogle) teamed up to create a next-generation Google. Michael Liedtke’s “New Search Engine Claims Three Times the Grunt of Google” is worth reading. You can find one instance of the write up here.
TechCrunch wrote about Cuil in 2007. You can read that essay here. The key points in the TechCrunch write up were that Cuil can index Web content faster and more economically than Google. Venture funding was $33 million, which is a healthy chunk for search technology.
Mr. Liedtke pulls together some useful information. For me, the most interesting points in the write up were:
- The Cuil index contains 120 billion Web pages.
- Cuil is derived from an Irish name.
- The search results will appear in a “magazine like format”, not a laundry list of results.
- Google has looked the same for the last 10 years and will look the same in the next 10 years.
Although Dr. Patterson left Google in 2006, she authored several patent documents related to search. I profiled these documents in Google Version 2.0, and these provide some insight into how Dr. Patterson thinks about extracting meaning from content. The patent documents are available from the USPTO, and she is listed as the sole inventor on the patent applications.
Observations
If Cuil’s index contains 120 billion Web pages, it is three times the size of Google’s 40 billion page Web index and six times the size of Live.com’s 20 billion page index. Google has indexed structured data, which makes its index far larger, but Google does not reveal the total number of items in that index. The “my fish was this big” approach to search is essentially meaningless without context.
The AltaVista.com connection via Louis Monier is important. A number of AltaVista.com engineers never joined Google. One company with AltaVista.com DNA–Exalead–has plumbing that meets or exceeds Google’s infrastructure. If Exalead could build a killer infrastructure, it is likely that Cuil has one too, and my thought is that Cuil will include innovations that Google cannot easily retrofit. As Mr. Liedtke’s article points out, Google has not changed search in a decade. This observation comes from Dr. Patterson and may have some truth in it. But as Google grows larger, radical change becomes more difficult no matter how many lava lamps there are in the Mountain View office.
The experience Dr. Costello gained in the WebFountain work for IBM suggests that text analytics will get more than casual treatment. Analytics may play a far larger role in Cuil than it did in Recall, Xift, or, for that matter, Google.
The knowledge DNA of the Cuil founders is important. There’s Stanford University, the University of Washington, and AltaVista.com. I make quick judgments about new search technology by looking for this type of knowledge fingerprint.
Other links you may find useful:
- Cuil bios are here.
- Independent Ireland write up about the company is here.
- A run down of Xooglers who jumped ship is here and here.
- A brief description of Xift is here.
- Recall info is here. Scroll down to this headline “Recall Search through Past”
- WebFountain architecture info is here. You have to download the section in which you have interest.
With $33 million in venture funding, it’s tough to determine if Cuil will compete with Google or sell out. This company is on my watch list. If you have information to share about Cuil, please, post it in the comments section.
Stephen Arnold, July 28, 2008
Amazon: Server to Server Chattiness
July 27, 2008
In general, observers are pleased with Amazon’s explanation about the recent outage. You can read what Amazon offered as the “inside scoop” here. Center Networks has a useful wrap up plus links to the company’s earlier comments about the Amazon outage. For me, the most interesting point in the Center Networks’ write up was its gentleness. The same thought wove itself through Profy.com’s take on the issue, reminding me that Amazon is offering a low cost service. You can read Profy’s view here.

Message passing in distributed, parallelized environments is an issue. Too many messages, and the system chokes and simply quits working. Anyone remember the nCube’s problems? Too few messages, and the massively parallel system becomes like a sleepover for your daughter’s seven middle school chums. Message passing is an issue in the older Microsoft data center architectures, which I wrote about in a series of three posts for this Web log. To solve the problem in 2006, if I interpret the Microsoft diagram I included in my write up, Microsoft “threw hardware at the problem”. You can read this essay here. Redmond then implemented a variety of exotic and expensive mechanisms to keep servers in sync, processes marching like the Duke University drill team, and SQL Server happily insulated from the choking hands of Microsoft’s own code ninjas.

Is there a better or at least less troublesome way to accomplish messaging? Yes, but these techniques require specialized functions in the operating system, changes to who sends what messages and how the “master” in a procedure talks to “slaves”, and a rethink about how to keep message traffic from converging on a single “master”. “Chattiness” does not do justice to the technical complexity of the problem.

You may want to navigate to Google.com and run a query on these terms: “Google File System”, “Chubby”, and “Jeffrey Dean”. The first term refers to add-ons to Linux that Google implemented nine years ago. The second refers to one piece of Google plumbing for file locking and unlocking, with references to the BigTable data management system. And Dr. Dean is a former AltaVista.com wizard who has become one of the people given the job of explaining how Google tackled messaging and related problems since 1999. I summarize most of the broad ideas in The Google Legacy (Infonortics 2005), but reading the primary source information can be illuminating. Google’s solution is not without its weaknesses, of course. “Chattiness” is not one of them.

In terms of total operations cost at Amazon, the AWS services get a tiny slice of the action. Amazon is using its infrastructure to squeeze value from its plumbing. I anticipate other issues going forward, and Amazon will address them. Over time, AWS will resolve “chattiness”. Perhaps the next problem will be minimized and repositioned as “an undercooked, failed soufflé”? Wordsmithing is not engineering in my opinion. Agree? Disagree? Help me understand Amazon’s explanation of “offline”.
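To give a feel for why a lock service such as Chubby tames chatter, here is a toy sketch of the pattern: clients coordinate through a single master that hands out expiring leases instead of negotiating peer to peer. The class, paths, and timings are my inventions for illustration, not Google’s API.

```python
# Toy sketch of the lock-service idea behind Chubby: one elected master
# hands out expiring leases, so clients do not chatter peer to peer.
# Names, paths, and timings are invented for illustration.

import time

class LockService:
    """Single-master lock table with expiring leases."""

    def __init__(self, lease_seconds: float = 10.0):
        self.lease_seconds = lease_seconds
        self.leases = {}  # path -> (owner, expiry)

    def acquire(self, path: str, owner: str) -> bool:
        now = time.monotonic()
        holder = self.leases.get(path)
        if holder and holder[1] > now and holder[0] != owner:
            return False  # someone else holds an unexpired lease
        self.leases[path] = (owner, now + self.lease_seconds)
        return True

    def release(self, path: str, owner: str) -> None:
        if self.leases.get(path, (None, 0.0))[0] == owner:
            del self.leases[path]

master = LockService(lease_seconds=2.0)
print(master.acquire("/cell/bigtable/root", "tablet-server-17"))  # True
print(master.acquire("/cell/bigtable/root", "tablet-server-42"))  # False
```

The lease is the point: if a client crashes, its lock evaporates on its own, so the system does not wedge and no flood of “are you alive?” messages is needed.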
Stephen Arnold, July 28, 2008
Database Giant Oracle Hooks Non Profit Institute in Bar Harbor
July 27, 2008
Oracle dominates certain sectors of the database industry. Its Secure Enterprise Search, known as SES10g, has been a data stepchild. I attended a presentation about Oracle’s “new” enhancements to SES10g several months ago. Since that presentation by an engaging and supremely self-confident Oracle executive, I haven’t heard much about SES10g. I did learn that one unit of Oracle has been selling Google Search Appliances. If indeed this is true, Oracle’s senior management is either very canny or unable to make Oracle’s units sing from the same page in the hymnal.
After a particularly interesting day trying to find a medical professional to provide information about my mom, I sat in the hospital’s lovely cafeteria and fired up my news reader. The item that caught my eye was “The Jackson Laboratory Improves Enterprise Search with Oracle”, written by Oracle and published on the Technology Marketing Corporation’s Web site here.
The “news” was that a nonprofit institute in Bar Harbor, Maine, had licensed SES10g to provide “its employees and visitors to its Web site with relevant, secure and customizable search results about the organization’s research, courses, resources and services.” Please, read the complete release here. You may want to read the white papers about Oracle and its SES10g system.
One white paper is “Secure Enterprise Search Version 10.1.8.2: An Oracle Technical White Paper, October 2007”. The white paper is quite useful, and it includes diagrams showing SES10g’s “architecture”. You will also see code snippets and learn that SES10g performs most of the functions of other big-name search systems.
The hook for SES10g is security. The white paper touches lightly on the need to license various other Oracle components to obtain the maximum security capabilities for SES10g. That information is available on the Oracle Web site; for example, navigate to Oracle.com and search for Oblix.
I navigated to Jax.org to check out the genetics lab’s Web site. I ran the query “databases” and got a list of hits. I clicked on one of the hits from the first page of results and received a null set. Maybe I’m just unlucky, or the Jackson Laboratory has not yet deployed SES10g.
After this bit of bad luck, I started to formulate the notion that this is a PR puff piece. But, I thought, why would a giant company like Oracle issue a news release unless there was substance–steel inside the boxing glove?
Stephen Arnold, July 27, 2008
AOL: What about Relegence
July 27, 2008
I am no longer surprised by the deep cloud of unknowing that reduces visibility in pundits’ reports. Someone (who shall remain nameless) sent me a report from a well known search “expert” (who shall also remain nameless) reporting on the changes at America Online.
AOL fumbled some of its opportunities in the last three years. Now the company is the darling of analysts because it is dumping services. I have a partner who is a former Naval officer. He remarked, “When you jettison stuff, you want to try and stay afloat.” Not much more deep thinking is required to understand why AOL Pictures, BlueString, and an online backup service called Xdrive are history. None has much traction compared to Flickr, YouTube.com, and literally dozens of cloud-based backup services. Here’s a link to a reasonably good summary of the received wisdom about AOL’s actions.
The report I mentioned earlier talked about solid AOL services; namely, email, instant messaging, and chat rooms. The consultant generated a laundry list of other AOL services, which you can find here without paying a consultant to prepare a custom study for you.
One interesting service, which is named “Money”, is reasonably useful. In fact, for some types of company research, one can argue that it is as good as either Yahoo Finance or Google’s Yahoo Finance clone, aptly named Google Finance. I heard that one of the female engineers who worked on Yahoo Finance jumped to Google to work on Google Finance, but that may be Silicon Valley chatter. There are some similarities.
AOL has muffed the bunny promoting this service. First, navigate to http://money.aol.com. There are a number of point-and-click options. These range from changing the page layout to scanning through categories of information, blogs, headlines, and videos.
In November 2006, AOL acquired Relegence Corporation, originally started in Israel. This company–essentially unknown and untracked in the search and content processing world of New York punditry–developed technology that monitors, formats, and displays real-time content streams. The company’s approach, dating to 1999, presaged the Connotate system (now the object of much love from Goldman Sachs) and Exegy (in deep mind meld with the financial and military intelligence sectors).
This is the Relegence toolbar. Users have one-click access to news and other features.
In 2006, I thought that The Relegence Corporation was a very solid real-time financial services news engine, providing market and business intelligence to global buy-side and sell-side institutions. The company had an R&D center in Israel. In 2004, Relegence hooked up with search vendor X1, but that tie up has dropped off my radar. Relegence’s automated infrastructure aggregated relevant structured and unstructured information from internal resources and external research, including blogs, Web sites, email, and thousands of third-party sources in English and other languages. Relegence could deliver customized news in real-time to any communications device. When I looked at Relegence at the time of the AOL deal, I thought that Relegence was in a position to give InfoDesk, founded by former Bell Labs wizard Sterling Stites, a run for the money.
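The core pattern is simple to describe even if it is hard to do at scale: poll many sources, match each incoming item against a watch list, and push hits to subscribers in real time. Here is a toy sketch of that pattern; the sources, tickers, and terms are invented for illustration and have nothing to do with Relegence’s actual code.

```python
# Toy sketch of the real-time monitoring pattern: poll sources, match
# items against a watch list, route hits to subscribers. All names and
# data are invented for illustration.

from typing import Callable, Iterable

WATCH_LIST = {"AOL": ["relegence", "time warner"],
              "GOOG": ["google", "doubleclick"]}

def route(headline: str) -> list[str]:
    """Return the watch-list keys a headline matches."""
    lowered = headline.lower()
    return [key for key, terms in WATCH_LIST.items()
            if any(term in lowered for term in terms)]

def monitor(sources: Iterable[Callable[[], list[str]]]) -> None:
    for fetch in sources:          # in production: continuous and parallel
        for headline in fetch():
            for key in route(headline):
                print(f"[{key}] {headline}")  # in production: push to device

monitor([lambda: ["Time Warner weighs AOL spin-off",
                  "Google tweaks DoubleClick integration"]])
```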
Google’s Search Appliance and ERP: Comfy Together
July 26, 2008
Mike Faust has crafted a very useful, pragmatic overview of the steps in hooking a Google Search Appliance to an enterprise resource planning (ERP) system. I review quite a few articles, and this one is a keeper. His “Implementing Enterprise Search within Your ERP Application” is a solid roadmap. It includes code samples that are meaty and easy to follow. You can access the full text of his article here.
The most significant take away for me was this point:
The good news is that many application vendors already have OneBox modules available for use with their applications. In any case, using Google Search Appliance’s OneBox functionality can get you on the road to using a single search to access information throughout your enterprise.
This comment makes it clear that Google’s search appliance is gaining muscle and impressing admirers with its robustness. Search vendors can bad mouth the GSA as a toy. But dissing the GOOG may be easier than keeping it out of a client organization. Customers want the GSA, and Google, despite its miserable track record for returning phone calls and getting to meetings on time, is selling a boat load of these gizmos.
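For readers who want a concrete picture of the plumbing, here is a minimal sketch of a OneBox provider: the GSA forwards the user’s query to an HTTP endpoint, and the endpoint answers with a small XML result set that the appliance renders beside its regular results. The element names follow my recollection of the OneBox developer documentation, and the ERP lookup is a stand-in; verify everything against Google’s current spec before building on it.

```python
# Minimal sketch of a OneBox provider endpoint. The GSA calls this URL
# with the user's query; we answer with a small XML result set. Element
# names follow my recollection of the OneBox docs; verify before use.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def lookup_orders(query: str) -> list[dict]:
    # Stand-in for a real ERP call, e.g. an order-status lookup.
    return [{"url": "https://erp.example.com/orders/1234",
             "title": f"Order status for '{query}'"}]

class OneBoxHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query).get("query", [""])[0]
        rows = "".join(
            f"<MODULE_RESULT><U>{r['url']}</U><Title>{r['title']}</Title>"
            f"</MODULE_RESULT>" for r in lookup_orders(query))
        body = (f"<OneBoxResults><resultCode>success</resultCode>"
                f"{rows}</OneBoxResults>").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), OneBoxHandler).serve_forever()
```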
Stephen Arnold, July 26, 2008
Microsoft’s Browser Rank
July 26, 2008
I heard about Browser Rank a while ago. My take on the technology is a bit different from that of the experts, wizards, and pundits stressing the upside of the approach. To get the received “wisdom”, you will want to review these analyses of this technology:
- Microsoft’s own summary of the technology here. The full paper is here. (Note: I have discovered that certain papers are no longer available from Microsoft.com; for example, the DNABlueprint document. Snag this document in a sprightly manner.)
- Steve Shankland’s write up for CNet here. The diagram is a nice addition to the article.
- Arnold Zafra’s description for Search Engine Journal here.
By the time you read this, there will be dozens of commentaries.
Here’s my take:
Microsoft has asserted that it has more than 20 billion pages in its index. However, indexing resources are tight, so Microsoft has been working to find ways to know exactly which pages to index and reindex without spidering the bulk of the Web pages each time. The answer is to let user behavior generate a short list of what must get indexed. The idea is to get maximum payoff from minimal indexing effort.
This is pretty standard practice. Most systems have a short list of “must index” sites that get crawled frequently. There is a vast middle ground which gets pinged and updated on a cycle; for example, every 30 days. Then there are sites like the Railway Retirement Board, which gets indexed on a relaxed schedule–which could mean never.
Microsoft’s approach is to take a bunch of factors that can be snagged by monitoring user behavior and use these data to generate the index priority list. Dwell time is presented in the paper as radically new, but it isn’t. In fact, most of the features have been in use or tested by a number of search systems, including the now ancient system used by The Point (Top 5% of the Internet), which Chris Kitze, my son, and I crafted 15 years ago.
We too needed a way to know which specific Web sites to index. Trying to index the entire Web was beyond our financial and technical resources. Our approach worked, and I think Microsoft’s will work too. But keep in mind that “worked” means users looking for popular content will be well served. Users looking for narrower content will be left to fend for themselves.
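To make the mechanism concrete, here is a toy sketch of behavior-driven crawl prioritization: score each page from user signals such as visits and dwell time, then assign recrawl tiers. The weights, caps, and cutoffs are invented for illustration; BrowserRank itself builds a full browsing graph from user sessions and computes something closer to PageRank over it.

```python
# Toy sketch of behavior-driven crawl prioritization. Weights and cutoffs
# are invented; BrowserRank computes page importance over a browsing graph.

from dataclasses import dataclass

@dataclass
class PageStats:
    url: str
    visits: int           # distinct user visits observed
    avg_dwell_sec: float  # mean time users stayed on the page

def priority(p: PageStats) -> float:
    # More visits and longer dwell both push a page up the crawl queue;
    # dwell is capped so one sticky page cannot dominate.
    return p.visits * min(p.avg_dwell_sec, 300.0)

def recrawl_tier(score: float) -> str:
    if score > 100_000:
        return "must index: crawl hourly"
    if score > 1_000:
        return "middle ground: crawl every 30 days"
    return "relaxed schedule: maybe never"

pages = [PageStats("http://news.example.com/", 5_000, 45.0),
         PageStats("http://rrb.gov/", 12, 20.0)]
for p in sorted(pages, key=priority, reverse=True):
    print(p.url, "->", recrawl_tier(priority(p)))
```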
I applaud Microsoft’s team for bundling these factors to create a browser graph. The problem is that scale is going to make the difference in Web search, Web advertising, and Web content analytics. Big data returns more useful insights about who wants what under what circumstances. Context–not shortcuts to work around capacity limitations–is therefore the next big thing.
Watch for the new IDC report authored by Sue Feldman and me on this topic. Keep in mind that this is my opinion. Let me know if you agree or disagree.
Stephen Arnold, July 26, 2008
Google: Chubby and Paxos
July 26, 2008
The duo is not Cisco and Pancho or the Lone Ranger and Silver. Paxos is closer to leather biker gear and a Harley Davidson belt buckle. The outfit gets some panache, and the biker’s pants stay properly slung. You may want to read the 16 pages of Googley goodness here. The paper is “Paxos Made Live–An Engineering Perspective.” One of the interesting facts about this paper is that Tushar Chandra has emerged as a spokesperson for Google. You can read my translation of some of his recent comments here.
In this brief essay, I want to identify three of the points discussed in this 2007 paper that are of particular interest to me. But before I highlight these points, I want to provide some context. Chubby is a mechanism to keep processes from acting like hungry kindergartners running for the milk and cookies. Chubby keeps order and gets the requests filled quickly without having two six-year-olds getting into a knock-down fight over a graham cracker.
First, Chubby is pretty nifty technology, representing a major advance over the file and record locking schemes used for Codd databases. When I mention this point to IBM DB2 or Oracle wizards, I am greeted with hoots of laughter. “Google has nothing we don’t have, and we have file and record locking schemes that are much better,” I was told in May 2007 in the IBM booth at a major trade show. No problem. I believe IBM and Oracle. I just hope their customers believe them when Google reveals the efficiency of Chubby. You can learn more about Chubby in my 2005 The Google Legacy and my 2007 Google Version 2.0, or you can read this Google white paper. File and record locking for reads and writes is one of the hot spots in many database systems. Some companies turn cartwheels to figure out how to perform writes without screwing up read response time. Believe me, some of these outfits do Cirque du Soleil type acrobatics to work around the database read-write problems.
Second, Chubby is not new. When a Google technical paper appears, Google is not revealing a work in progress. My analysis of Google engineering papers and patent documents suggests a careful staging of each information release. When a paper appears, the technology is up, running, and locked in. A competitor learning about a Google innovation from a patent document or a Google technical paper is learning about something that is two to five years “old”; that is, the company has been working on a problem and figured out a bunch of possible solutions. The one solution that makes it into the Google production environment is a good one. When the Googlers talk about an innovation, the competitor who decides to respond is late out of the starting gate. Neither of my two Google studies contained “new” information. I was reporting what was ancient history for Googzilla.
Paxos
Now what’s a Paxos?
Paxos is not one thing. It is a collection of protocols that allow a system to adapt to failures. Google has lots of servers, so there are many failures. Chubby sits between the Google File System and Google’s BigTable (a data management system, not a traditional relational database). Wikipedia can deliver some less than stellar information, but the write up for Paxos struck me as reasonably good, and the information will get you anchored in the notion. The diagrams won’t be of much use, but the Google diagrams are almost equally opaque. The reason is that the flow diagrams don’t make much sense unless you have some experience with smart software in a failure prone environment. Based on the style of writing and the type of diagrams in the Paxos write up, my hunch is that a Google-grade brain contributed a thought or two to the Wikipedia entry. The external links reinforce my conclusion that this is a pretty reliable description of the flavors of Paxos. Of course, it’s tough to determine which “flavor” or “flavors” are part of the Google library.
A typical Google performance table. Google compares its processes against themselves, not against commercial alternatives. These data suggest that Google is doing the work of a cluster of high performance machines on a single commodity server. The key number is operations per second, which works out to 38,400 operations per second for 20 workers (clients). What’s remarkable is that throughput is 3.6 times greater for the larger test database. In other words, as the data get bigger, the throughput goes up. © 2007 Google, Inc.
In my vastly simplified view, Paxos is one tiny cog in Google’s library of smart algorithms. The algorithms crank mindlessly through a procedure writing values. Another process watches these values. When an anomaly becomes evident, the watching process “checks” with other processes and reaches a consensus about what action to take. It sounds really democratic and time consuming. The method is neither. The consensus is not like a human vote: when the processes in a group are asked to accept a proposed value, the decision is made automatically as soon as a majority of them return that value to the “master”.
Keep in mind that this occurs in a massively parallel computing environment. These types of system level processes occur with near zero latency. This type of master-slave setup is a feature of other core Google processes; for example, the Google File System itself. I describe the advantages of Google’s approach in The Google Legacy, and I will not repeat that information here. I think it is sufficient to point out that the approach has some very significant benefits, and most of Google’s competitors are racing to duplicate functionality that Google has had in operation for at least eight years.
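For readers who want the gist in code, here is a toy, single-round sketch of the majority-acceptance idea at the heart of Paxos. Real Paxos adds proposal numbers, promises, retries, and recovery; this only shows how a quorum of acceptances becomes a decision. All names are my own.

```python
# Toy, single-round sketch of majority acceptance, the kernel of Paxos.
# Real Paxos adds proposal numbers, promises, and multi-round recovery.

import random

class Acceptor:
    def __init__(self):
        self.accepted = None

    def accept(self, value):
        if random.random() < 0.2:  # simulate a crashed or slow replica
            return None
        self.accepted = value
        return value

def propose(value, acceptors):
    votes = sum(1 for a in acceptors if a.accept(value) == value)
    quorum = len(acceptors) // 2 + 1
    return value if votes >= quorum else None  # decided only on a majority

acceptors = [Acceptor() for _ in range(5)]
print(propose("master=chunkserver-7", acceptors))
```

The detail worth noticing is that nothing waits for every replica. A majority is enough, which is why a handful of failed machines does not stall the cluster.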
Google’s Data Risotto Attracts Italian Legal Eagles
July 25, 2008
PaidContent.org’s Dianne See Morrison reported on July 25, 2008, that Google has spoiled the data risotto in Italy. The Italians are picky about recipes, and the GOOG is alleged to be “failing to adequately [sic] monitor third-party content posted to their [sic] Web site.” You can read Ms. Morrison’s interesting article here. The issue has been simmering for two years. The content seems to be video. The Italian issue joins similar actions in France and Spain. The Wall Street Journal’s Alessandra Galloni filed a news item about the Italian tussle here. By the time you read my comments, the WSJ’s link may be dead.
I’m no attorney. I don’t even have a hankering to spend time with my own attorney. I can point to my 2005 study The Google Legacy here. In that study, researched and written in late 2003 and 2004, I compiled a list of the vulnerabilities Google faced at that time. In the top three were legal actions.
My research provided me with quotes and publicly accessible documents that revealed the following items:
- Google delegates functions and asks that those making decisions analyze data and use those data to back up their decisions. This method is different from the more political or social procedures used by some other organizations. For example, at Lycos in 1994, face-to-face discussions took place, and many decisions were a collaborative effort, not a data driven effort.
- Google’s founders are logical, maybe to a fault. If a statement says X, then it “means” X. Google, therefore, looks at rules and guidelines and reasons that these documents mean what they say. Anyone with any experience in the halls of Congress or Parliament knows that what a word “means” is more slippery than a 10-gram blob of mercury. However, once logic locks in, the logic dictates the argument. Google executives appear to me to believe that the company is complying and making an effort to comply with rules, laws, and guidelines.
- Google’s engineers have come up with a number of patent documents addressing content related issues. The company is focusing resources on the problem of potentially problematic content that finds its way onto the Google system.
I processed these items surfaced by my research and drew the conclusions I set forth in The Google Legacy. First, Google is a disruptive force of significant proportions. The culture of Google exists in the eye of a hurricane. Inside Google, it’s calm. Outside of Google tempests rage. Lawyers thrive because those with alleged grievances don’t know how to “get their way” with Google. The logic of the mathematician does not mesh smoothly with the logic of the lawyer.
I am also reasonably confident that Google believes that it is behaving within the letter and spirit of the law as Google understands those promulgations. I know this may sound crazy because legal actions are coming fast and furious.
I know that Google values mathematics and clever solutions. Google is chock full of really smart people who can look at this mathematical expression and resolve it without hesitation:
(a, b) × (c, d) = (ac − bd, ad + bc)
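For the record, that expression is the componentwise rule for multiplying two complex numbers written as ordered pairs. With a = 2, b = 3, c = 4, d = 5, it gives (2, 3) × (4, 5) = (2·4 − 3·5, 2·5 + 3·4) = (−7, 22); that is, (2 + 3i)(4 + 5i) = −7 + 22i.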
Individuals who can bring mathematical reasoning and deep technical knowledge to bear on a “problem” often arrive at solutions that are inscrutable to the average bear. Google makes engineering and business decisions with this type of insight or cleverness. It is not surprising to me that people see Google as an arrogant bunch of gear heads, indifferent to the needs of other businesses. The notion of “getting it” works within Google; it does not work too well in other organizations.
The result?
Lawsuits. Lots and lots of litigation. An infinity of legal eagles, no matter how lightweight, can settle on Googzilla and slow it down, knock it over, and maybe pluck out its innards.
I want to see how Italy’s legal eagles react to the Google risotto.
Stephen Arnold, July 25, 2008
More After Market Parts for SQL Server
July 25, 2008
Microsoft is rushing to address some of the challenges SQL Server gives licensees. Scaling and data management are among them.
Microsoft is going outside to get some assistance. The company said it was acquiring DATAllegro, a privately held company that sells data warehouse appliances. An appliance is one or more servers, drives, and software that can be taken out of the box and plugged in. Now these appliances are not like your mom’s toaster. Compared to the build-it-yourself approach to data warehousing, DATAllegro’s approach reduces installation and deployment time.
Elizabeth Montalbano does a good job of explaining the deal in her “Microsoft to Buy Data Warehouse Appliance Vendor” story here.
This URL will go dead in a day, maybe two, so click quickly. Even though I do some for-fee research for IDC, I find the search system for its publicly accessible content maddening.
Will this acquisition along with the Zoomix data quality purchase make SQL Server the database for tomorrow’s enterprise?
No. With the rapid growth in digital data and information, Codd databases–even with extensions–are not the product for the petabyte-scale data that large organizations must now address.
I know that the Sybase core has been rewritten, optimized, and turbocharged. SQL Server is darn good for what it is–a relational database. The difficulty is that, like Oracle and IBM DB2, SQL Server solves the data challenge for which Codd was so right, and that design is very wrong for the heavy lifting now riding a freight train directly to the door of the enterprise.
One of my two or three readers complained that my essays sound like Google’s PR machine. Nothing could be more wrong. Dear old GOOG won’t speak to me, and I am viewed as an annoyance that should head for the retirement home.
I won’t mention Google’s data management technologies. If you are a SQL Server DBA, you have lots of spare time to learn about Sawzall because the new and improved SQL Server is a cakewalk.
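For the curious, Sawzall is Google’s little language for log analysis: a script sees one record at a time and emits values into aggregator tables, and the framework runs the script across data shards in parallel. Here is a toy Python rendering of that model; the log format and table names are my inventions, not Google’s.

```python
# Toy rendering of the Sawzall model: one record in, zero or more emits
# into aggregator tables, no cross-record state in the script itself.
# The log format and table names are invented for illustration.

from collections import Counter

def process_record(record: str, tables: dict) -> None:
    url, latency_ms = record.split()
    tables["hits"][url] += 1                       # emit hits <- 1
    tables["latency_sum"][url] += int(latency_ms)  # emit latency <- x

tables = {"hits": Counter(), "latency_sum": Counter()}
log_shard = ["/index.html 120", "/index.html 80", "/about.html 200"]
for record in log_shard:            # in production: thousands of machines,
    process_record(record, tables)  # results merged by the aggregators
print(tables["hits"])
print(tables["latency_sum"])
```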
Stephen Arnold, July 25, 2008
Update July 31, 2008: Mark Madsen’s “What the Microsoft DATAllegro Deal Means for Customers, Vendors, and BI” in Intelligent Enterprise reveals some chilling information. He writes here, “Regardless of any roadmap, the acquisition won’t affect SQLServer users for at least two years, and more likely three due to the multi-year development cycle SQLServer has been on.” For SQL Server customers struggling with bottlenecks and data management headaches, this news–if Mr. Madsen is right–means more pain and no gain from this Microsoft DATAllegro tie up.