Digital Convergence: A Blast from the Past

July 15, 2008

In 1999, I wrote two articles for a professional journal called Searcher. The editor, Barbara Quint, former guru of information at RAND Corporation, asked me to update these two articles. I no longer had copies of them, but Ms. Quint emailed me the fair copies, and I read my nine-year-old prose.

The 2008 version is “Digital Convergence: Building Blocks or Mud Bricks”. You can obtain a hard copy from the publisher, Information Today here. In a month or two, an electronic version of the article will appear in one of the online commercial databases.

My son, Erik, who contributed his column to Searcher this month as well, asked me, “What’s with the mud bricks?” I chose the title to suggest that the technologies I identified as potential winners in 1999 may lack staying power. One example is human-assigned tags. This is indexing, and it has been around in one form or another since humans learned to write. Imagine trying to find a single scroll in a stack of scrolls. Indexing was a must. What’s not going to have staying power is my assigning tags. The concept of indexing is a keeper; the function is moving to smart software, which can arguably do a better job than a subject matter expert as long as we define “better” to mean “faster and cheaper.” A “mud brick” is a technology that decomposes into a more basic element. Innovations are based on interesting assemblages of constituent components. Get the mix right and you have something with substance, the equivalent of the Lion’s Gate keystone.
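
To make the point concrete, here is a toy Python sketch of “smart software” assigning tags the way a human indexer once did. It is my own illustration, not a description of any vendor’s method; it simply promotes the most frequent meaningful terms to tags.

    from collections import Counter
    import re

    STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "that", "was"}

    def assign_tags(text, max_tags=5):
        """Pick the most frequent non-stop-word terms as tags.
        A production system would use statistical and linguistic models;
        this toy version just counts words."""
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 3)
        return [term for term, _ in counts.most_common(max_tags)]

    print(assign_tags("Indexing scrolls was a must. Indexing has moved to software."))
    # e.g. ['indexing', 'scrolls', 'must', 'moved', 'software']

Faster and cheaper, yes; whether “better” follows depends on how much you trust word counts over a subject matter expert.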

[Image: the Lion Gate at Mycenae]

Today’s information environment is composed of systems and methods that are durable. XML, for example, is not new. It traces its roots back 50 years. Today’s tools took decades of refinement. Good or bad, the notion of structuring content for meaning and separating the layout information from content is with us for the foreseeable future.
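
A trivial, made-up fragment shows what “structuring content for meaning” looks like in practice. The element names are mine, and the presentation rules would live in a separate stylesheet, not in the content itself.

    <article date="2008-07-15">
      <title>Digital Convergence</title>
      <author>Stephen Arnold</author>
      <body>
        <para>The markup says what the content is, not how to print it.</para>
      </body>
    </article>
    <!-- Layout is applied separately, for example by an XSLT or CSS
         stylesheet, so the same content can feed print, Web, or mobile. -->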

Three thoughts emerged from the review of the original essays whose titles I no longer recall.

First, most of today’s hottest technologies were around nine years ago. Computers were too expensive and storage too costly to make widespread deployment of services based on the antecedents of today’s hottest applications, such as social search and mobile search, practical.

Second, even though I identified a dozen or so “hot” technologies in 1999, I had to wait for competition and market actions to identify the winners. Content processing, to pick one, is just now emerging as a method that most organizations can afford to deploy. In short, it’s easy to identify a group of interesting technologies; it’s hard for me to pick the technology that will generate the most money or have the greatest impact.


Microsoft Zips Up Zoomix

July 15, 2008

Data are a problem. Microsoft, despite some of its resellers’ and cheerleaders’ assurances, lacked data cleansing tools. No longer. On July 14, 2008, Microsoft bought Zoomix, a company with a system that “uses guided self-learning technology to easily build a knowledge of how to parse, match, classify and clean data, and applies what it has learned to every new piece of information fed into the system, even if it has not encountered similar data before.”

Why is this important? Fiddling with data–normalization, cleanup, transformation, and other arcana–can consume more than 30 percent of an information technology budget each year. You can read the Zoomix news release about the deal here. The company says that it delivers a software-based self-learning data quality engine.
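
For readers who have not had the pleasure, here is a toy Python sketch of the kind of drudgery a data quality engine automates. The rules and field names are mine for illustration, not a description of how Zoomix actually works.

    import re

    def clean_record(record):
        """Normalize one vendor record: trim whitespace, standardize the
        company suffix, and reformat the phone number. A self-learning
        engine infers rules like these; this sketch hard-codes them."""
        name = record["name"].strip()
        name = re.sub(r"\b(incorporated|inc\.?)$", "Inc.", name, flags=re.I)
        digits = re.sub(r"\D", "", record["phone"])
        phone = f"({digits[0:3]}) {digits[3:6]}-{digits[6:10]}" if len(digits) == 10 else digits
        return {"name": name, "phone": phone}

    print(clean_record({"name": "  Acme incorporated", "phone": "502.555.1212 "}))
    # {'name': 'Acme Inc.', 'phone': '(502) 555-1212'}

Multiply that by every feed, every field, and every exception, and the 30 percent budget figure stops looking far-fetched.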

Zoomix’s management has buzzword fever. You will have to do some careful reading to figure out what a PIM Accelerator, a BI Accelerator, MDM Accelerator, and UNSPSC Auto Classification do. Hint. Eliminate most of the manual intervention required in traditional data cleansing processes.

Ars Technica has a useful description of the Zoomix technology here.

My research suggests that Zoomix technology will make SQL Server licensees happy. The auto classification technology could bring much needed robustness to the current versions of SharePoint. Zoomix technology can improve some of the native Fast Search & Transfer processes as well, but that use of the technology will take longer to deploy.

Stephen Arnold, July 15, 2008

Google: SLAs Here and QoS’s Not Far Behind

July 14, 2008

Stephen Shankland’s essay “Google Aims to Earn Business Trust in the Cloud” provides a good summary of Google’s service level agreement for Gmail. An SLA specifies the minimum level of performance an operation will deliver. Mr. Shankland’s focus is on the white-hot topic of cloud computing. Organizations are reluctant to trust any information to a service that is flaky. The most important point for me in his essay was:
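
As a back-of-the-envelope illustration (my numbers, not the terms of Google’s contract), an uptime guarantee converts into a downtime allowance like this:

    def allowed_downtime_minutes(uptime_percent, days=30):
        """Minutes per month a service may be down and still meet its SLA."""
        return days * 24 * 60 * (1 - uptime_percent / 100)

    # A hypothetical 99.9 percent guarantee permits roughly 43 minutes a month.
    print(round(allowed_downtime_minutes(99.9), 1))   # 43.2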

Google is trying to communicate better with users and customers…

SLAs alone are not enough to reduce risk for an organization contemplating moving services and information “out there” in the cloud. This summer is what I have called “the summer of transparency”. Everywhere I look I see Google executives chattering about the technical innards, the child care services, privacy, and other Google issues. One of my clients said, “Enough Google already.”

But there is an interesting aspect of Google’s SLA that warrants mentioning. In late 1998 or early 1999, Google engineers began working on a Quality of Service invention. You can read more about this invention by downloading US7142536, Communications Network Quality of Service System and Method for Real Time Information, Filed December 14, 2000. Granted November 28, 2006. Google’s SLA is just one component in a broader set of guarantees which extend into the methods often associated with assuring system level functions, not application level functions.

With “guarantees” like these, Google can assert that risks associated with moving to the cloud are to some degree ameliorated.

Stephen Arnold, July 15, 2008

Social Networking Is Hot: Users Love It and So Do Intelligence Professionals

July 14, 2008

Once I fought through the pop-up ad and the request to provide information about why I am growing to hate the InfoWorld Web site, I was able to read Paul Krill’s essay “Enterprises Become the Battleground for Social Networking.” Mr. Krill explains that social networking services are gaining traction outside the consumer market. MySpace.com, Facebook.com, Bebo.com, and dozens of other services make it easy to connect with friends in cyberspace. Citing a number of industry authorities and thought leaders, Mr. Krill provides a useful rundown of the benefits of social networking. In commercial organizations, not-for-profit outfits, and governmental agencies, interest in social networking is rising.

The most interesting portion of the essay is the comments from an individual identified as MattRhodes. Mr. MattRhodes is a supporter of Gartner Group and its report on social networking. He writes:

… businesses aren’t making the use they should do of social communication. That consumers are getting more and more used to social networking and other social tools is well known by those of us who work in the industry. The reasons are simple – they actually offer a new and different way of communicating.

This assertion is indeed true. Also true is the interest in social networking. Technologies and services that work in the consumer Web migrate into organizations as well. Social networking, therefore, is going to play an important part in the information technology mix.

Amidst this violent agreement among myself, Mr. Krill, and Mr. MattRhodes, there lurk some flashing yellow warning signals. In my opinion, some issues to ponder include:

  • Social networking provides a potent monitoring tool. Employees, users, indeed, anyone using the watched system can be tracked. Intelligence can be extracted. Individuals taking actions that are counter to the organization’s interest can be identified and appropriate action taken. The essence of social networking is not collaboration; social networking generates useful user behavior data and potentially more useful metadata (see the sketch after this list).
  • Organizations have secrets. Social networking systems add doors and windows through which secrets can escape or be watched. Most organizations have security provisions, but actual security is breachable. Automated security systems that eliminate tedious permission setup by a security professional make it possible to reduce certain costs. The flip side is that most organizations have flawed security procedures, and the information technology department does what it can with its available resources. The security for certain social networking services can be a time bomb. No one knows the problem is there until the bomb goes off. Damage, depending on the magnitude of the bomb, can be insignificant or horrific.
  • New employees, comfortable with the mores of the evolving social networking world, bring different values and behaviors to online activity. Granted, some new hires will be gung ho and sing the company song each morning. Other new hires will take an informal approach to mandates about what information to share. Are you familiar with the actual behavior of graduates of one of India’s prestigious high schools? I think this approach will characterize some of the new hires’ use of social networking.
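
To make the first bullet’s point about metadata concrete, here is a small Python sketch, with entirely hypothetical names and fields, that turns message logs into a who-contacts-whom tally. No message content is needed; the metadata alone is the raw material for the monitoring I describe.

    from collections import Counter

    # Hypothetical message metadata: sender and recipient only, no content.
    messages = [
        {"from": "analyst_a", "to": "vendor_x"},
        {"from": "analyst_a", "to": "vendor_x"},
        {"from": "analyst_b", "to": "reporter_y"},
    ]

    # Tally contact pairs; unusual pairs or sudden spikes can flag behavior
    # counter to the organization's interest.
    contacts = Counter((m["from"], m["to"]) for m in messages)
    for pair, count in contacts.most_common():
        print(pair, count)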

To repeat: I think social networking and its underlying technology is important. I see many benefits. My experience suggests that those who cheerlead may want to spend a bit more time in the library reading about security vulnerabilities of real time, fluid, social functions. There’s a reason undercover agents make “friends” with persons of interest. The important relationships are not focused on finding a fourth for a golf outing.

Stephen Arnold, July 14, 2008

Microsoft Yahoo: Search Realities

July 14, 2008

The Wall Street Journal, the New York Times, and Reuters have covered the most recent Microsoft Yahoo mating dance in excruciating detail. If you have not seen these three media giants’ take on the Yahoo snub of Microsoft’s and Mr. Carl Icahn’s most recent offers, just navigate to one of these links:

  • New York Times here but you have to register: Angle is shift from saber rattling to escalating conflict
  • Reuters here: Angle is “guaranteed ad revenue” for five years
  • Wall Street Journal here: Angle is impasse that will lead to an “incredible dance”

You can explore links galore on Techmeme.com and Megite.com. I can’t add much to these reports of this ménage à trois. I would like to point out that when some sort of deal goes through, search gains a new urgency. Here’s why:

  1. Google faces a real pit bull in the legal squabble with Viacom. Based on my research findings, Google may for the first time face a perfect storm: lousy economy, escalating annoyance from developers over the Apps flap, and the privacy monsoon unleashed with the YouTube usage data decision. Now is the time to strike Google, but if the internecine warfare continues, Microsoft may miss this opportunity to deal a potentially devastating blow to the GOOG.
  2. Yahoo is in disarray. Open source is a great idea. Cutting deals with Google is a great idea. The problem is that when one looks at the long term impact of these great ideas, the great ideas undermine the foundation of Yahoo. Better shore up that foundation before the basement fills with water and undermines the entire shotgun house.
  3. Capturing headlines is not the same as making money. Microsoft itself needs to concentrate its forces, set priorities, and get down to business with regard to [a] Web search and [b] enterprise search. The senior management of any organization has a finite amount of attention and energy. Whatever is available needs to be focused on closing the gap with Googzilla and making gains in the severely fragmented enterprise search sector.

No doubt business school case writers are sharpening their pencils. Unless Microsoft can resolve this Yahoo business, the company may miss its chance at the brass ring. Google can settle with Viacom, mend its fences, and rebuild its lead with regard to Microsoft. Agree? Disagree? Help me fill in the gaps in my understanding.

Stephen Arnold, July 14, 2008

Puzzling Business Intelligence Research Result

July 14, 2008

Joe O’Halloran’s essay “UK Businesses Shunning Real-Time Data Analysis” reports that Progress Software, owners of the EasyAsk search system, issued findings from commissioned research. Progress revealed that, based on its research, 70 percent of UK businesses have no intention of analyzing data in real time. Most companies, according to Progress, are quite content to analyze historical data. You can read about Progress Software’s real time services here. I did a quick look through the Progress Web site and could not find the referenced study. What puzzled me was that the narrow interest in real time processing does not mesh smoothly with Apama, Progress’ real time solution. I also wondered if the finding suggests that EasyAsk’s index updating need not be particularly speedy. My research suggests that access to current data is important. Systems that deliver “old” or “stale” results contribute to the dissatisfaction users express toward information retrieval systems. Check out Computer Weekly’s essay and Progress Software’s information about its products. Then read about Progress Software’s real time products. I think the research suggests that real time is not a hot button for most businesses. Also, could business intelligence not be the donkey to carry some information-centric products out of the financial desert?

Stephen Arnold, July 14, 2008

SharePoint Tip: Unexpected Error Fix

July 14, 2008

A happy quack to Mahesh Anandan for “WSS 3.0/MOSS 2007 Troubleshoot Unexpected Error” here. SharePoint administrators may encounter an “unexpected error” message. The problem cuts across Web parts. Mr. Anandan provides two useful tips. I don’t want to steal his thunder. The first tip gets you some information, but his second tip and script are very useful. The trick is to make a change to the web.config file. It seems values must be set to true; namely, CallStack and AllowPageLevelTrace. I’m sure there’s a great reason why these switches were not set this way by Microsoft engineers, but Mr. Anandan has come to our rescue.
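
For readers who want to see roughly where those switches live, the attributes sit on the SafeMode element of the SharePoint Web application’s web.config. The snippet below is only an illustrative fragment with other attributes omitted; follow Mr. Anandan’s post for the exact edit.

    <SharePoint>
      <!-- Setting CallStack and AllowPageLevelTrace to true replaces the
           generic "unexpected error" page with a real stack trace. Set them
           back to false on production servers once the problem is found. -->
      <SafeMode CallStack="true"
                AllowPageLevelTrace="true" />
    </SharePoint>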

Stephen Arnold, July 14, 2008

Slap on a Sleepy Sunday: MSFT Will Fight the GOOG for Enterprise Search Dominance

July 14, 2008

Imagine my blood pressure spiking from sleepy Sunday to the Super Bowl of Enterprise Search. Chris Gilmer wrote “Google Will Not Take Enterprise Search Away from Us–Microsoft” and The Search Engine Marketing Web log published this quite interesting essay. You can read the full text of Mr. Gilmer’s blockbuster here.

The key point, in addition to a link to the feisty Register, is:

Microsoft thinks that enterprise search belongs to them. Even though Google has been in the game since the early 2000’s.

My thought, after writing several essays about Microsoft’s approach to online architecture, was, “So, this is news?” After asking myself this rhetorical question, I recalled:

  • Google was founded in 1998 after a short acquaintance with BackRub. Microsoft was well aware of Google in 1999. In fact, Microsoft had a better handle on Google’s engineering than almost any other company by 1999.
  • Enterprise search is an interesting segment. I have asserted that enterprise search is dead. If Microsoft is going to hold off Google, the weapons will be Exchange and SharePoint, not one of the many flavors of search that Microsoft now owns.
  • Microsoft is now taking Google more seriously. My research suggests that Google was perceived as an aberration, not a “real” company.

My only criticism of Mr. Gilmer’s essay is that I did not learn anything. You may have, and if so, keep reading those search engine optimization publications. You are on the same wavelength.

Stephen Arnold, July 14, 2008

Microsoft: 1999 to 2008

July 14, 2008

I have written one short post and two longer posts about Microsoft.com’s architecture for its online services. You can read each of these essays by clicking on the titles of the stories:

I want to urge each of my two or three Web log readers to validate my assertions. Not only am I an addled goose, I am an old goose. I make errors as young wizards delight in reminding me. On Friday, July 11, 2008, two of my engineers filled some gaps in my knowledge about X++, one of Microsoft’s less well-known programming languages.

[Diagram: the perils of complexity]

The diagram shows how complexity increases when systems are designed to support solutions that do not simplify the design. Source: http://www.epmbook.com/complexity.gif

Stepping Back

As I reflected upon the information I reviewed pertaining to Microsoft.com’s online architecture, several thoughts bubbled to the surface of my consciousness:

First, I believe Microsoft’s new data centers and online architecture share DNA with those 1999 data centers. Microsoft is not embracing the systems and methods in use at Amazon, Google, and even the hapless Yahoo. Microsoft is using its own “dog food”. While commendable, the bottlenecks have not been fully resolved. Microsoft uses scale up and scale out to make systems keep pace with user expectations of response time. One engineer who works at a company competing with Microsoft told me: “Run a query on Live.com. The response times in many cases are faster than ours. The reason is that Microsoft caches everything. It works, but it is expensive.”

Second, Microsoft’s code base is neither cohesive nor new. With each upgrade, legacy code and baked-in features and functions are dragged along. A good example is SQL Server. Although rewritten from the good old days with Sybase, SQL Server is not the right tool for peta-scale data manipulation chores. Alternatives exist, and Amazon and Yahoo are using them. Microsoft is sticking with its RDBMS engine, and it is very expensive to replicate, cluster, back up with standby hardware, and keep in sync. The performance challenge remains even though the user experience seems as good if not better than the competition’s. In my opinion, the reliance on this particular “dog food” is akin to building a wooden power boat with unseasoned wood.

Third, in each of the essays, Microsoft’s own engineers emphasize the cost of the engineering approaches. There is no emphasis on slashing costs. The emphasis is on spending money to get the job done. In my opinion, spending money to solve problems via the scale up and scale out approach is okay as long as there are barrels of cash to throw at the problem. The better approach, in my opinion, is to engineer solutions that make scaling and performance as economical as possible and to direct investment at finding ways to leapfrog over the well-known, long-standing problems: the Codd database model, inefficient and latency-inducing message passing, dedicated hardware for specific functions and applications that is then replicated across clusters, and, finally, hardware that sits, in effect, like an idle railroad car until needed. What happens when the money for these expensive approaches becomes less available?


Microsoft.com in 2006

July 13, 2008

In late 2006, I had to prepare a report assessing a recommendation made to a large services firm by Microsoft Consulting. One of the questions I had to try and answer was, “How does Microsoft set up its online system?” I had the Jim Gray diagram which I referenced in this Web log essay “Microsoft.com in 1999”. To be forthright, I had not paid much attention to Microsoft because I was immersed in my Google research.

I poked around on various search systems, MSDN, and eventually found a diagram that purported to explain the layout of Microsoft’s online system. The information appeared in a PowerPoint presentation by Sunjeev Pandey, Senior Director, Microsoft.com Operations, and Paul Wright, Technology Architect Manager, Microsoft.com Operations. On July 13, 2008, the presentation was available here. The PowerPoint itself does not appear in the Live.com index. I cannot guarantee that this link will remain valid. Important documents about Microsoft’s own architecture are disappearing from MSDN and other Microsoft Web sites. I am reluctant to post the entire presentation even though it does not carry a Microsoft copyright.

I want to spell out the caveats. Some new readers of this Web log assume that I am writing news. I am not. The information in this essay is from June 2006, possibly a few months earlier. Furthermore, as I get new information, I reserve the right to change my mind. This means that I am not asserting absolutes. I am capturing my ideas as if I were Samuel Pepys writing in the 17th century. You want real news? Navigate elsewhere.

My notes suggest that Messrs. Pandey and Wright prepared a PowerPoint deck for use in a Web cast about Microsoft’s own infrastructure. These Web casts are available, but my Verizon wireless service times out when I try to view them. You may have better luck.

Microsoft.com in 2006

Here is a diagram from the presentation “Microsoft.com: Design for Resilience. The Infrastructure of www.microsoft.com, Microsoft Update, and the Download Center.” The title is important because the focus is narrow compared to the bundle of services explained in Mr. Gray’s Three Talks PowerPoint deck and in Steven Levi and Galen Hunt’s “Challenges to Building Scalable Services.” In a future essay, I will comment on this shift. For now, let’s look at what Microsoft.com’s architecture may have been in mid-2006.


Microsoft.com Mid-2006

This architecture represents a more robust approach. Between 1995 and 2006, the number of users rose from 30,000 per day to about 17 million per day. In 2001, the baseline operating system was Windows 2000. The shift to Microsoft’s 64-bit operating system took place in 2005, a year in which (if Messrs. Pandey and Wright are correct) Microsoft.com experienced some interesting challenges. For example, international network service was disrupted in May and September of 2005. More tellingly, Microsoft was subject to Denial of Service attacks and experienced network failures in April and May of 2005. Presumably, the mid-2006 architecture was designed to address these challenges.

The block diagram makes it clear that Microsoft wanted to deploy an architecture in 2006 that provided excellent availability and better performance via caching. The drawbacks are those that were part of the DNA of the original 1999 design–higher costs due to the scale up and out model, its use of name-brand, top-quality hardware, and the complexity of the system. You can see four distinct tiers in the architecture.

Information has to move from the Microsoft Corp. network to the back end network tier. Then the information must move from the back end to the content delivery tier. Due to the “islands” approach that now includes distributed data centers, the information must propagate across data centers. Finally, the most accessed data or the highest priority information must be made available to the Akamai and Savvis “edge of network” systems. Microsoft, presumably to get engineering expertise and exercise better control of costs, purchased two adjoining data centers from Savvis in mid-2007 for about $200 million. (Note: for comparison purposes, keep in mind that Microsoft’s San Antonio data center cost about $600 to $650 million.)

