Google’s Cloud Computing Infrastructure Lead May Be Growing
October 24, 2008
Cloud computing has become commonplace. In the last 48 hours, Amazon pumped steroids into the Amazon Web Services product line. To refresh your memory, check out this write up by Andrea James in the Seattle Tech Report here. Rumors have been flying about Microsoft’s cloud ambitions. Although information about “Strata” is as fuzzy as a cirrus cloud, Microsoft executives have been providing forecasts of a bold new service offering. For a useful recap of this rumor, read Chris Crum’s “Microsoft’s Next OS a Nod to the Stratosphere” in Web Pro News here. Other vendors blasting off from mother earth to loftier realms include IBM, Intel, and Rackspace, among other big name firms.
One of the most interesting documents I have read in months is a forthcoming technical paper from Microsoft’s Albert Greenberg, Parantap Lahiri, David Maltz, Parveen Patel, and Sudipta Sengupta. The paper is available from the ACM as document 978-1-60558-181-1/08/08. I have a hard copy in my hand, but I can’t locate a valid link to an online version. The ACM or a for-fee database may help you get this document. In a nutshell, “Towards a Next Generation Data Center Architecture: Scalability and Commoditization” explains some of the technical innovations Microsoft is implementing to handle cloud-based, high-demand, high-availability applications. Some of the information in the paper surprised me. The innovations provide a good indication of the problems Microsoft faced in its older, pre-2008 data centers. It was clear to me that Microsoft is making progress, and some of the methods echo actions Google took as long ago as 1998.
What put the Amazon and Microsoft cloud innovations into sharp relief for me was US2008/0262828, “Encoding and Adaptive Scalable Accessing of Distributed Models.” You can download a copy of this document from the easy-to-use USPTO system. Start here to obtain the full text and diagrams for this patent application. Keep in mind that a patent application does not mean that Google has implemented or will implement the systems and methods disclosed. What the patent application provides is a peephole through which we can look at some of the thinking Google is doing about a particular technical issue. The peephole may be small, but what I saw when I read the document and reviewed the drawings last night (October 24, 2008) sparked my thinking.
Before offering my opinion, let’s look at the abstract for this invention, filed in February 2006 in a provisional application. Keep in mind that we are looking in the rear view mirror here, not at where Google might be today. This historical benchmark is significant when you compare what Amazon and Microsoft are doing to deal with the cloud computing revolution that is gaining momentum. Here’s Google’s summary of the invention:
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
In typical Google style, there’s a certain economy to the description of an invention involving such technical luminaries as Jeff Dean and 12 other Googlers. The focus of the invention is on-the-fly machine translation. However, the inventors make it clear that the precepts of this invention can be applied to other applications as well. As you may know, Google has expanded its online translation capability in the last few months. If you have not explored this service, navigate to http://translate.google.com and try out the system.
The claims for this patent document are somewhat more specific. I can’t run through all 91 claims in this patent document, but I can highlight one, and I will leave review of the other 90 to you. Claim 5 asserts:
The system of claim 4, wherein: the translation server comprises: a plurality of segment translation servers each operable to communicate with the translation model server, the language model servers and replica servers, each segment translation server operable to translate one segment of the source text into the target language, a translation front end to receive the source text and to divide the source text into a plurality of segments in the source language, and a load balancing module in communication with the translation front end to receive the segments of the source text and operable to distribute the segments to the segment translation servers for translation based on work load at the segment translation servers, the load balancing module further operable to direct translated segments in the target language from the segment translation servers to the translation front end.
The claim makes reasonably clear the basic nesting of Google’s architecture. What impressed me is that this patent document, like other recent Google applications, treats the infrastructure as a platform. The computational and input-output tasks are simply not an issue. Google pretty clearly feels it has the horsepower to handle ad hoc translation in real time without worrying about how data are shoved around within the system. As a result, higher order applications that were impossible even for certain large government agencies can be made available without much foot dragging. I find this remarkable.
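The claim’s language is dense, so here is a minimal sketch, in Python, of the pipeline Claim 5 describes: a front end divides the source text into segments, a load balancing module routes each segment to the least busy segment translation server, and the translated segments flow back to the front end for reassembly. Keep in mind the claim specifies roles, not an implementation, so every class, method, and name below is my own illustration, not Google’s code.

```python
# A minimal sketch of the architecture in Claim 5. All class and function
# names are hypothetical; the claim describes roles, not an implementation.
from dataclasses import dataclass


@dataclass
class SegmentTranslationServer:
    """Translates one segment at a time; tracks its own work load."""
    name: str
    load: int = 0

    def translate(self, segment: str, target_lang: str) -> str:
        self.load += 1
        # Stand-in for calls to the translation model and language model
        # servers (and their replicas) named in the claim.
        return f"[{target_lang}] {segment}"


class TranslationFrontEnd:
    """Receives source text, divides it into segments, reassembles output."""

    def __init__(self, servers):
        self.servers = servers

    def _dispatch(self, segment: str, target_lang: str) -> str:
        # Load balancing module: route the segment to the least loaded server.
        server = min(self.servers, key=lambda s: s.load)
        return server.translate(segment, target_lang)

    def translate(self, source_text: str, target_lang: str) -> str:
        segments = source_text.split(". ")  # naive segmenter, for illustration
        translated = [self._dispatch(s, target_lang) for s in segments]
        return ". ".join(translated)


servers = [SegmentTranslationServer(name=f"seg-{i}") for i in range(4)]
front_end = TranslationFrontEnd(servers)
print(front_end.translate("The cloud is growing. Google has a head start", "fr"))
```

The point of the sketch is the shape, not the details: segmentation plus load-aware dispatch is what lets the system scale out across racks of commodity boxes.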
This patent document, if Google is doing what the inventors appear to be saying, is significantly different from the innovations I just mentioned from such competitors as Amazon and Microsoft. Google in my opinion is making it clear that it has a multi-year lead in cloud computing.
The thoughts that I noted as I worked through the 38 pages of small print in this patent document were:
- Google has shifted from solving problems in distributed, massively parallel computing to developing next-generation cloud-centric applications. Machine translation in real time for a global audience for free means heavy demand. This invention essentially said to me, “No problem.”
- Google’s infrastructure will become more capable as Google deploys new CPUs and faster storage devices. Google, therefore, can use its commodity approach to hardware and experience significant performance gains without spending on exotic gizmos or trying to hack around bottlenecks such as those identified in the Microsoft paper referenced above.
- Google can, with the deployment of software, deliver global services that other companies cannot match in terms of speed of deployment, operation, and enhancement.
I may be wrong (and I often am), but I think Google is not content with its present lead over its rivals. I think this patent document is an indication that Google can put its foot on the gas pedal at any time and operate in a dimension that other companies cannot. Do you agree? Disagree? Let me learn where I am off base. Your view is important because I am finishing a write up for Infonortics about Google and publishing. Help me think straight. I even invite Cyrus to chime in. The drawings in this patent application are among the best from Google that I have seen.
Stephen Arnold, October 24, 2008
Time May Be Running Out for the New York Times
October 24, 2008
Henry Blodget’s “New York Times (NYT) Running on Fumes” is an important Web post. You can read the full text here. The New York Times was one of the main revenue drivers for the Nexis news service. Lexis was the legal side of the online service that moved from start up to Mead Paper and eventually to Reed Elsevier, the Frankenstein company with Dutch and English ownership. Along the way, the NYT decided to pull its full text content from the Nexis service. The NYT, like many newsosaurs, assumed that its print reputation would translate to riches for the New York Times Co. What happened was that Nexis never regained its revenue horsepower, and the NYT floundered in indexing, online, and its “new media” operations. I find it amusing to reflect on the unexpected consequences the New York Times’s online decisions triggered. Indeed, some of today’s challenges are outgrowths of management’s inability to think at an appropriate level of abstraction about the impact of online on traditional “dead tree” operations.
Mr. Blodget’s analysis summarizes a quarter century of operations in an increasingly online world. The result is a potential financial crisis for the Gray Lady, as the newspaper is fondly known. For me, the most important comment in Mr. Blodget’s analysis, which you will want to read in its entirety, was:
The company has only $46 million of cash. It appears to be burning more than it is taking in–and plugging the hole with debt. Specifically, it is funding operations by rolling over short-term loans–the kind that banks worldwide are canceling…
When I read this passage, I immediately visualized another Bear Stearns meltdown, with confused professionals, once so confident of their future and power, wandering New York sidewalks carrying banker boxes. If Mr. Blodget’s analysis is accurate (and I think it is dead on), changes will be coming to the New York Times. I anticipate downsizing, crazy pop ups on the online service, and a smaller news hole. My daily delivery in rural Kentucky is likely to be replaced with a US mail option. Someone will step forward and buy the property: maybe Rupert Murdoch, maybe a billionaire with a yen to control a major US daily.
Do you think the New York Times could have saved itself with a more prescient online strategy? I do. Agree? Disagree? Help me learn.
Stephen Arnold, October 24, 2008
Silobreaker: Two New Services Coming
October 24, 2008
I rarely come across real news. In London, England, last week I uncovered some information about Silobreaker’s new services. I have written about Silobreaker before here and interviewed one of the company’s founders, Mats Bjore, here. In the course of chatting with some of the people I know in London, I garnered two useful pieces of intelligence. Keep in mind that the actual details of these forthcoming services may vary, but I am 99% certain that Silobreaker will introduce:
Contextualized Ad Retrieval in Silobreaker.com.
The idea is that Silobreaker’s “smart software,” called a “contextualization engine,” will be applied to advertising. The method understands concepts and topics, not just keywords. I expect to see Silobreaker offering this system to licensees and partners. What’s the implication of this technology? Obviously, for licensees, the system makes it possible to deliver context-based ads. Another use is for a governmental organization to blend a pool of content with a stream of news. In effect, when certain events occur in a news or content stream, an appropriate message or reminder can be displayed for the user. I can think of numerous police and intelligence applications for this blend of static and dynamic content in operational situations. A rough sketch of the general pattern appears below.
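Silobreaker has not disclosed how its contextualization engine works, so what follows is only a hypothetical sketch of the general pattern I have in mind: extract concepts from each incoming news item and surface a static message when all of its trigger concepts appear. Every name, trigger, and message in the sketch is invented for illustration.

```python
# A hypothetical sketch of context-triggered message blending. Silobreaker's
# actual "contextualization engine" is not public; this shows only the
# general pattern of matching a news item's topics against static messages.
STATIC_MESSAGES = {
    frozenset({"port", "strike"}): "Reminder: review shipping contingency plan.",
    frozenset({"election", "fraud"}): "Alert: see briefing pack EB-12.",
}


def extract_topics(text: str) -> set:
    # Stand-in for real concept extraction; here, crude keyword matching.
    known = {topic for trigger in STATIC_MESSAGES for topic in trigger}
    return {word.strip(".,").lower() for word in text.split()} & known


def messages_for(news_item: str) -> list:
    """Return every static message whose trigger topics all appear."""
    topics = extract_topics(news_item)
    return [msg for trigger, msg in STATIC_MESSAGES.items()
            if trigger <= topics]  # fire only when the full trigger matches


print(messages_for("Dock workers strike closes the port of Rotterdam."))
```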
Enterprise Media Monitoring & Analysis Service
The other new service I learned about is a fully customizable online service that gives enterprise customers a simple and effective way to handle the entire work flow around their media monitoring and analysis needs. While today’s media monitoring and news clipping efforts remain resource intensive, Silobreaker Enterprise will be a subscription-based service that automates much of the heavy lifting that either internal or external analysts must perform by hand. The Silobreaker approach is to blend disparate yet related information in a single, intuitive user interface; blending is a key concept in the Silobreaker technical approach. Enterprise customers will be able to define monitoring targets, trigger content aggregation, perform analyses, and display results in a customized Web service. A single mouse click allows a user to generate a report or receive an auto-generated PDF report in response to an event of interest. Silobreaker has also teamed up with a partner company to add sentiment analysis to its already comprehensive suite of analytics. The service is now in its final testing phase with large multinational corporate test users and is due to be released at the end of 2008 or in early 2009.
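To make that work flow concrete, here is a hypothetical sketch of the pipeline as it was described to me: define a monitoring target, aggregate matching content, run analyses, and emit a report. None of these names or structures come from Silobreaker; the sentiment step in particular is only a stand-in for whatever the partner company supplies.

```python
# A hypothetical sketch of the monitoring work flow: define targets,
# aggregate matching content, analyze, and report. All names are invented.
from dataclasses import dataclass


@dataclass
class MonitoringTarget:
    name: str
    keywords: list


def aggregate(target: MonitoringTarget, stream: list) -> list:
    """Collect stream items mentioning any of the target's keywords."""
    return [item for item in stream
            if any(kw.lower() in item.lower() for kw in target.keywords)]


def analyze(items: list) -> dict:
    """Stand-in for analytics such as partner-supplied sentiment scoring."""
    return {"volume": len(items), "sample": items[:3]}


def report(target: MonitoringTarget, analysis: dict) -> str:
    """One-click report; a real service might render this as a PDF."""
    return f"{target.name}: {analysis['volume']} hits; e.g. {analysis['sample']}"


stream = ["Acme Corp recalls widgets", "Weather today", "Acme Corp CEO resigns"]
target = MonitoringTarget("Acme watch", ["Acme Corp"])
print(report(target, analyze(aggregate(target, stream))))
```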
Silobreaker is a leader in search-enabled intelligence applications. Check out the company at www.silobreaker.com. A happy quack to the reader who tipped me off to these Silobreaker developments.
Stephen Arnold, October 23, 2008
Able2Act: Serious Information, Seriously Good Intelligence
October 23, 2008
Remember Silobreaker? The free online aggregator provides current events news through a contextual search engine. One of its owners is Infosphere, an intelligence and knowledge strategy consulting business. Infosphere also offers a content repository called able2act.com. able2act delivers structured information in modules. For example, there are more than 55,000 detailed biographies, 200,000-plus contacts in business and politics, company snapshots, and analyst notebook files, among others. Modules cover topics like the Middle East, global terrorism, and information warfare. Most of the data, files, and reports are copyrighted by Infosphere; a small part of the information is in the public domain. Analysts update able2act to the tune of 2,000 records a week. You access able2act by direct XML/RSS feed, via the Web site, or as a feed into your in-house systems. Searches can be narrowed to individual modules, such as searching keywords only in the “tribes” module; we were able to look up the poorly reported movements of the Gandapur tribe in Afghanistan. A visual demonstration is available online here, and we found it quite good. able2act is available by subscription. The price for a government agency to get full access to all modules starts at $70,000 a year. Only certain modules are available to individual subscribers. You can get more details by writing to opcenter at infosphere.se.
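Infosphere has not published its query interface, so the following is only a hypothetical sketch of what a module-scoped search such as the “tribes” example might look like under the hood; every record, field, and function name here is invented.

```python
# A hypothetical sketch of module-scoped searching. able2act's actual API
# is not public; all names and records here are invented for illustration.
RECORDS = [
    {"module": "tribes", "text": "Gandapur tribe movements reported"},
    {"module": "biographies", "text": "Profile: a leader of the Gandapur"},
]


def search(keyword: str, module: str = None) -> list:
    """Return records containing the keyword, optionally within one module."""
    hits = (r for r in RECORDS if keyword.lower() in r["text"].lower())
    return [r for r in hits if module is None or r["module"] == module]


print(search("Gandapur", module="tribes"))  # narrows hits to the tribes module
```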
Stephen Arnold, October 23, 2008
Two New Animals: Newsosaur and Yahoosaur
October 22, 2008
Alan D. Mutter’s “Reflections of a Newsosaur” is a very good Web log. You can find it at http://newsosaur.blogspot.com and his post “Fat Newspaper Profits Are History” here. Mr. Mutter points out that newspapers are going to have to live with declining profits. He cites a number of papers carrying debt that compounds broader sector woes such as declining sales and circulation. He does a solid job of explaining the interplay of certain cost factors for publishers. His analysis does not apply just to newspapers. Any book, magazine, or journal publisher cranking out hard copies faces the same set of problems. The data in this article are worth saving because he has done a better job of identifying key figures and metrics than some of the high-priced consultants hired to help traditional publishers adapt to today’s business realities. For me, the keystone comment in Mr. Mutter’s analysis was:
Although the economy will recover in the fullness of time, there are very real doubts about whether newspapers still have the time, resources and ingenuity to migrate to a viable new financial model to assure their long-term survival.
After reading this article, I realized that traditional publishers, not the author of the Web log, are the real newsosaurs. What also occurred to me was that Yahoo is becoming a high profile Yahoosaur. As a 15-year-old Internet company, Yahoo’s management faces problems that its business model and management pool cannot easily resolve.
Keep in mind that newsosauri are trapped in the dead tree problem; that is, they sell a fungible product in an environment where young people don’t buy newspapers or read them the way their parents and grandparents did. Advertisers want to be in front of eyeballs attached to people who will buy products and services.
Yahoo may be the first identified Yahoosaur. The company’s financial results and the layoffs are not good news. The deal with Google may be in jeopardy. Yahoo’s home-run technology plays, like the push to open source and BOSS, may not have the traction to dig the company out of its ecological niche. I think the Yahoosaur and the Newsosaur are related.
Mr. Mutter provides a useful description of traditional publishing companies’ woes. Perhaps he will turn his attention to the Yahoosaur.
Stephen Arnold, October 22, 2008
Nutter on the Future of Search
October 22, 2008
Blaise Nutter’s “Three Companies That Will Change How We Search” here offers an interesting view of three vendors competing with Google. The premise of the article is that there is room for search innovation. The five-page write up profiles and analyzes Blinkx (a video search spin-out from some folks at Autonomy), Mahalo (from journalist turned search entrepreneur Jason Calacanis), and Cuil (Anna Patterson and assorted wizards from Google, IBM, and elsewhere). As I understand the analysis, the hook is different for each company; for example:
- Blinkx. Indexes the content in the video, not just the metadata, for 26 million videos
- Mahalo. A community search engine with humans, not software, picking the results
- Cuil. A big index with a magazine-style layout.
The conclusion of the article is that innovation is possible and that each of these sites does a better job than Google of addressing user privacy.
For me, the most interesting comment in the write up was this one:
David and Goliath fought on a level battlefield, but Google doesn’t.
My view on each of these search systems is a bit different from Mr. Nutter’s. I do agree that Google presents a large challenge to search start ups. In fact, until a competitor can leapfrog Google, I doubt that users will change their surfing behavior regardless of Google’s policy on privacy. Google monitors to make money. Money is needed to scale and provide “free” search.
This brings me to the difference between Mr. Nutter’s analysis and mine. First, for any of these services to challenge Google in a meaningful way, the companies are going to need cash, lots of cash. In today’s economic climate, I think that these firms can get some money, but the question is, “Will it be enough if Google introduces substantially similar features?” Second, each of these services, according to Mr. Nutter, offers features Google doesn’t provide. I don’t agree. Google is indexing the content of videos and audio; in fact, I wrote about a patent application that suggests Google is gearing up for more services in this area here. Google is essentially social as well, and user clicks are a big chunk of that social notion. The “ig” or individualized Google offers a magazine-style layout if you configure the new “ig” interface to do it. It’s not Cuil, but it’s in the ballpark.
For me, the question is, “What services are implementing technology that has the potential to leapfrog Google as Google jumped ahead of AltaVista.com, MSN.com, and Yahoo.com in 1998?” In my opinion, it is none of the three services profiled by Mr. Nutter. “Let many flowers bloom,” yes, but these flowers have to be of hardy stock, have the proper climate, and get plenty of nurturing. None of these three services is out of the greenhouse and into the real world, and I think their survival has to be proven, not assumed. Search innovations are often in the eye of the beholder, not in the code of the vendor.
Stephen Arnold, October 20, 2008
Dataspaces in Denmark: The 2008 Boye Conference
October 22, 2008
Earlier this year, the engaging Janus Boye asked me to give a talk and offer a tutorial at his content management and information access conference. The program is located here, and you will see a line up that tackles some of the most pressing issues facing organizations today. The conference is held in Aarhus, Denmark. My first visit was a delight. I could walk to a restaurant and connect to the Internet. Aarhus may be one of the most wired and wireless-savvy cities I’ve visited.
About a year ago, before Google decided I was Kentucky vermin, I discovered in the open source literature a reference to a technology with which I was not familiar. In the last year, I have pulled this information thread. After much work, I believe I have discovered the basics of one of Google’s most interesting and least known technology initiatives.
[Image: Kohonen self-organizing map. Source: http://www.lohninger.com/helpcsuite/img/kohonen1.gif]
This discovery is unlike some of the other innovations I described in my 2005 The Google Legacy and my 2007 Google Version 2.0 reports; those documents relied extensively on Google’s own patent documents. This most recent discovery rests on information in Bell Labs’s patents, various presentations by Google researchers, and published journal articles with unusual names; for example, “Information Manifold.” The research also pointed to work at Stanford University and a professor who, I believe, has been involved to some degree with Google’s team leader. I also learned of a Google acquisition in 2006, which does not appear in the Wikipedia list of Google acquisitions. Although the deal was reported in several Web logs, no one dug into the company’s technology or its now-dark classified ad site.
Google Gets Input from Arkansas Church
October 22, 2008
Ah, the great and wise Google received some input from the New Hope Fellowship in that high-tech center, Springdale, Arkansas. Harrod’s Creek, Kentucky, takes a back seat to the folks in Springdale. Will Google listen? Hard to say. You can read the story in Juan Carlos Perez’s “Google Fixes Problem with Apps Start Page” here. The church was nuked by Google’s careless coding. Mr. Perez quotes the church’s media director, one John Jenkins, as advising Google:
Our users were trained to access their mail through the Start page. Once that didn’t work, they could not access e-mail, which is critical to our work. We had to send paper memos around on how to access the mail without going through the Start page. Very frustrating. Google must improve communication with business customers if they wish to be competitive in the corporate IT space. The 2-sentence ‘we’re working on it’ blurbs posted in the [online discussion] groups are an unacceptable way to treat business clients.
Will Google accept advice from New Hope Fellowship? In my opinion, Google is Googley. I’m not. You may not be. The New Hope outfit is probably not Googley, or the church’s media director would have figured out how to get the mail despite the outage. What about Einstein’s “wise one”? Nah, he doesn’t work at Google. Just read the Google blurbs.
Stephen Arnold, October 20, 2008
Cloud Computing: What’s Required
October 20, 2008
Seeking Alpha ran a long analysis by Gregory Ness titled “Cloud Computing: What Are the Barriers to Entry and IT Diseconomies.” I thought the analysis was quite good. Not surprisingly, several thoughts occurred to me, but I find it stimulating to read thoughtful work by an individual who approaches a subject in a helpful, informative way. You can find the full text here. The most useful portion of the write up for me was the discussion of infrastructure. The gap between Google and the also-rans in the Web search game boils down to plumbing, and Mr. Ness understands its importance. I don’t agree with his assertion that we have entered “Infrastructure 2.0.” My view is that Google built on AltaVista.com’s experiences and, in its first year or two of existence, applied itself to fundamental issues such as file and record locking and unlocking, minimizing message overhead in massively parallel systems, and confronting the problems of traditional Codd database structures. Since that time, Google has continued to make incremental improvements to its decade-old system. Companies trying to catch Google are not going to get very far if those firms embrace Infrastructure 2.0 as more than a word envelope. Amazon, a company which seems to get more mileage from modest R&D and information technology investments than others, has made good progress, but I doubt that its engineering foundation is as robust as Google’s. But Google, like Amazon, can fall over, as the recent Gmail outage proves. Nevertheless, plumbing is important. When I was wandering around Crete, I saw some ruins that were thousands of years old, with terracotta water drains still visible. Plumbing is old stuff, and I don’t think archaeologists talk about “Plumbing 2.0.” Despite my dislike of the “2.0” reference, this is a good bit of work. A happy quack to Mr. Ness.
Stephen Arnold, October 20, 2008
Boom Is Lowered Gently on Yahoo
October 20, 2008
Kara Swisher lowers the boom on Yahoo gently in “What Yahoo’s Looming Cost Cuts Actually Mean (Not as Many Layoffs as You Think),” which appeared on October 17, 2008. The hook for the write up is Yahoo’s firing people. I won’t cite a number because, whatever that number is, it won’t mean as many layoffs as you think. With regard to Yahoo, I don’t think much about layoffs. These are inevitable, and regardless of what the company does in the next three or four months, Yahoo is sitting on a cost time bomb. Nuking employees won’t do much. If you are a believer in Yahoo, you will enjoy the new announcements cogently summarized by ReadWriteWeb here.
Here’s what my research has turned up.
Yahoo has numerous search systems, search licenses, search initiatives, and search technologies. Today it is desirable to have a less heterogeneous technical sandbox. Not at Yahoo. Overture has a primitive search system, which I can no longer find on the redesigned Yahoo site. No problem, because traffic for Yahoo advertising seems to be stable or gently undulating like long slow waves in the moonlight. There are two “flavors” of email and search delivered from the Stata Labs acquisition. No problem; since the acquisition of Stata Labs, I can find email in the Yahoo system. There’s the Web search. Again, no problem: it is neither better nor worse than Google’s Web search, but Google has carried the day for now. There’s Flickr search. There are other search systems kicking around. One reader reminded me that Yahoo’s real shopping search is Kelkoo; more information here. You could fiddle with the InQuira-powered help search system until recently. I like using it to locate “cancel service.” Give Help a whirl here. For a laugh, look at this attempt to “improve” Yahoo help.
If I am happy with these different search systems in general, why do I think that, collectively, these very same systems are Yahoo’s cost time bomb? Three reasons:
- It costs money to maintain different systems: staff, consultants, hardware. The more systems an organization has, the more it must spend on information technology.
- Heterogeneous systems mean staff are not easily interchangeable. This means that Yahoo has to either hire more consultants or live with hacks that may operate like small roadside improvised explosive devices. Yahoo doesn’t know when a fix is going to create a problem elsewhere. These are unbudgeted fixes until one goes pop. CFOs don’t like this type of pop.
- Adding a new feature or function means that Yahoo either has to pick a horse to ride, thus keeping other systems in a position of imposed obsolescence, or find a wizard who can produce a fix that works across heterogeneous systems. If that path is followed, see item 2 above.
Yahoo is busy creating new, new things. The hard fact for Yahoo is that much of the underpinnings are old, old things. You don’t fix these problems by firing people. You fix them by facing the reality of the infrastructure and making even more difficult decisions about technology, actions, and services. Firing people is expedient, and it will grease the skids for whatever Yahoo’s current pet consultant company recommends. But these steps, like Ms. Swisher’s analysis, lower the boom gently on a ship struggling with flawed engineering. The ship, gentle reader, she is not seaworthy.
Stephen Arnold, October 21, 2008