Not HAL: Computational Intelligence at Google
October 26, 2008
“Thinking Ahead with Google” by Elise Ackerman and Scott Harris is a very good article about a subject near and dear to some Googlers’ hearts–computational intelligence. You must read the full text here. The old term “artificial intelligence” or AI is not too popular. AI is science fiction. Computational intelligence is pragmatic. The story opens with a reference to a 2002 comment by Sergey Brin about the future of search. The analogy was to the HAL computer in “2001: A Space Odyssey”. HAL, as you may recall, went off his rocker. In the article’s shorthand, Google is the “borg,” short for cyborg. The subject of “smart software” is not one that turns up in daily newspapers. I commend the San Jose Mercury News for tackling the subject.
Ms. Ackerman and Mr. Harris report that Google will support the “Singularity University” announced at a conference called the Singularity Summit. The idea is that “smart computing” is important and needs a focal point. The most important comment in the article for me was this:
The meeting was reported by technology writer Nicholas Carr in his blog Rough Type, after one of the participants blogged about it. But don’t look for the item on Google; organizers requested the information be taken down.
My recollection is that Mr. Brin delivered a talk at a Google developer conference in 2007. That talk did not become available on Google’s YouTube.com. Apparently, the support for a better HAL does not extend to making in-depth information available. In my opinion, Google is the computational intelligence singularity. Google’s patent documents are chock full of references to smart software; for example, US20070198481 has little smart fellows named janitors running around autonomously. The janitors clean up data and resolve ambiguities in certain procedures. Check out the San Jose Mercury News story and take a peek at how janitors get smart. Like I said, the computational singularity is Google. I’m fuzzy with regard to “Singularity University”. It might be another Google recruiting method. If you are somewhat paranoid, don’t read Kevin Kelly’s “Evidence of a Global SuperOrganism” here. The creature is wearing one of those flashing Google lapel pins.
Stephen Arnold, October 26, 2008
Microsoft Financials and Online
October 26, 2008
Update October 26, 2008: I just read a very interesting comment about Microsoft’s financials at Gizmodo. You can find the full text of the article here. The article’s title was “Microsoft Still Has a Vista Problem.” For me, the key analysis was:
…More people might be [buying] PCs lately, but they’re other, less profitable versions. Microsoft makes about $70 per Vista PC, but less than half of that on a netbook Windows license, which now makes up more of the Windows mix than ever. So the Windows division actually saw a 4 percent drop in operating income for the quarter. And it’s likely not going to get better with Windows 7 looming so conspicuously on the horizon. [Bits]
Original Post
I was on the road when Microsoft released its financial report for the first quarter of fiscal 2009. Pressed for time, I turned to eWeek’s Microsoft Watch. Joe Wilcox does a good job of summarizing the key points about Microsoft’s view of the world. I navigated to “Microsoft Q1 by the Numbers” here and was not disappointed. I noted that Microsoft is on its way toward $70 billion. That’s good, I thought. I scanned down the discussion and noticed that “Online Services”, a catchall for Microsoft’s anti-Google, advertising-centric activities, generated $671 million in “Fiscal 2008”. The projection for “Fiscal 2009” was $770 million. Most companies would be thrilled to have a tenth of this revenue, maybe a hundredth. But in the context of Microsoft, $770 million is a tiny sliver of revenue. Google, on the other hand, generates about $20 billion from this sector. So, despite the solidity of the overall financial performance, I gasped at this disparity. Microsoft has been working hard to close the gap with Google. These numbers suggest that Microsoft hasn’t done a very good job. Even more unsettling was the table in Mr. Wilcox’s article that reports Online Services as a money losing proposition. The chart in his article caught my attention.
Mr. Wilcox’s comment is even more telling:
The division hemorrhaged capital yet again, even as Microsoft claims gains. The division lost $480 million on $770 million revenue. Online advertising revenue grew 15 percent year over year to $72 million. Agency revenue from aQuantive topped ad sales at $98 million.
Steve Lohr adds some Vista color to the overall financial results. These data are pretty negative as well. I don’t care too much about desktop operating systems, but you may find the information useful. The story “Microsoft’s Vista Problem by the Numbers” is here.
I don’t have much to add except that unless Microsoft can close the gap, Google will maintain and perhaps increase its lead in the online area. With cloud computing rushing from the horizon to Harrod’s Creek, Kentucky, Google may find that its attack on the enterprise becomes even easier. Google does not have to retool, rework, or reengineer anything. The enterprise is a logical extension of its core. In homage to the late night commercial for an absorbent cloth, “Sham Wow.”
Stephen Arnold, October 26, 2008
Portfolio Magazine on the Microsoft Fast Problem
October 25, 2008
Portfolio Magazine has a solid, interesting story about the police raid on Microsoft Fast in Oslo, Norway, earlier in October 2008. You can read the full text of the story here. A quote from the addled goose found its way into this story. I must admit my observation was blunt: when the police raid a company, seize data, and scurry back to their secure facility, the company has lost control of its future. If I had been the editor on the story, I would have sent my remark to the bit bucket. The Portfolio story summarizes a number of important actions prior to the police raid. These range from board members squabbling to allegations of improper financial dealings to a precipitous drop in revenues without warning shareholders or Wall Street. I know something about the Fast Search & Transfer enterprise search platform. I know less about what Microsoft plans to do with that amalgamation of aging code, open source, and acquired technologies. I do know that Microsoft thought it was a great idea to spend $1.23 billion for a vendor whose files and other information are now in the capable hands of Norwegian police. I have some experience with police and intelligence officials in Scandinavia. My impression is that their reputation for investigative and intelligence excellence is well deserved. Microsoft has its hands full with Google. Now the company has to deal with its Google-killing acquisition spending time giving depositions, digging through email for information, and facing the astounding costs of litigation. Microsoft has to close the search gap between itself and Google. Any distraction from this mission is a benefit to Google. I wonder who did the due diligence on this deal for Microsoft. If you know, let me know. I would like to try to interview the person. I bet I could learn something useful.
Stephen Arnold, October 25, 2008
Exalead: Making Headway in the US
October 25, 2008
Exalead, based in Paris, has been increasing its footprint in the US. The company has expanded its US operation, and now it is making headlines in information technology publications. The company has updated its enterprise search system CloudView. Peter Sayer’s “Exalead Updates Enterprise Search to Explore Data Cloud” here provides a good summary of the system’s new features. For me, the most important passage in the Network World article was this comment:
“Our approach is very different from Google’s in that we’re interested in conversational search,” he [the president of Exalead] said. That ‘conversation’ takes the form of a series of interactions in which Exalead invites searchers to refine their request by clicking on related terms or links that will restrict the search to certain kinds of site (such as blogs or forums), document format (PDF, Word) or language.
Exalead’s engineering, however, is the company’s “secret sauce.” My research revealed that Exalead uses many of the techniques first pioneered by AltaVista.com, Google, and Amazon. As a result, Exalead delivers performance on content and query processing comparable to Google’s. The difference is that the Exalead platform has been engineered to mesh with existing enterprise applications. Google’s approach, on the other hand, requires a dedicated “appliance”. Microsoft takes another approach, requiring customers to adopt dozens of Microsoft servers to build a search enabled application.
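The “conversational search” Exalead describes is easy to picture in code. Here is a toy sketch of facet-driven refinement over a tiny in-memory index; the data, field names, and functions are my own illustration, not Exalead’s implementation:

```python
from collections import Counter

# Toy document set. Each document carries the metadata Exalead's
# interface lets a searcher filter on: site type, format, language.
DOCS = [
    {"text": "enterprise search trends", "site": "blog", "format": "HTML", "lang": "en"},
    {"text": "enterprise search whitepaper", "site": "corporate", "format": "PDF", "lang": "en"},
    {"text": "recherche d'information", "site": "forum", "format": "HTML", "lang": "fr"},
]

def search(terms, **filters):
    """Return matching documents plus facet counts the interface can
    offer as one-click refinements (the 'conversation' itself)."""
    hits = [d for d in DOCS
            if all(t in d["text"] for t in terms)
            and all(d.get(k) == v for k, v in filters.items())]
    facets = {field: Counter(d[field] for d in hits)
              for field in ("site", "format", "lang")}
    return hits, facets

# First exchange: a broad query; the system answers with refinements.
hits, facets = search(["search"])
print(facets["format"])          # Counter({'HTML': 1, 'PDF': 1})

# Second exchange: the user clicks the PDF facet to narrow the search.
hits, _ = search(["search"], format="PDF")
print(hits[0]["text"])           # enterprise search whitepaper
```

Each round trip narrows the result set, which is the point of the refine-by-clicking interaction the Exalead president describes.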
On a recent trip to Europe, I learned that Exalead is working to make it easy for a licensee to process content from an organization’s servers as well as certain Internet content. Exalead is an interesting company, and I want to dig into its technical innovations. If I unearth some useful information, I will post the highlights. In the meantime, you can get a feel for the company’s engineering from its Web search and retrieval system. The company has indexed eight to nine billion Web pages. You can find the service here.
Stephen Arnold, October 25, 2008
Dead Tree Outfits and Online
October 25, 2008
Reflections of a Newsosaur snagged my attention on October 24, 2008. The article “Voodoo Newspaper Economics” here struck a chord. I have been thinking about the plight of companies whose business model is under siege. These companies don’t have a super hero to rescue them. Even if they did, that super hero would probably get news on a mobile device. I don’t think there is a super hero able to come to the rescue of what I call “dead tree outfits.” The Newsosaur must have been on my wavelength. You must read the Newsosaur’s analysis. For me, the most compelling point in the write up was:
For the record, the secular forces dragging down newspapers are: Declining readership, shrinking advertising, high fixed costs and growing online competition that makes it increasingly difficult to charge the premium ad rates that were possible prior to the Internet.
None of these points shouts, “Digital.” But in my opinion, these “secular forces” are subject to some painful economic realities. For example, declining readership is a function of demographics. This means that those who are fond of print newspapers are a declining species. Without eyeballs, ad revenue flags. The online competition may be surprised to find itself named as a cause of traditional publishing’s problems. Today’s online ecosystem flourished around the dead tree outfits, swarming over the traditional publishers’ online efforts like kudzu. So now we have citizen Web log writers with audiences larger than some daily newspapers. The torch has been passed, and its sparks are setting the dead tree outfits on fire. To put out the blaze, the dead tree outfits pour on red ink. Not surprisingly, the consequences are unpleasant. One other point in the Newsosaur’s article warrants highlighting; to wit:
If the company abandoned print but were able to double its online sales to $20 million, it would lose $14 million in a year, for an operating margin of a negative 70%. To break even, the prototypical publication would have to more than triple its sales from the current levels. To make a profit of 15%, the company would have to quadruple its sales.
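The arithmetic behind those sentences is worth unpacking. A minimal back-of-the-envelope sketch follows; the $10 million figure for current online sales is inferred from the quote (it doubles to $20 million), not stated outright:

```python
# Reconstructing the Newsosaur's arithmetic. The cost base is implied:
# doubled sales of $20M minus a $14M loss means roughly $34M in costs.
current_sales = 10e6                    # inferred current online sales
doubled_sales = 2 * current_sales       # the hypothetical: $20M
loss = 14e6                             # stated loss at doubled sales
cost_base = doubled_sales + loss        # about $34M

print(-loss / doubled_sales)            # -0.7, a negative 70% margin
print(cost_base / current_sales)        # 3.4, "more than triple" to break even
print(cost_base / (1 - 0.15) / current_sales)  # 4.0, "quadruple" for 15% profit
```

The three printed ratios reproduce the quote’s negative 70 percent margin, the break-even triple, and the 15 percent profit quadruple.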
When I read the Newsosaur’s words, the conclusion seems obvious. Dead tree outfits will fall in the forest. Will anyone hear? Will anyone care? I like traditional newspapers. In a few years, folks like me will be playing bingo in the retirement village. The demographics, not the economics, put the final nails in some traditional publishing companies’ coffins.

Stephen Arnold, October 25, 2008
Twine’s Semantic Spin on Bookmarks
October 25, 2008
Twine is a company committed to semantic technology. Semantics can be difficult to define. I keep it simple and suggest that semantic technology allows software to understand the meaning of a document. Semantic technology finds a home inside of many commercial search and content processing systems. Users, however, don’t tinker with the semantic plumbing. Users take advantage of assisted navigation, search suggestions, or a system’s ability to take a single word query and automatically hook the term to a concept or make a human-type connection without a human having to do the brain work.
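To make the single-word query example concrete, here is a toy sketch of hooking a term to candidate concepts. The table is hand-built for illustration; commercial systems derive these mappings from ontologies, query logs, and co-occurrence statistics rather than from anything this crude:

```python
# A hand-built concept map standing in for mined semantic data.
CONCEPTS = {
    "jaguar": ["jaguar (animal)", "Jaguar Cars", "Jacksonville Jaguars"],
    "java": ["Java (programming language)", "Java (island)", "coffee"],
}

def expand(query):
    """Hook a one-word query to candidate concepts so the system can
    suggest refinements without the user doing the brain work."""
    return CONCEPTS.get(query.lower(), [query])

print(expand("Jaguar"))
# ['jaguar (animal)', 'Jaguar Cars', 'Jacksonville Jaguars']
```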
Twine, according to the prestigious MIT publication Technology Review, is breaking new ground. Erica Naone’s article “Untangling Web Information: The Semantic Web Organizer Twine Offers Bookmarking with Built In AI” stops just short of a brass-band-enhanced endorsement but makes Twine’s new service look quite good. You must read the two-part article here. For me, the most significant comment was:
But Jim Hendler, a professor of computer science at Rensselaer Polytechnic Institute and a member of Twine’s advisory board, says that Semantic Web technologies can set Twine apart from other social-networking sites. This could be true, so long as users learn to take advantage of those technologies by paying attention to recommendations and following the threads that Twine offers them. Users could easily miss this, however, by simply throwing bookmarks into Twine without getting involved in public twines or connecting to other users.
Radar Networks developed Twine. The metaphor of twine evokes for me the trouble I precipitated when I tangled my father’s ball of hairy, fibrous string. My hunch is that others will think of twine as tying things together.
You will want to look at the Twine service here. Be sure to compare it to the new Microsoft service U Rank. The functions of Twine and U Rank are different, yet both struck me as sharing a strong commitment to saving and sharing Web information that is important to a user. Take a look at IBM’s Dogear. This service has been around for almost a year, yet it remains little known. Dogear’s purpose is to give social bookmarking more oomph for the enterprise. You can try this service here.
As I explored the Twine service and refreshed my memory of U Rank and Dogear, several thoughts occurred to me:
- Exposing semantic technology in new services is a positive development. These more automatic functions can be significant time savers. A careless user, however, could shift into cruise control mode and lose sight of the need to think critically about who recommends what and where information comes from.
- Semantic technology may be more useful in the plumbing. As search enabled applications supplant keyword search, putting too much semantic functionality in front of a user could baffle some people. Google has stuck with its 1950s white refrigerator interface because it works. The Google semantic technology hums along out of sight.
- The new semantic services, regardless of the vendor developing them, have not convinced me that they can generate enough cash to stay alive. The Radar Networks and the Microsofts will have to do more than provide services that are almost impossible to monetize. IBM’s approach is to think about the enterprise, which may be a better revenue bet.
I am enthusiastic about semantic technology. User-facing applications are in their early days. More innovation will be coming.
Stephen Arnold, October 25, 2008
Google’s Cloud Computing Infrastructure Lead May Be Growing
October 24, 2008
Cloud computing has become commonplace. In the last 48 hours, Amazon pumped steroids into the Amazon Web Services product line. To refresh your memory, check out this write up by Andrea James in the Seattle Tech Report here. Rumors have been flying about Microsoft’s cloud ambitions. Information about “Strata” is fuzzy like a cirrus cloud, but Microsoft executives have been providing forecasts of a bold new service offering. For a useful recap of this rumor, read Chris Crum’s “Microsoft’s Next OS a Nod to the Stratosphere” in Web Pro News here. Other vendors blasting off from mother earth to loftier realms include IBM, Intel, Rackspace, and other big name firms.
One of the most interesting documents I have read in months is a forthcoming technical paper from Microsoft’s Albert Greenberg, Parantap Lahiri, David Maltz, Parveen Patel, and Sudipta Sengupta. The paper is available from the ACM as document 978-1-60558-181-1/08/08. I have a hard copy in my hand, and I can’t locate a valid link to an online version. The ACM or a for-fee database may help you get this document. In a nutshell, “Towards a Next Generation Data Center Architecture: Scalability and Commoditization” explains some of the technical innovations Microsoft is implementing to handle cloud-based, high-demand, high-availability applications. Some of the information in the paper surprised me. The innovations provide a good indication of the problems Microsoft faced in its older, pre-2008 data centers. It was clear to me that Microsoft is making progress, and some of the methods echo actions Google took as long ago as 1998.
What put the Amazon and Microsoft cloud innovations into sharp relief for me was US2008/0262828, “Encoding and Adaptive Scalable Accessing of Distributed Models.” You can download a copy of this document from the easy-to-use USPTO system. Start here to obtain the full text and diagrams for this patent application. Keep in mind that a patent application does not mean that Google has or will implement the systems and methods disclosed. What the patent application provides is a peephole through which we can look at some of the thinking that Google is doing with regard to a particular technical issue. The peephole may be small, but what I saw when I read the document and reviewed the drawings last night (October 24, 2008) sparked my thinking.
Before offering my opinion, let’s look at the abstract for this invention, filed in February 2006 in a provisional application. Keep in mind that we are looking in the rear view mirror here, not at where Google might be today. This historical benchmark is significant when you compare what Amazon and Microsoft are doing to deal with the cloud computing revolution that is gaining momentum. Here’s Google’s summary of the invention:
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
In typical Google style, there’s a certain economy to the description of an invention involving such technical luminaries as Jeff Dean and 12 other Googlers. The focus of the invention is on-the-fly machine translation. However, the inventors make it clear that the precepts of this invention can be applied to other applications as well. As you may know, Google has expanded its online translation capability in the last few months. If you have not explored this service, navigate to http://translate.google.com and try out the system.
The claims for this patent document are somewhat more specific. I can’t run through all 91 claims here. I can highlight one, and I will leave review of the other 90 to you. Claim 5 asserted:
The system of claim 4, wherein: the translation server comprises: a plurality of segment translation servers each operable to communicate with the translation model server, the language model servers and replica servers, each segment translation server operable to translate one segment of the source text into the target language, a translation front end to receive the source text and to divide the source text into a plurality of segments in the source language, and a load balancing module in communication with the translation front end to receive the segments of the source text and operable to distribute the segments to the segment translation servers for translation based on work load at the segment translation servers, the load balancing module further operable to direct translated segments in the target language from the segment translation servers to the translation front end.
The claim makes reasonably clear the nested structure of Google’s architecture. What impressed me is that this patent document, like other recent Google applications, treats the infrastructure as a platform. The computational and input output tasks are simply not an issue. Google pretty clearly feels it has the horsepower to handle ad hoc translation in real time without worrying about how data are shoved around within the system. As a result, higher order applications that were impossible even for certain large government agencies can be made available without much foot dragging. I find this remarkable.
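Claim 5 reads almost like a systems diagram: a front end divides the source text into segments, a load balancing module routes each segment to a segment translation server based on work load, and the front end reassembles the translated output. Here is a minimal sketch of that flow. This is my paraphrase of the claim, not Google’s code; the translate_segment function stands in for servers consulting the distributed translation and language models:

```python
import heapq

def translate_segment(segment, target_lang):
    # Stand-in for a segment translation server querying the
    # distributed translation model and language model servers.
    return f"[{target_lang}:{segment}]"

class SegmentServer:
    """One of the claim's segment translation servers."""
    def __init__(self, name):
        self.name, self.load = name, 0

    def translate(self, segment, target_lang):
        self.load += 1                    # crude work-load accounting
        return translate_segment(segment, target_lang)

class TranslationFrontEnd:
    """Divides source text, load-balances segments, reassembles output."""
    def __init__(self, servers):
        # A min-heap keyed on load plays the load balancing module.
        self.pool = [(s.load, i, s) for i, s in enumerate(servers)]
        heapq.heapify(self.pool)

    def translate(self, source_text, target_lang):
        segments = source_text.split(". ")   # naive segmenter
        translated = []
        for seg in segments:
            _, i, server = heapq.heappop(self.pool)   # least-loaded server
            translated.append(server.translate(seg, target_lang))
            heapq.heappush(self.pool, (server.load, i, server))
        return ". ".join(translated)

front_end = TranslationFrontEnd([SegmentServer(f"s{i}") for i in range(3)])
print(front_end.translate("Hello world. How are you", "fr"))
```

Even this toy makes the claim’s emphasis plain: translation is embarrassingly parallel at the segment level, so throughput scales by adding commodity servers behind the balancer.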
This patent document, if Google is doing what the inventors appear to be saying, is significantly different from the innovations I just mentioned from such competitors as Amazon and Microsoft. Google, in my opinion, is making it clear that it has a multi-year lead in cloud computing.
The thoughts that I noted as I worked through the 38 pages of small print in this patent document were:
- Google has shifted from solving problems in distributed, massively parallel computing to developing next-generation cloud-centric applications. Machine translation in real time for a global audience for free means heavy demand. This invention essentially said to me, “No problem.”
- Google’s infrastructure will become more capable as Google deploys new CPUs and faster storage devices. Google, therefore, can use its commodity approach to hardware and experience significant performance gains without spending on exotic gizmos or trying to hack around bottlenecks such as those identified in the Microsoft paper referenced above.
- Google can, with the deployment of software, deliver global services that other companies cannot match in terms of speed of deployment, operation, and enhancement.
I may be wrong, and I often am, but I think Google is not content with its present lead over its rivals. I think this patent document is an indication that Google can put its foot on the gas pedal at any time and operate in a dimension that other companies cannot. Do you agree? Disagree? Let me learn where I am off base. Your view is important because I am finishing a write up for Infonortics about Google and publishing. Help me think straight. I even invite Cyrus to chime in. The drawings in this patent application are among Google’s best that I have seen.
Stephen Arnold, October 24, 2008
Time May Be Running Out for the New York Times
October 24, 2008
Henry Blodget’s “New York Times (NYT) Running on Fumes” is an important Web post. You can read the full text here. The New York Times was one of the main revenue drivers for the Nexis news service. Lexis was the legal side of the online service that moved from start up to Mead Paper and eventually to Reed Elsevier, the Frankenstein company with Dutch and English ownership. Along the way, the NYT decided to pull its full text content from the Nexis service. The NYT, like many newsosaurs, assumed that its print reputation would translate to riches for the New York Times Co. What happened was that Nexis never regained its revenue horsepower. The NYT floundered in indexing, online, and its “new media” operations. I find it amusing to reflect on the unexpected consequences the New York Times’s online decisions triggered. Indeed, some of today’s challenges are outgrowths of management’s inability to think at an appropriate level of abstraction about the impact of online on traditional “dead tree” operations.
Mr. Blodget’s analysis summarizes a quarter century of operations in an increasingly online world. The result is a potential financial crisis for the Gray Lady, as the newspaper is fondly known. For me, the most important comment in Mr. Blodget’s analysis, which you will want to read in its entirety, was:
The company has only $46 million of cash. It appears to be burning more than it is taking in–and plugging the hole with debt. Specifically, it is funding operations by rolling over short-term loans–the kind that banks worldwide are canceling…
When I read this passage, I immediately visualized another Bear Stearns meltdown, with confused professionals, once so confident of their future and power, wandering New York sidewalks with banker boxes. If Mr. Blodget’s analysis is accurate (and I think it is dead on), changes will be coming to the New York Times. I anticipate downsizing, crazy pop ups on the online service, and a smaller news hole. My daily delivery in rural Kentucky is likely to be replaced with a US mail option. Someone will step forward and buy the property: maybe Rupert Murdoch, maybe a billionaire with a yen to control a major US daily.
Do you think the New York Times could have saved itself with a more prescient online strategy? I do. Agree? Disagree? Help me learn.
Stephen Arnold, October 24, 2008
Silobreaker: Two New Services Coming
October 24, 2008
I rarely come across real news. In London, England, last week I uncovered some information about Silobreaker‘s new services. I have written about Silobreaker before here and interviewed one of the company’s founders, Mats Bjore here. In the course of my chatting with some of the people I know in London, I garnered two useful pieces of intelligence. Keep in mind that the actual details of these forthcoming services may vary, but I am 99% certain that Silobreaker will introduce:
Contextualized Ad Retrieval in Silobreaker.com.
The idea is that Silobreaker’s “smart software,” called a “contextualization engine,” will be applied to advertising. The method understands concepts and topics, not just keywords. I expect to see Silobreaker offering this system to licensees and partners. What’s the implication of this technology? Obviously, for licensees, the system makes it possible to deliver context-based ads. Another use is for a governmental organization to blend a pool of content with a stream of news. In effect, when certain events occur in a news or content stream, an appropriate message or reminder can be displayed for the user. I can think of numerous police and intelligence applications for this blend of static and dynamic content in operational situations.
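I can only guess at the mechanics, but the blend of a static content pool with a dynamic news stream might look something like the sketch below: extract concepts from each incoming item and surface whatever pool content overlaps. This is my illustration of the idea, not Silobreaker’s engine, and the concept extraction here is a crude stand-in:

```python
# A static pool of messages, each tagged with the concepts it serves.
# A real contextualization engine would assign these automatically.
POOL = [
    {"msg": "Travel advisory for affected region", "concepts": {"unrest", "evacuation"}},
    {"msg": "Cyber hygiene reminder", "concepts": {"malware", "breach"}},
]

def concepts_of(news_item):
    # Stand-in for concept extraction; real systems go well beyond
    # the plain tokenizing this toy performs.
    return {w.strip(".,").lower() for w in news_item.split()}

def matches(news_item):
    """Surface static content whose concepts overlap the news item."""
    found = concepts_of(news_item)
    return [p["msg"] for p in POOL if p["concepts"] & found]

print(matches("New malware breach reported at utility"))
# ['Cyber hygiene reminder']
```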
Enterprise Media Monitoring & Analysis Service
The other new service I learned about is a fully customizable online service that gives enterprise customers a simple and effective way to handle the entire work flow around their media monitoring and analysis needs. While today’s media monitoring and news clipping efforts remain resource intensive, Silobreaker Enterprise will be a subscription-based service that automates much of the heavy lifting that internal or external analysts must now perform by hand. The Silobreaker approach is to blend disparate yet related information in a single intuitive user interface; blending is a key concept in the Silobreaker technical approach. Enterprise customers will be able to define monitoring targets, trigger content aggregation, perform analyses, and display results in a customized Web service. A single mouse click allows a user to generate a report or receive an auto-generated PDF report in response to an event of interest. Silobreaker has also teamed up with a partner company to add sentiment analysis to its already comprehensive suite of analytics. The service is now in its final testing phase with large multinational corporate test users and is due for release at the end of 2008 or in early 2009.
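The work flow described here (define targets, aggregate content, analyze, report when an event fires) maps onto a simple pipeline. The sketch below shows that shape with invented names; it is not Silobreaker’s API:

```python
from dataclasses import dataclass, field

@dataclass
class MonitoringTarget:
    name: str
    keywords: set
    hits: list = field(default_factory=list)

def aggregate(targets, stream):
    """Route incoming items to every target whose keywords they mention."""
    for item in stream:
        words = set(item.lower().split())
        for target in targets:
            if target.keywords & words:
                target.hits.append(item)

def report(target):
    # Stand-in for the one-click or auto-generated PDF report.
    lines = [f"Report for {target.name}: {len(target.hits)} items"]
    lines += [f"- {hit}" for hit in target.hits]
    return "\n".join(lines)

targets = [MonitoringTarget("Competitor A", {"acme"})]
aggregate(targets, ["Acme launches new product", "Unrelated story"])
print(report(targets[0]))
```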
Silobreaker is a leader in search enabled intelligence applications. Check out the company at www.silobreaker.com. A happy quack to the reader who tipped me on these Silobreaker developments.
Stephen Arnold, October 23, 2008
Able2Act: Serious Information, Seriously Good Intelligence
October 23, 2008
Remember Silobreaker? The free online aggregator provides current events news through a contextual search engine. One of its owners is Infosphere, an intelligence and knowledge strategy consulting business. Infosphere also offers a content repository called able2act.com. able2act delivers structured information in modules. For example, there are more than 55,000 detailed biographies, 200,000-plus contacts in business and politics, company snapshots, and Analyst’s Notebook files, among others. Modules cover topics like the Middle East, global terrorism, and information warfare. Most of the data, files, and reports are copyrighted by Infosphere; a small part of the information is in the public domain. Analysts update able2act to the tune of 2,000 records a week. You can access able2act by direct XML/RSS feed, through the Web site, or even as a feed into your in-house systems. Searches can be narrowed by module, such as running keywords only against the “tribes” module. We were able to look up the poorly reported movements of the Gandapur tribe in Afghanistan. Take a look at the visual demonstration available online here; we found it quite good. able2act is available by subscription. The price for a government agency to get full access to all modules starts at $70,000 a year. Only certain modules are available to individual subscribers. You can get more details by writing to opcenter at infosphere.se.
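Because able2act can deliver a direct XML/RSS feed, a subscriber could do module filtering on the client side. Here is a rough sketch using the feedparser library; the feed URL and the use of RSS category tags to mark modules are my assumptions, so check with Infosphere for the actual feed layout:

```python
import feedparser  # pip install feedparser

# Hypothetical feed URL; the real able2act endpoint will differ.
FEED_URL = "https://able2act.example.com/feed.xml"

def module_entries(url, module="tribes"):
    """Keep only feed entries tagged with the requested module
    (assumes modules appear as RSS category tags, which is a guess)."""
    feed = feedparser.parse(url)
    return [entry for entry in feed.entries
            if any(tag.get("term", "").lower() == module
                   for tag in entry.get("tags", []))]

for entry in module_entries(FEED_URL):
    print(entry.title)
```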
Stephen Arnold, October 23, 2008