Google and Disruption: Will It Work Tomorrow?

April 15, 2010

Editor’s Note: The text in this article is derived from the notes prepared for Stephen E. Arnold’s keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.

Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.

I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”

My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy,” which is still timely and available from Infonortics Ltd. in Tetbury, Glos.

The three ideas were:

First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third what companies like Excite, Lycos, Microsoft, and Yahoo were paying.

Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.

Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.

Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.

In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.

First, Google’s inventions clustered into some distinct groups. These included telephony, online financial transactions mostly embedded in patent documents about advertising, semantic technology, and data management, among others. As we know now, Google has entered the telecommunications market and the company is broadening its financial services to include online merchant accounting services, micro cash charges for some videos, and energy trading–the same business in which Enron made and lost its fortune.

Second, Google’s investment in advertising technology moved beyond text ads on a Web page. Google disclosed ads that could “follow” a person who navigated away from a base page and the use of algorithms that would put an advertiser’s message in the most likely to be clicked on place in an email or personalized browser. What my research revealed is that Google’s Chrome browser uses the “container” technology to allow ads to be tailored to a specific user in a specific context. The idea is that the semantic innovations of Dr. Ramanathan Guha and other Google engineers “hook” together to make the Google ad system intelligent.

Third, Google’s scientists had put considerable effort into making certain human-intermediated tasks work in “smart software”. Google calls the smartest of its smart software “janitors”. In the American comic strip “Dilbert”, the smartest person is the “janitor.” This seems to have been a Google joke, but it is no laughing matter that Google has erected a system that does a good job of using computational intelligence to deliver services at a cost lower than a competitor using less efficient methods.

I was fortunate to receive additional support from another client for my third Google study, “Google: The Digital Gutenberg.” In this 2009 study, available from Infonortics, Ltd., I blended my review of Google’s technical open source information with the publicly-accessible services anyone can use on Google.com. The exception, of course, is China.

The discoveries for me from the 18 months of work on Google: The Digital Gutenberg were three points:

First, while Google management insists that Google is not a traditional publisher, Google is, for anyone who cares to use its services, a publishing platform.

Second, Google’s smart software can take items of information and assemble them into reports or dossiers. Consider the admittedly hard to read figure from Google’s patent document US20070198481, “Automatic Object Reference Identification and Linking in a Browseable Repository,” published August 23, 2007. This type of information display is a far cry from a Google laundry list of results. What’s interesting is that Google has not made this type of mashup of information available, and few outside of Google know that this technology to create a report exists.

Third, Google’s platform can work like an integrated manufacturing plant. In the early 20th century, American industrialist Henry Ford envisioned his Rouge River facility as a huge system that could take raw materials in at one end and produce finished automobiles at the other. Google’s present server / software infrastructure is similar to this vision. The result is that Google has some capabilities that are resident in the system. When needed, these can be turned on. When eBay purchased StumbleUpon.com, Google rolled out its “recommendations” feature in iGoogle within a single working day.

I want to talk to you today about the research that will be reported in my fourth Google study, “Google Beyond Text.” As in my other three Google studies, I rely on Google’s open source technical papers and Google’s patent documents. The scope of this study is narrowed to rich media. I admit that “rich media” includes a number of large markets: music, a market dominated by major record labels and Apple Computer; broadcast television; online games, which are undergoing a renaissance with the mobile device revolution now underway; and digital video. Digital video includes amateur programs like those produced by teenagers about their interests as well as commercial television programs.

In the time we have remaining, I want to comment on three Google technologies that have not been given much coverage at conferences, in the trade press, or among the many Google watchers who write blogs and consultants’ reports.

I will close by sharing with you one of my more troublesome hypotheses. Please, if you have a question you would like to ask, jot it down. I will try to answer as many of these as time permits.

“Google Beyond Text” makes clear that Google has to move from the world of text to the world of rich media. From a technical point of view, the shift looks quite easy. YouTube.com is a major online destination. Every minute, Google users upload about 24 hours of video. Google distributes most of its rich media content for free. Opinions differ about whether Google operates YouTube.com at a profit. What’s clear is that Google’s text ad business is generating about $24 billion in revenue each year, so Google can afford to subsidize rich media.

Let me highlight three inventions, disclosed in Google’s patent applications, and then turn to the surprising hypothesis my data suggest to me.

First, Google has invested in a content delivery system. However, Google’s approach is to build a network-centric system. The approach is that Google’s smart software figures out what content should be placed at the edge of its network and how to deliver that content at the highest possible speed and lowest possible cost. I don’t have time to walk through the math Google engineers disclose, but it is very similar to the type of logic that has been in use in the Google File System for more than nine years. The idea is to distribute certain shards of data in multiple places. When content is required, the information can come from servers with the best cost / performance characteristics. The user gets rich media quickly and Google reduces its content delivery costs.
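The replica-selection idea can be illustrated with a minimal sketch. This is not the math disclosed in Google’s patent documents; the `Replica` record, the `pick_replica` function, the server names, and the weights are all hypothetical, chosen only to show how a request for a shard stored in several places might be routed to the copy with the best cost / performance score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One stored copy of a shard (hypothetical record for illustration)."""
    server: str
    latency_ms: float   # assumed: measured delivery latency to the user
    cost_per_gb: float  # assumed: delivery cost from this server


def pick_replica(replicas, latency_weight=1.0, cost_weight=100.0):
    """Return the replica with the lowest weighted latency-plus-cost score."""
    if not replicas:
        raise ValueError("no replica holds this shard")
    return min(
        replicas,
        key=lambda r: latency_weight * r.latency_ms + cost_weight * r.cost_per_gb,
    )


# The same shard is held in several places (made-up servers and numbers).
shard_locations = {
    "video-123/shard-0": [
        Replica("edge-eu", latency_ms=20.0, cost_per_gb=0.08),
        Replica("core-us", latency_ms=120.0, cost_per_gb=0.02),
        Replica("edge-asia", latency_ms=200.0, cost_per_gb=0.05),
    ],
}

best = pick_replica(shard_locations["video-123/shard-0"])
print(best.server)
```

Changing the weights shifts the trade-off: with `latency_weight=0.0` the cheapest server wins regardless of speed.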

Second, Google has put significant effort into methods for indexing rich media. These range from setting up human-intermediated systems like those used in Google Translate to systems powered by smart software. The human intermediation is crowdsourced; that is, you are free to add metatags, index terms, and descriptors but you are not compensated for this work. The smart software angle generates metadata based on what others have looked at, objects within the videos themselves such as celebrities or scene content, and user behavior, that is, the use of clicks to indicate which videos are like another video for a specific person or query. These methods are less computationally burdensome than converting spoken text to ASCII, performing optical character recognition on the content, and then indexing the content. Google uses file names, closed caption information, and existing index terms plus any tags generated from its system. You can try this system on YouTube.com and judge for yourself how well it works.
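The use of clicks as a similarity signal can be sketched in a few lines. This is an illustrative assumption, not Google’s disclosed method: the function names `co_click_counts` and `similar_videos` and the sample sessions are hypothetical. The sketch counts how often two videos are clicked in the same session and treats a high count as evidence that the videos are alike.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_click_counts(sessions):
    """Count, for each pair of videos, how many sessions clicked both."""
    counts = defaultdict(Counter)
    for clicked in sessions:
        # De-duplicate within a session so repeat clicks count once.
        for a, b in combinations(sorted(set(clicked)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def similar_videos(counts, video, top_n=3):
    """Return the videos most often co-clicked with the given video."""
    return [v for v, _ in counts.get(video, Counter()).most_common(top_n)]


# Made-up click sessions: each list is the set of videos one user clicked.
sessions = [
    ["cat-piano", "dog-skate", "cat-jump"],
    ["cat-piano", "cat-jump"],
    ["dog-skate", "dog-surf"],
    ["cat-piano", "cat-jump", "dog-surf"],
]

counts = co_click_counts(sessions)
print(similar_videos(counts, "cat-piano", top_n=1))
```

No speech recognition or image analysis is involved, which is why an approach of this kind is cheap compared with converting audio to text and indexing the result.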

Third, Google has jumped into what I call next generation personalization. I titled the section of my new monograph that discusses this robust personalization as the “You Network.” The idea is intriguing. Google taps into its information about a specific user. The system then automatically identifies other rich media content that is likely to be of interest to you. Yes, a specific person. You can explore some of this information by viewing your “Dashboard” at http://www.google.com/dashboard. You can see your own history at http://google.com/history. You will need a Google account to view the information. I am not a fan of television, but my young engineers tell me that the idea of a personalized television guide is of great interest because it will discover programs that may otherwise be missed or save them time when looking for a program of interest.

Finally, I want to mention Google’s hardware inventions. You are familiar with the Google phone and the Google Android operating system. What few people know is that Google has disclosed inventions for a set top box. In the US, this device connects to a network. The device performs the programming guide assembly and intermediates between the content and the user. The Google set top box exists and is for sale in the Google store to employees only. What’s interesting is that Google is disclosing inventions that create a rich media equivalent of its integrated textual content system.

In closing, let me show you a diagram of how a television production or motion picture is created. I know that in other countries, the methods vary. But for my purposes, I want to call to your attention the multiplicity of people, skills, and processes required to get “American Idol” or “Avatar” before the American consumer.

In this diagram, I have shaded gray the functions addressed by the inventions disclosed in Google’s patent documents or by services currently available to Google users. Notice that three functions are largely unaffected; that is, the producer, the director, and the production work where specialists perform tasks such as building sets or performing. Note: some camera work is already automated.

The point is that Google’s disruptive potential in rich media is identical to the tactics used in telecommunications and financial / back office services. Google makes functions available and allows developers, college students, and anyone with interest to download code and create applications. At the same time, Google generates a large number of beta products, including the Nexus One phone or the set top box with its partner MIPS. No single action is a blockbuster like Apple’s orchestration of the iPad release.

The method is to move incrementally and allow users to figure out what can be done with Google building blocks. Google itself allows its engineers to experiment in a similar manner. When Google’s management discerns a trend in its click-stream data, the company may create a product. A good example is Google Apps and how that set of technology has produced Google Apps for the Enterprise. Google is gaining traction in the enterprise, but it is undoubtedly a bit of a surprise to Microsoft that Google seems to be focusing on the Salesforce.com market, not the market for SharePoint.

This is the seep, surround, and emerge strategy that characterizes Google’s approach.

Will this method work in rich media? Frankly, I don’t know. Google has been quite successful in its ability to disrupt traditional advertising, telecommunications, Web search, enterprise software, electronic mail, and other markets.

In the rich media sector, Google has made some mistakes. Let me highlight three:

First, Google has become embroiled in a high profile, possibly pivotal, legal battle with the media giant Viacom. In addition to the $1.0 billion in damages that Viacom seeks, Google finds itself struggling with regard to copyright in rich media. Regardless of the outcome of the legal matter, the disclosures from both parties have tarnished both Google’s and Viacom’s reputations. By itself, this may be nothing. On the other hand, Apple, Hulu.com, and Yahoo have largely avoided such imbroglios.

Second, Apple controls the music sector. I know that “control” makes many vendors and competitors gasp. The reality is that if you name high impact, for fee music, podcast, and television services, Apple appears on most lists either at the top spot or in the top two or three in the league table.

Third, Google is missing some important components. Despite the richness of its technologies, Google needs the type of functionality available from such companies as Catch Media. Recently Google purchased Episodic, a streaming video service. Is this acquisition enough to give Google an edge over Apple, whose grip on the integrated consumer rich media market seems to be growing?

In closing, I want to point out that Google’s success is remarkable. However, I think 2010 marks an important point in the company’s history. Google is dependent on online advertising for revenues. This means that Google must find a way to capture the attention of customers on the new types of devices which display text but also rich media.

Google has a powerful global brand. The company has billions in cash reserves. The firm’s technical staff is either the best or among the best in the world among online companies. The company has seized upon “openness” as a marketing theme. Apple and Microsoft are more “closed.” I am not sure if the notion of “openness” is sufficient to grab market share from these companies.

Google is a formidable competitor, but rich media may be one sector where Google may find itself relegated to a back office or utility function. This can be a lucrative business, but it may turn out that Google has met its match in rich media.

If you think about Google’s missteps in social media, the company seems to be facing a challenger as troublesome as Apple in the booming social media market. Facebook, according to some analysts, has surpassed Google in traffic in the US and is becoming a preferred source of advertising.

Has the 11-year run for Google come to an end? We will know by the end of 2010 or early 2011.

Posted by Stuart Schram IV, April 15, 2010

This is sponsored by the Slovenian government.

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Google, Rich media, Technology, Text analytics, Text processing

Comments

One Response to “Google and Disruption: Will It Work Tomorrow?”

sperky undernet on June 28th, 2010 6:53 am

What if Google Dossier could shake the globe just by carrying out its mission?
An example is the Rolling Stone hullabaloo used to shed past General McChrystal of his responsibilities, an “interview” in which the then General made no first person critical comments about the President, his administration or the war effort. So is this really about @mmhastings and two scheduled short talks with McChrystal in Paris that became a month-long opportunity during the Eyjafjallajokull volcano no-fly ash time? Is this about off-the-record becoming ambient everybody-knows due to familiarity? So what’s happening here? See the October 2, 2009 (no typo) story “Why Obama Must Follow Drucker’s (Not McChrystal’s) Advice” http://www.huffingtonpost.com/steven-g-brant/why-obama-must-follow-dru_b_309184.html And then look at another celebrated straight talker Lieutenant General (ret) United States Marine Corps Paul K. Van Riper of Millenium Game fame. How come he wasn’t fired when he refused to play and made abrasive remarks? Is it because they weren’t directed at the President?
How can anything but this be the resultant opinion: President Obama, your actions and your words are “seemingly stupid”. And according to CNN, Afghan leadership gained more credibility seeming to agree. You gonna fire him too?


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.