FAQ: The Google Legacy and Google Version 2.0
May 2, 2008
Editor’s Note: In the last few months, we have received a number of inquiries about Infonortics’ two Google studies, both written by Stephen E. Arnold, a well-known consultant working in online search, commercial databases and related disciplines. More information about his background is on his Web site and on his Web log. This FAQ contains answers to questions we receive about The Google Legacy, published in mid-2005, and Google Version 2.0, published in the autumn of 2007.
Do I need both The Google Legacy and Google Version 2.0?
The Google Legacy provides a still-valid description of Google’s infrastructure, explanations of its file system (GFS), its Bigtable data management system (now partly accessible via Google App Engine), and other core technical features of what Mr Arnold calls “the Googleplex”; that is, Google’s server, operating system, and software environment.
Google Version 2.0 focuses on more than 18 important Google patent applications and Google patents. Mr Arnold’s approach in Google Version 2.0 is to explain specific features and functions that the Googleplex described in The Google Legacy supports. There is perhaps 5-10 percent overlap across the two volumes, which together contain more than 400 pages of text. More significantly, Google Version 2.0 draws on Google’s patent documents, which manifest the company’s investment in intellectual property, to extract more operational details about specific Google enabling subsystems. For example, in The Google Legacy, you learn about Bigtable. In Google Version 2.0 you learn how the programmable search engine uses Bigtable to house and manipulate context metadata about users, information, and machine processes.
You can read one book and gain useful insights into Google and its functioning as an application engine. If you read both, you will have a more fine-grained understanding of what Google’s infrastructure makes possible.
What is the focus of Google Version 2.0?
After Google’s initial public offering, the company’s flow of patent applications increased, and it has risen each year since. Mr Arnold had been collecting open source documents about Google. After completing The Google Legacy, he began analysing these open source documents using different software tools. The results of these analytic passes generated data about what Google was “inventing”. When he looked at Google’s flow of beta products and the firm’s research and development investments, he was able to correlate the flow of patent documents and their subjects with Google betas, acquisitions and investments. The results of those analyses are the foundation upon which Google Version 2.0 rests. He broke new ground in Google Version 2.0 in two ways: [a] text mining provides information about Google’s technical activities and [b] he was able to identify “keystone” inventions that make it possible for Google to expand its advertising revenue and enter new markets.
In a nutshell, you obtain significant detail about what Google could do and how Google could move aggressively into such markets as publishing, for example. Mr Arnold identifies six market sector targets that Google could attack “by flipping a bit” and with little advance warning to incumbents in those sectors.
How are Mr Arnold’s studies different from the ones I can buy on Amazon for $20 or $30?
There are three differences between mass market books about Google and Google products and Mr Arnold’s two studies. These are:
- Both The Google Legacy and Google Version 2.0 are based almost entirely upon engineering and technical analyses of open source information. The public statements and marketing collateral issued by Google tell only a part of the Google story. Mr Arnold was the first analyst to look behind the cheerful public relations facade of the company and document the engineering that makes Google such a formidable competitor in cloud-centric services.
- Mass market books often skip over technical details and complex business issues. Mr Arnold addresses these topics and tries to explain their impact on users, competitors and market sectors.
- Both studies present Google information from little-known, often hard-to-get or expensive technical documents. Mr Arnold’s work has been funded by various commercial organisations since 2002 and therefore represents a multi-year research and analytic effort that is not afforded mass market authors who write to a deadline and within a publisher’s cash advance.
Because he warehouses open source information about Google, even if certain materials are removed from publicly-accessible web servers, he has an archived copy to use in his research. Google itself questioned Bear Stearns about the source of information about one of Google’s semantic initiatives. The investment bank was able to point the Google attorneys to the open source for the information. Google itself, therefore, is not fully cognizant of what information about the company’s technology is available. Mr Arnold, a specialist in open source intelligence, has made it his business to locate and analyse these data.
The value of Mr Arnold’s analyses is substantiated by the use of his research in investment analysis by some of the world’s leading investment banks. Portions of his work have been circulated internationally by such firms as Bear Stearns and used for proprietary analysis by numerous other firms in the US and elsewhere.
I have tried to locate some of the materials Mr Arnold has shown me with URLs on them. These documents, in many cases, have disappeared. Duplicating Mr Arnold’s analyses with a search engine is, therefore, quite difficult, maybe even impossible.
How much technical knowledge do I need to read these studies?
A person with a college education and basic knowledge of online can read these studies. There are a few code samples and some technical diagrams. As publisher, I have tried to make certain that clear explanations for the more complex subjects have been added to the text. Mr Arnold provides numerous examples and makes use of analogies to make certain mathematical concepts clear. You do not have to know the properties of a Sierpinski triangle. Mr Arnold includes an illustration that makes the redundant nature of certain Google engineering immediately evident. These studies are not of the John Grisham variety, but you will not need a PhD in mathematics to make sense of them either.
Is The Google Legacy, written in 2005, out of date?
Mr Arnold’s technical analyses are focused on broad, foundation issues. You can read The Google Legacy today, and you will note that certain Google products and services have moved from beta to full launch. The information, on the whole, is as valid today as it was when the study appeared three years ago. One of Mr Arnold’s findings is that Google relied upon the learning of the AltaVista.com team, broad findings in computer science, and well-known numerical recipes to make its “money cupcake”. Consequently, Google’s deeper architecture and its applications-centric approach helped to expand its revenue base. You will find that the information is quite fresh and useful in understanding why companies such as Microsoft and Verizon are having difficulty responding to the Google challenges taking place now.
Why are these studies so expensive?
Compared to a mass market non-fiction book, specialist reports such as The Google Legacy and Google Version 2.0 appeal to a sophisticated and specialist segment of the market. Because of the amount of time invested in analysing the data, we need to recover the costs of creating the report and generate a reasonable return on the effort. These studies changed the way in which the financial analysts in a number of investment banks perceived Google. The value of the information is high, and it remains difficult to obtain. Few Google watchers probe the company with the sophisticated business intelligence tools Mr Arnold and his team rely upon. The research is tedious and time-consuming, which has an impact on the pricing of this work.
Who is Stephen Arnold?
Mr Arnold is now an independent consultant. You can read a long biography here, or you can read a shorter biography here. Prior to jettisoning retirement in 1991, he worked at four “real” jobs: NUS (Halliburton Nuclear) in Rockville, Maryland; Booz, Allen & Hamilton (in Washington, DC; Crystal City, Virginia; and Manhattan); the Courier Journal & Louisville Times Co.; and finally Ziff Communications in Manhattan. He is easy to Google, and you can read quite a bit of his work on his Web site and his Web log.
How do I know his analyses are correct?
You cannot. He does a monthly column about Google’s enterprise initiatives for KMWorld. You will need a copy of this tabloid to read these pieces. Also, we can point to the use of his material by investment banks. You can Google him and see that reputable journalists rely upon him as a source of information about Google. You can conduct your own investigation and find out what companies have hired him to give in-depth Google briefings. Some of these firms are small; others are among the largest and best-known organisations in the world. If you are uncertain about reading a study by a person whom you do not know, we suggest that you download information from the ArnoldIT Web site or look at the sample chapters available on the Infonortics Web site.
Are there free samples?
Yes, there is a sample chapter from The Google Legacy here. There is also a sample chapter from Google Version 2.0 here. Download these and make your own decision about the value of the studies.
Is there a site license for Google Version 2.0?
Yes. It costs $2,200 for a single site. You should contact harry.collier [at] infonortics.com for the details.
Am I able to share my single copy?
If you make a single hard copy, you may share it with others. If you have a single digital copy, we do not want that copy to be used by multiple people at the same time. You have to use your judgment, but if we find a misuse of the studies, then we will take legal action.
My library bought one copy. May I make this one hard copy available via interlibrary loan?
If you print one hard copy, then you may share that one copy via interlibrary loan. We do not permit making or sharing multiple instances of a single copy of either study. If we learn about a use outside of these terms, we will take legal action.
I am a student. May I have a copy for free?
No. You may download the sample chapters and review the information on the public Web sites for Infonortics and ArnoldIT.
Is Mr Arnold available for lectures about Google?
You may contact him at sa [at] arnoldit.com. He handles lectures through his office.
Is there an update to either study?
We will issue a separate stand-alone monograph in September 2008. That monograph will address in depth Google’s use of machine learning. The preface to the monograph will provide additional commentary on Google’s activities in the last year. We do not plan, at this time, additional editions of either study, preferring the stand-alone monograph approach for now.
I need a review copy. How do I request one?
Please contact harry.collier [at] infonortics.com.
I have other questions. Whom do I contact?
You can email me, the publisher, at harry.collier [at] infonortics.com.
Harry Collier, Managing Director, Infonortics Ltd., Tetbury, Glou., May 3, 2008