Google: Suddenly Too Big
February 22, 2009
Today Google is too big. Yesterday and the day before Google was not too big. Is this a sudden change at Google or a growing sense that Google is not the quirky Web search and advertising company everyone assumed Googzilla was?
The New York Times's article by Professor Randall Stross, available temporarily here, points out that some perceive Google as "too big." Mr. Stross quotes various pundits and wizards and adds a tasty factoid: Google allowed him to talk to a legal eagle. Read the story now so you can keep your finger on the pulse of the past. Note the words the past. (You can get Business Week's take on this same "Google is too powerful" theme here.)
The fact is that Google has been big for years. In fact, Google was big before its initial public offering. Mr. Stross’s essay makes it clear that some people are starting to piece together what dear Googzilla has been doing for the past decade. Keep in mind the time span–decade, 10 years, 120 months. Also note that in that time interval Google has faced zero significant competition in Web search, automated ad mechanisms, and smart software. Google is essentially unregulated.
Let me give you an example from 2006 so you can get a sense of the disconnect between what people perceive about Google and what Google has achieved amidst the cloud of unknowing that pervades analysis of the firm.
Location: Copenhagen. Situation: Log files of referred traffic. Organization: Financial services firm. I asked the two Web pros responsible for the financial services firm’s Web site one question, “How much traffic comes to you from Google?” The answer was, “About 30 percent?” I said, “May we look at the logs for the past month?” One Webmaster called up the logs and in 2006 in Denmark, Google delivered 80 percent of the traffic to the Web site.
The perception was that Google was a 30 percent factor. The reality in 2006 was that Google delivered 80 percent of the traffic. That's big. Forget the baloney derived from samples of referred traffic: even if the Danish data were off by plus or minus five percent, Google has a larger global footprint than most Webmasters and trophy generation pundits grasp. Why? Sampling services get their market share data in ways that understate Google's paw prints. Methodology, sampling, and reverse engineering of traffic lead to the weird data that research firms generate. The truth is in log files, and most outfits cannot process large log files, so "estimates", not hard counts, become the "way" to truth. (Google has the computational and system moxie to count and perform longitudinal analyses of its log file data. Whizzy research firms don't. Hence the market share data that show Google in the 65 to 75 percent share range with Yahoo 40 to 50 points behind. Microsoft is even further behind, and Microsoft has been trying to close the gap with Google for years.)
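To see why log files, not samples, settle the question, here is a minimal sketch of the sort of tally the Danish Webmaster could run. This is my own illustration, not the firm's script; the combined log format and the file name are assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes the common Apache/Nginx "combined" log format, where the
# referrer and user agent are the last two quoted fields on each line.
LOG_LINE = re.compile(r'"(?P<referrer>[^"]*)" "[^"]*"$')

def referrer_share(log_path):
    """Tally which sites send visitors, based on the Referer header."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_LINE.search(line.rstrip())
            if not match:
                continue
            host = urlparse(match.group("referrer")).netloc.lower() or "(direct)"
            # Fold google.dk, google.com, and friends into one bucket.
            if "google." in host:
                host = "google"
            counts[host] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {host: 100.0 * n / total for host, n in counts.most_common(10)}

if __name__ == "__main__":
    # Hypothetical file name; point it at a month of real logs for hard counts.
    for host, pct in referrer_share("access.log").items():
        print(f"{host:30s} {pct:5.1f}%")
```

Run against a month of logs, a tally like this produces hard counts; the sampling services can only estimate what the Webmaster can simply count.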
So now it's official because the New York Times runs an essay that says, "Google is big."
To me, old news.
In my addled goose monographs, I touched on data my research unearthed about some of Google’s “bigness”. Three items will suffice:
- Google's programming tools allow a Google programmer to be up to twice as productive as a programmer using commercial programming tools. How's this possible? The answer is the engineering of tools and methods that relieve programmers of some of the drudgery associated with developing code for parallelized systems. (A sketch of the kind of abstraction I mean appears after this list.) Since my last study — Google Version 2.0 — Google has made advances in automatically generating user facing code. If the Google has 10,000 code writers and you double their productivity, that's the equivalent of 20,000 programmers' output. That's big to me. Who knows? Not too many pundits, in my experience.
- Google's index contains pointers to structured and unstructured data. The company has been beavering away so that it no longer counts Web pages in billions. The GOOG is in trillions territory. That's big. Who knows? In my experience, not too many of Google's Web indexing competitors have these metrics in mind. Why? Google's plumbing operates at petascale. Competitors struggle to deal with the Google as it was in the 2004 period.
- The computations Google's fancy maths perform each second outnumber, by orders of magnitude, the queries Google processes per second. For each query there are computations for ads, personalization, log updates, and other bits of data effluvia. How big is this? Google does not appear on the list of supercomputers, but it should. And Google's construct may well crack the top five on that list. Here's a link to the Google Map of the top 100 systems. (I like the fact that the list folks use the Google for its map of supercomputers.)
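To give a flavor of the drudgery such tooling removes, here is a minimal sketch that leans on Python's stock multiprocessing pool as a stand-in. Google's internal frameworks are far more elaborate; the word count task and the file names are my own assumptions, offered only to show how a library can hide the forking, scheduling, and result collection a programmer would otherwise hand code.

```python
from collections import Counter
from multiprocessing import Pool

def count_words(path):
    """Map step: tally the words in one chunk of the corpus."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            counts.update(line.split())
    return counts

def merged_counts(paths, workers=4):
    """Reduce step: the pool parcels out the chunks and gathers the partials."""
    totals = Counter()
    with Pool(workers) as pool:
        for partial in pool.imap_unordered(count_words, paths):
            totals.update(partial)
    return totals

if __name__ == "__main__":
    # Hypothetical inputs; substitute real files to run the sketch.
    chunks = ["crawl_part_001.txt", "crawl_part_002.txt", "crawl_part_003.txt"]
    print(merged_counts(chunks).most_common(5))
```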
The real question is, "What makes it difficult for people to perceive the size, mass, and momentum of Googzilla?" I recall from a philosophy class in 1963 something about Plato and looking at life as a reflection in a mirror or a dream. Most of the analysis of Google with which I am familiar treats fragments, not Die Gestalt.
Google is a hyper construct and, as such, it is a different type of organization from those much loved by MBAs who work in competitive and strategic analysis.
The company feeds on raw talent and evolves its systems with Darwinian inefficiency (yes, inefficiency). Some things work; some things fail. But in chunks of time, Google evolves in a weird non directive manner. Also, Google's dominance in Web search and advertising presages what may take place in other market sectors as well. What's interesting to me is that Google lets users pull the company forward.
The process is a weird cyber-organic blend quite different from the strategies in use at Microsoft and Yahoo. Of its competitors, Amazon seems somewhat similar, but Amazon is deeply imitative. Google is deeply unpredictable because the GOOG reacts and follows users' clicks, data about information objects, and inputs about the infrastructure's machine processes. Three data feeds "inform" the Google.
Many of the quants, pundits, consultants, and MBAs tracking the GOOG are essentially data archeologists. The analyses report what Google was or what Google wanted people to perceive at a point in time.
I assert that it is more interesting to look at the GOOG as it is now.
Because I am semi retired and an addled goose to boot, I spend my time looking at what Google's open source technology announcements seem to suggest the company will be doing tomorrow or next week. I collect factoids such as the "I'm feeling doubly lucky" invention, the "programmable search engines" invention, the "dataspaces" research effort, and new patent documents for a Google "content delivery demonstration", among others — many others, I wish to add.
My forthcoming Google: The Digital Gutenberg explains what Google has created. I hypothesize about what the "digital Gutenberg" could enable. Knowing where Google came from and what it did is indeed helpful. But that information will not be enough to assist the businesses increasingly disrupted by Google. By the time business sectors figure out what's going on, I fear it may be too late for these folks. Their Baedekers don't provide much actionable information about Googleland. A failure to understand Googleland will accelerate the competitive dislocation. Analysts who fall into the trap brilliantly articulated in John Ralston Saul's Voltaire's Bastards will continue to confuse the real Google with the imaginary Google. The right information is nine tenths of any battle. Applying this maxim to the GOOG is my thought.
Stephen Arnold, February 22, 2009
A Publisher Who Violated Copyright: Foul Play
February 22, 2009
I did not want to write about this situation until I cooled down. I write monographs (dull and addled ones to be sure), but I expect publishers to follow the copyright laws for the country in which the publisher resides. I used to work at a big, hungry, successful publishing company in New York. Even in the go go 1980s, the owner set an example for the officers and professionals to follow. The guideline was simple: treat information and copyright with respect. Before returning to the nutso New York scene, I worked at the Courier Journal & Louisville Times Co., then one of the top 25 newspapers in the world. The rules were clear there too. Respect copyright. I have three active publishers at this time: Frank Gilbane (The Gilbane Group), whom I have described as the least tricky information wizard I know; Harry Collier (Infonortics Ltd.), my Google publisher and long time colleague; and Steve Newton, at Galatea in the UK, who makes my lawyer look like a stand up comedian. Mr. Newton is serious and respectful of authors like the savvy Martin White and me, the addled goose.
I would go straight to my attorney if I found out that one of these professionals was sending copies of my monographs, without my permission, to individuals who were not reviewers or representatives of a procurement team. Gilbane, Collier, and Newton would either send an email or pick up the mobile and let me know who wanted a copy.
I was thunderstruck when a dead tree publisher in New Jersey, which I will not name, sent me via electronic mail, with no prior communication, a copy of a hot off the press book about Google. I took three actions:
- I alerted my attorney that a publisher was possibly violating copyright and that I wanted to know what to do to protect myself. “Delete the file” and “Tell ’em not to do this type of distribution again” were the two points I recall.
- I asked one of my top researchers, one of the people who does research for my legal and investigative reports, to telephone the publisher and state what the attorney told me, then repeat the message and inform the publisher to direct further communications to my assistant, not to me.
- I deleted the file.
TurboWire: Search for the Children of Some Publishing Executives
February 22, 2009
A bit of irony: at a recent dinner party, a publishing executive explained that his kids had wireless, MacBooks, and mobile phones. He opined that his kids knew the rules for downloading. I was standing behind the chair in which his son sat, texting and downloading a torrent. The publishing executive stood facing his son, talking to me about his kids' ability to manage digital information. I asked the son what he was downloading. He said, "Mall Cop". I asked, "From Netflix?" He said, "Nope, a torrent like always."
If you want to take a look at some of the functionality for search and retrieval of copyrighted materials, check out TurboWire. You can download a copy here. Click here for the publisher’s Web site. The features include search (obviously) and:
- Auto-connect, browse host, multiple search.
- Connection quality control.
- Library management and efficient filtering.
- Upload throttling.
- Direct connection to known IP addresses.
- Full-page connection monitor.
- Built-in media player.
Oh, talking about piracy is different from preventing one’s progeny from ripping and shipping in my opinion. And, no, I did not tell my host that he was clueless. I just smiled and emitted a gentle honk.
Stephen Arnold, February 22, 2009
Medpedia: Using Web 2.0 to Advance Medicine
February 22, 2009
Editor’s Note: The health information sector is showing some zip. Beyond Search asked Constance Ard, the Answer Maven, to comment on the new service, Medpedia.
Medpedia has a stated purpose of "applying a new collaborative model to the sharing, collection and advancement of medical knowledge."
This new project has the support of gold star partners: Harvard, Stanford, and University of Michigan Medical Schools, as well as UC Berkeley's School of Public Health. This technology platform is open to the public but has special appeal to users in the medical, health services, academic, and research communities.
The project began in 2008 with Charter Members and Advisors offering support for this collaborative model of medical knowledge sharing.
The Privacy Policy provides support for third-party advertisers to collect and use site user information. Using the site does not require registration for readers. Editors and Members must register. An industry disclosure practice has also been adopted by Medpedia that requires editors to “disclose in their public profiles their corporate and academic affiliations and they must disclose if they receive, or expect to receive, any form of compensation for the content they contribute to Medpedia, or any compensation related to medicine, medical information, or products and services related to the body.”
The Terms of Use outlines very clearly that the site does not provide medical advice and the content is not Peer Reviewed. Contributors must register to use the site. Contributors should review the terms carefully.
Medpedia has kept the user audience in mind for this project. They provide plain English pages for your average Jane Q. User and Clinical pages for medical professionals. This flexibility, along with other key features such as interdisciplinary contributions, allows Medpedia to reach beyond the consumer and/or researcher to meet the needs of both types of user.
Contributions may be made by anyone. Editors are screened and carefully selected, but once a member becomes a recognized editor, their profile will track their contributions on Medpedia. Medpedia does plan to expand to languages other than English. Contributors have very specific levels of access for content creation and editing on the site. The FAQs lay out the types and responsibilities associated with the various levels.
Using the site is easy. The index of current articles lists terms that link to full encyclopedia articles. The ruling organizational scheme is alphanumeric.
For the layperson, the results of a search of the articles on "infectious diseases" do not, at first glance, hold much hope. However, as you review the results, it becomes clear that the articles are indexed appropriately. If you are a keyword user, don't expect highlighted search terms in the results list. The one line search blurb is literally the first line of the article, no matter the format of the full text.
The seed content offers highly reliable information that a researcher at any level can use. Medpedia warns that as the general public contributes to the site, this content will require verification. This need for verification is why the Editor and Committee structure will be so important to the development of this collaborative model. The editors will provide the touchstone for accuracy and currency as site content grows.
Finding articles by the contributing organization or by community is easy. Community within the Medpedia environment refers to a particular group of articles, editors, and contributors on a specific topic; e.g., Adult ADD/ADHD. There is an alphanumeric index of the communities and an alphabetical index of professionals who have provided a profile listing their education and experience.
The collaborative nature of this model is encouraging. The site seems to be well governed to ensure that quality, reliable, and verifiable information is accessible. The search feature seems effective, but the results display has room for improvement, at least from a layperson's viewpoint. In my opinion, in the days of keyword searching, the blurb in the result needs to be more reflective of the content than the first line of text from the article.
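As a rough illustration of the kind of keyword-in-context blurb a results page could show instead of the first line of the article, consider this minimal sketch. The article text, the window size, and the asterisk highlighting are assumptions for the example; this is not Medpedia's code.

```python
def keyword_blurb(text, query, window=8):
    """Return a short blurb centered on the first query term hit,
    wrapping matched terms in asterisks; fall back to the opening words."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    for i, word in enumerate(words):
        if word.lower().strip(".,;:()") in terms:
            start, end = max(0, i - window), min(len(words), i + window + 1)
            snippet = [
                f"*{w}*" if w.lower().strip(".,;:()") in terms else w
                for w in words[start:end]
            ]
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(snippet) + suffix
    # No hit: show the opening words, which is roughly what the site does today.
    return " ".join(words[: 2 * window])

# Hypothetical article text, for illustration only.
article = ("Infectious diseases are disorders caused by organisms such as "
           "bacteria, viruses, fungi, or parasites. Many organisms live in "
           "and on our bodies and are normally harmless.")
print(keyword_blurb(article, "infectious diseases"))
```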
As this site grows, it will be important to investigate the effectiveness of the editorial process to ensure that the collaborative model does not fail due to an overwhelming influx of inaccurate, out-of-date information. As it stands, the seed content makes this a useful and reliable source for medical information. The indexes and structure applied to the content are good, and the search tool seems accurate despite the disappointing results display. If you are seeking reliable medical content, Medpedia is a good place to start, whether you are a professional or Jane Q. User.
Constance Ard, Answer Maven, February 22, 2009
Google: A Scoffing Violator for Sure
February 22, 2009
If Microsoft can release Internet Explorer 8 and put itself on a list of non compliant Web sites, Google can violate its own Webmaster guidelines. SearchNewz doesn’t agree. You can read Dave Davies’ view of the scoffing violator Google here. Mr. Davies includes a link to Google’s explanation of the situation. For me, the most important comment in the write up was:
As it turns out, old Google Japan has been buying links in the form of blog posts to help increase their rankings. Of course, it wasn’t actually Google – it was a third party (of course) and Google Japan’s PageRank has been dropped to a 5 from the 9 it was at. So a black eye for Google. Of course, they have a good explanation but then – who doesn’t. 🙂 All the same, the one person who came out of this looking great – Matt Cutts who once more represents Google well and you just want to trust him to do no evil.
My research suggests that Google takes other liberties with its guidelines as well. But if Google makes its rules, just like a shopping mall owner, Google can break its rules. Google is a bit more influential than a shopping mall, however. I don't mind pointing out Googzilla's flaws, but I do try to follow its rules. I even put up with silliness from the now famous Cyrus and his dearth of knowledge about Google's own open source information stream. Mr. Davies makes a good point, but it won't amount to a hill of dead Google power supplies.
Stephen Arnold, February 22, 2009
WebFetch: Metasearch UK Style
February 22, 2009
InfoSpace was on my radar several years ago. Since that matter was resolved, I haven’t given the company much thought. I did a quick search of my notes and files about the company and came across a reminder to myself about WebFetch. The WebFetch.com site was an InfoSpace property when I first came across it. A quick visit to the site on February 21, 2009, revealed that the service is tagged as an InfoSpace property. I had this snippet of information in my InfoSpace folder:
Catering to English-language Internet users in Europe and using innovative metasearch technology, WebFetch® offers queries that draw results from many leading search engines all at once. In one click, users receive both free listings and paid-for results. All paid-for results are labeled as “sponsored.”
WebFetch is a comparison metasearch system. Your query is passed against Google's, Microsoft's, Yahoo's, and Ask's Web indexes. You can review results in a single, relevance-ranked, deduplicated list. Alternatively, you can look at the most relevant hits from each of the four search engines. I learned about the system several years ago. I noted a redesign in 2006 that included some graphical representations of search results. An FAQ about the service is here. With a click, one can narrow the search to UK or international content. My tests revealed no significant difference in the results. I have a note to myself that says, "InfoSpace acquired WebFetch.com." But I cannot verify that item of information in the files loaded on this system.
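The mechanics of a metasearch system of this sort are conceptually simple: fan the query out, deduplicate by URL, and blend the rankings. Here is a minimal sketch of the merge step. The sample result lists are invented stand-ins for the four engines, the scoring rule is an ordinary reciprocal-rank blend of my choosing, and none of this is InfoSpace's code.

```python
from urllib.parse import urlparse

def merge_results(result_lists):
    """Blend ranked (title, url) lists from several engines into one
    deduplicated list, rewarding URLs that rank well on many engines."""
    scores, titles = {}, {}
    for results in result_lists:
        for rank, (title, url) in enumerate(results, start=1):
            parsed = urlparse(url)
            key = parsed.netloc.lower() + parsed.path.rstrip("/")
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            titles.setdefault(key, (title, url))
    return [titles[key] for key in sorted(scores, key=scores.get, reverse=True)]

if __name__ == "__main__":
    # Invented result lists standing in for two of the four engines.
    engine_a = [("WebFetch FAQ", "http://www.webfetch.com/faq"),
                ("Metasearch overview", "http://example.org/metasearch")]
    engine_b = [("Metasearch overview", "http://example.org/metasearch/"),
                ("Dogpile home", "http://www.dogpile.com/")]
    for title, url in merge_results([engine_a, engine_b]):
        print(title, "->", url)
```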
InfoSpace has been selling its mobile assets. The company seems to be in flux. What struck me when I visited WebFetch.com on February 21, 2009, was:
- There was no advertising on the pages displayed to me
- The site was clean but the information about the service took a bit of sleuthing to uncover
- The flashier features, such as the visualization I noted in my 2006 notes to myself, were no longer available.
InfoSpace has a long and somewhat interesting history. WebFetch.com seems to be marginalized, but I don’t think too much about other InfoSpace Web search properties either. These include the service named Dogpile.com, which continues to strike me as somewhat off center. Other search properties include MetaCrawler.com and WebCrawler.com. After reviewing each service, I concluded that Dogpile.com was the site that seemed the most well rounded.
What’s the future of metasearch? I think the term is being pushed aside by the notion of federated search. And, federated search itself is being displaced by systems that aggregate, parse, and assemble content. An example of this trend is the Fetch Technologies’ approach. This outfit snagged a Googler in late 2008. My conclusion: bet on Fetch, not WebFetch.com.
Stephen Arnold, February 22, 2009
Google: Dilemma Possible in Federal Sales Push
February 22, 2009
I am no longer schlepping to Washington, DC, every week. Those days are happily behind me. I have been thinking about several different news events in the last week. At lunch yesterday at the lovely Harrod's Creek Bar-B-Q Pit, three of us separately mentioned these events.
If the events come together at one time, Google could face the bureaucratic equivalent of a chain reaction collision as its government business begins to take off into the hundreds of millions of dollars, not the wacky $4,000 reported by a unit of CBS. The three events were:
- The story that ran in MarketWatch about Google’s making only $4,000 in sales to the Federal government in 2008. I mentioned this in the context of a major news outlet’s amazing ability to create the impression that Google’s presence in the US government is only slightly more than buying a Kentucky influence peddler to get a road repaved. Ludicrous. Wrong. Uninformed. Why’s this important? Whoever wrote the story doesn’t know much about government procurements, the General Services Administration, and Google’s growing footprint in agencies. Not surprising. Most experts find their information via Google and calls to some well-worn contacts. Little wonder Google was positioned as an incompetent loser in the Federal market. Totally wrong. Refresh your memory of this write up here. My write up is here.
- One of my lunch partners mentioned the White House's push for open source systems and non-proprietary software. The example offered was the use of Drupal software for the Recovery.gov Web site. You can read about this in the TechPresident.com Web log. There are maybe a half dozen or more trophy generation firms running around the White House doing information technology. But that's normal for a new administration. The message that this decision sends to Executive branch agencies is that proprietary software peddled by the giant integrators may fall from favor. Yellow lights flash. Bells clang. Suddenly the Beltway Bandits form Google practices, and little known programs built on proprietary software which cannot be operated by government employees get pushed toward the budget MRI machine.
- The third person at lunch raised the issue of the Department of Justice's apparent interest in looking at Google as a company of interest in the dicey monopoly space. I don't pay much attention to the DOJ since I had to wait 45 minutes to get through the air blast super security sucking machine to attend a meeting in the facility in 2008. My group had just arrived from a secure facility, and the machine flagged the group of four as having residue on our clothes. We think the culprit was the taxi's air freshener; the delay made us late to the meeting. The new interest at Justice seems to be related to an allegation by SourceTool.com that Google is not behaving like a tame Googzilla. You can read one take on this story here.
When I walked the two technical advisors to Beyond Search this morning, I reflected on these three Google-related comments. My thoughts coalesced around the idea that Google may be in for a little rough sailing in Washington. Even if the White House loves open source, Google, and Macs (loved by Googlers because "real" Unix is only a click away from the cartoon interface), Google could become a hot potato. A probe, even if the allegation is specious, fires up the bureaucracy. When those thousands of gears engage, in my experience it is tough to predict what will emerge from the maw of justice. Risk sky rockets, and no procurement committee gets too excited about risk. In my opinion, some hefty procurements now in the works could stall. Google doesn't sell direct. Integrators and partners feel the pain. Phones ring. Email flows. With that information winging around, risk ratchets up like the tachometer on an F1 race car leaving the curve and heading down the straightaway.
The dilemma? The government wants to buy Google. The allegations and the legal process slow the uptake of Google products and services. Google now faces a real, not an imaginary, decline in US Federal government sales. I think I will find a front row seat and watch these forces collide, merge, reform, and eventually reach entropy. The question becomes, "Now what?" Any thoughts?
Stephen Arnold, February 22, 2009
Google as Content Tsar
February 21, 2009
The Valley Wag Web log ran an interesting article here. The write up was "The Height of Google Hubris", and I learned that hubris is "a term used in modern English to indicate overweening pride, superciliousness, or arrogance, often resulting in fatal retribution." Wow. I thought it meant trophy generation confident. Anyway, for me, the most interesting comment was this one, attributed to a high ranking Googler named Jonathan Rosenberg:
We need to make it easier for the experts, journalists, and editors that we actually trust to publish their work under an authorship model that is authenticated and extensible, and then to monetize in a meaningful way. We need to make it easier for a user who sees one piece by an expert he likes to search through that expert’s entire body of work. Then our users will be able to benefit from the best of both worlds: thoughtful and spontaneous, long form and short, of the ages and in the moment.
Valley Wag then adds this bit of biographical insight into the Googler who allegedly made the statement I just quoted:
The likes of Rosenberg, whose career before Google was marked by the baroque failures of @Home, a broadband service which ended in bankruptcy in 2001, and eWorld, an Apple-owned Internet service provider which shut down in 1996?
Double wow.
Stephen Arnold, February 21, 2009
Google Plumbing Stat
February 21, 2009
Amit Agarwal, a professional blogger and personal technology columnist for a national newspaper, wrote "Single Google Query Uses 1000 Machines in 0.2 Seconds" here. The data came from Googler Jeff Dean, a former Digital Equipment wizard who joined Googzilla 20 patent documents ago. Key points for me were (a toy sketch of the fan out arithmetic appears after the list):
- One query uses 1,000 machines
- The Google index is in memory
- Latency now 200 milliseconds, down from 1000 milliseconds
- Power consumption… a lot.
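The fan out arithmetic deserves a moment. When a query is split across many in-memory index shards that answer in parallel, the user sees the slowest shard, not the sum of the work, which is how a thousand machines can respond in 0.2 seconds. Here is a toy simulation of that point; the shard count matches the reported figure, but the per-shard latencies are invented numbers, not Google's.

```python
import random

def simulated_query(shards=1000, mean_shard_ms=25.0, seed=2009):
    """Toy model: each shard answers its slice of the query in parallel,
    so the observed latency is the slowest shard, not the total work."""
    rng = random.Random(seed)
    latencies = [rng.expovariate(1.0 / mean_shard_ms) for _ in range(shards)]
    return {
        "machines": shards,
        "total_work_ms": sum(latencies),       # what one machine would face
        "observed_latency_ms": max(latencies), # what the parallel fan out delivers
    }

if __name__ == "__main__":
    result = simulated_query()
    print(f"{result['machines']} machines, "
          f"{result['total_work_ms'] / 1000:.1f} seconds of total work, "
          f"answered in about {result['observed_latency_ms']:.0f} milliseconds")
```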
Hopefully a video of Dr. Dean's talk will turn up on the Google Channel.
Stephen Arnold, February 21, 2009
Nielsen: Time per User
February 21, 2009
I like the tables and data that ZDNet makes available. It delivers the old Predicast File 16 punch without the online connect and type charges of bygone days. The table "Top Web Brands in December 2008" here ruffled my thinning pin feathers. Let me highlight four companies' "time" values and capture the thoughts that flapped through my addled goose mind. Here are the values that puzzled me:
- Yahoo, according to Nielsen, attracted 117 million visitors, and each visitor spent 3 minutes and 12 seconds per visit.
- The barking dog AOL Media Network attracted 86 million visitors, and each visitor spent 3 minutes and 41 seconds per visit.
- YouTube.com (one of the top five sites in terms of traffic according to some stats cats) attracted 81 million visitors, and each visitor spent 54 seconds on the site.
- The site able to attract visitors and make them go away fastest was Amazon, with 61 million visitors, each spending 34 seconds on the site.
Now these data strike me as evoking more questions than they answer. For example:
- Yahoo gets me to stick around because the system is so slow. Email is not usable from some countries. Yahoo’s gratuitous “Do you want to cache your email?” is nuts. If I am in Estonia on Monday and Warsaw on Tuesday, what do you think? These “sticky” values are indicative of some other factors, which the ZDNet presentation does not address. I think Yahoo gets a high score because of the amount of time required to perform basic email operations. I fondly note the inadequate “ying” server because I have to sit and wait for the darn thing to deliver data to me.
- The Amazon number is just odd. I buy books and a few on sale odds and ends. The Amazon system also demonstrates sluggishness. There's the need to turn on "one click". That takes time because I cannot easily spot the verbiage that allows me to turn on one click and have the system remember that as my preference. Then there is the sluggish rendering of items deep in an Amazon results list. I find the search system terrible, and I waste a lot of time looking for current titles that are, in fact, available for the Kindle. The long Amazon pages take time to browse. In short, how can a visitor get in and out of Amazon in an average time of 34 seconds? Something's fishy.
- The AOL numbers are similar to Yahoo. Maybe system latency is the way to improve dwell time.
- The YouTube.com number makes no sense at all. YouTube.com offers short videos and now longer fare. YouTube.com demographics are skewed to the trophy generation. How can a YouTube.com visitor wade through the clutter on the various YouTube.com Web pages, wait for the video to buffer, and then get out of Dodge City in 54 seconds? Something's off track here.
I am confident that Nielsen’s analysts have well crafted answers. I wonder, however, if Phil Nielsen would accept those answers. I know I would not unless I could look at the method of data collection, the math behind the calculation, and the method for cranking out the tidy time values. I sure hope no former Wall Street quants were involved in these data because I would be really suspicious.
My hunch is that the simple reason the numbers strike me as weird is that these data are flawed, maybe in several different ways. In today’s economic climate, numbers are like Jello. I never liked Jello.
Stephen Arnold, February 21, 2009