Microsoft’s Browser Rank

July 26, 2008

I heard about Browser Rank a while ago. My take on the technology is a bit different from that of the experts, wizards, and pundits stressing the upside of the approach. To get the received “wisdom”, you will want to review these analyses of this technology:

  • Microsoft’s own summary of the technology here. The full paper is here. (Note: I have discovered that certain papers are no longer available from Microsoft.com; for example, the DNABlueprint document. Snag this document in a sprightly manner.)
  • Steve Shankland’s write up for CNet here. The diagram is a nice addition to the article.
  • Arnold Zafra’s description for Search Engine Journal here.

By the time you read this, there will be dozens of commentaries.

Here’s my take:

Microsoft has asserted that it has more than 20 billion pages in its index. However, indexing resources are tight, so Microsoft has been working to find ways to know exactly which pages to index and reindex without spidering the bulk of the Web pages each time. The answer is to let user behavior generate a short list of what must get indexed. The idea is to get maximum payoff from minimal indexing effort.

This is pretty standard practice. Most systems have a short list of “must index” links that get indexed frequently. There is a vast middle ground which gets pinged and updated on a cycle; for example, every 30 days. Then there are sites like the Railway Retirement Board, which get indexed on a relaxed schedule, which could be never.

Microsoft’s approach is to take a bunch of factors that can be snagged by monitoring user behavior and use these data to generate the index priority list. Dwell time is presented in the paper as radically new, but it isn’t. In fact, most of the features have been in use or tested by a number of search systems, including the now ancient system used by The Point (Top 5% of the Internet), which Chris Kitze, my son, and I crafted 15 years ago.
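Here is a minimal sketch of the general idea, not the method in Microsoft’s paper. The signals (visit counts and dwell time), the scoring formula, and the example URLs are all assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    visits: int                 # hypothetical: visits seen in browsing logs
    total_dwell_seconds: float  # hypothetical: summed time users stayed on the page


def behavior_score(page: PageStats) -> float:
    """Illustrative score: more visits and longer average dwell rank higher."""
    if page.visits == 0:
        return 0.0
    mean_dwell = page.total_dwell_seconds / page.visits
    return math.log1p(page.visits) * mean_dwell


def index_priority_list(pages: list[PageStats], must_index: int) -> list[str]:
    """Return the URLs of the `must_index` highest-scoring pages, best first."""
    ranked = sorted(pages, key=behavior_score, reverse=True)
    return [p.url for p in ranked[:must_index]]


pages = [
    PageStats("popular.example/news", visits=900, total_dwell_seconds=54000.0),
    PageStats("niche.example/manual", visits=3, total_dwell_seconds=90.0),
    PageStats("steady.example/blog", visits=200, total_dwell_seconds=8000.0),
]
short_list = index_priority_list(pages, must_index=2)
print(short_list)
```

With these made-up numbers the niche page falls off the short list, which is the trade-off any behavior-driven priority list accepts.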

We too needed a way to know only the specific Web sites to index. Trying to index the entire Web was beyond our financial and technical resources. Our approach worked, and I think Microsoft’s approach worked. But keep in mind that “worked” means users looking for popular content will be well served. Users looking for narrower content will be left to fend for themselves.

I applaud Microsoft’s team for bundling these factors to create a browser graph. The problem is that scale is going to make the difference in Web search, Web advertising, and Web content analytics. Big data returns more useful insights about who wants what under what situation. Context, therefore, not shortcuts to work around capacity limitations, is the next big thing.

Watch for the new IDC report authored by Sue Feldman and me on this topic. Keep in mind that this is my opinion. Let me know if you agree or disagree.

Stephen Arnold, July 26, 2008

Google: Chubby and Paxos

July 26, 2008

The duo is not Cisco and Pancho or the Lone Ranger and Silver. Paxos is closer to leather biker gear and a Harley Davidson belt buckle. The outfit gets some panache and the biker’s pants stay properly slung. You may want to read the 16 pages of Googley goodness here. The paper is “Paxos Made Live–An Engineering Perspective.” One of the interesting facts about this paper is that Tushar Chandra has emerged as a spokesperson for Google. You can read my translation of some of his recent comments here.

In this brief essay, I want to identify three of the points discussed in this 2007 paper that are of particular interest to me. But before I highlight these points I want to provide some context. Chubby is a mechanism to keep processes from acting like hungry kindergartners running to the milk and cookies. Chubby keeps order and gets the requests filled quickly without having two six-year-olds getting into a knock down fight over a graham cracker.

First, Chubby is pretty nifty technology, representing a major advance over the file and record locking schemes used for Codd databases. When I mention this point to IBM DB2 or Oracle wizards, I am greeted with hoots of laughter. “Google has nothing we don’t have and we have file and record locking schemes that are much better,” I was told in May 2007 in the IBM booth at a major trade show. No problem. I believe IBM and Oracle. I just hope their customers believe them, when Google reveals the efficiency of Chubby. You can learn more about Chubby in my 2005 The Google Legacy and my 2007 Google Version 2.0, or you can read this Google white paper. File and record locking for reads and writes is one of the hot spots in many database systems. Some companies turn cartwheels to figure out how to perform writes without screwing up read response time. Believe me, some of these outfits do Cirque du Soleil type acrobatics to work around the database read write problems.

Second, Chubby is not new. When a Google technical paper appears, Google is not revealing a work in progress. My analysis of Google engineering papers and patent documents suggests a careful staging of each information release. When a paper appears, the technology is up, running, and locked in. A competitor learning about a Google innovation from a patent document or a Google technical paper is learning about something that is two to five years “old”; that is, the company has been working on a problem and figured out a bunch of possible solutions. The one solution that makes it into the Google production environment is a good one. When the Googlers talk about an innovation, the competitor who decides to respond is late out of the starting gate. Neither of my two Google studies contained “new” information. I was reporting what was ancient history for Googzilla.

Paxos

Now what’s a Paxos?

Paxos is not one thing. It is a collection of protocols that allow a system to adapt to failures. Google has lots of servers, so there are many failures. Chubby sits between the Google File System and Google’s BigTable (a data management system, not a traditional relational database). Wikipedia can deliver some less than stellar information, but the write up for Paxos struck me as reasonably good, and the information will get you anchored in the notion. The diagrams won’t be of much use, but the Google diagrams are almost equally opaque. The reason is that the flow diagrams don’t make much sense unless you have some experience with smart software in a failure prone environment. Based on the style of writing and the type of diagrams in the Paxos write up, my hunch is that a Google-grade brain contributed a thought or two to the Wikipedia entry. The external links reinforce my conclusion that this is a pretty reliable description of the flavors of Paxos. Of course, it’s tough to determine which “flavor” or “flavors” are part of the Google library.

A typical Google performance table. Google compares its processes to themselves, not to commercial alternatives. These data suggest that Google is doing the work of a cluster of high performance machines on a single commodity server. The key number is operations per second, which works out to 38,400 operations per second for 20 workers (clients). What’s remarkable is that throughput is 3.6 times greater for the larger test database. In other words, as the data get bigger, the throughput goes up. © 2007 Google, Inc.

In my vastly simplistic way, Paxos is one tiny cog in Google’s library of smart algorithms. The algorithms crank mindlessly through a procedure writing values. Another process watches these values. When an anomaly becomes evident, the watching process “checks” with other processes and reaches a consensus about what action to take. It sounds really democratic and time consuming. The method is neither. The consensus is not like a human vote. When a group of processes return an acceptance value, the “master” decision is made automatically when a majority of the processes return a proposed value to the master.
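The “majority wins” step can be sketched in a few lines. This is a toy majority check, not Paxos itself: there are no proposal numbers, no phases, and no failure recovery. The function name, the value labels, and the five-replica cluster are assumptions for illustration, not anything from Google’s paper.

```python
from collections import Counter
from typing import Optional


def majority_value(responses: list[Optional[str]], cluster_size: int) -> Optional[str]:
    """Return the value accepted by a strict majority of the cluster, else None.

    `responses` holds one entry per replica that answered; None means the
    replica answered but did not accept any value.
    """
    quorum = cluster_size // 2 + 1
    counts = Counter(r for r in responses if r is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= quorum else None


# Five replicas: one never answers and one answers without accepting.
decided = majority_value(["v42", "v42", None, "v42"], cluster_size=5)

# Five replicas: only three answer, and they disagree, so nothing is decided.
undecided = majority_value(["v42", "v7", "v42"], cluster_size=5)
print(decided, undecided)
```

The quorum is computed from the full cluster size, not from the number of replies, so silent or failed replicas never count toward a decision.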

Keep in mind that this occurs in a massively parallel computing environment. These types of system level processes occur with near zero latency. This type of master-slave set up is a feature of other core Google processes; for example, the Google File System itself. I describe the advantages of Google’s approach in The Google Legacy, and I will not repeat that information here. I think it is sufficient to point out that the approach has some very significant benefits, and most of Google’s competitors are racing to duplicate functionality that Google has had in operation for at least eight years.

Read more

Google’s Data Risotto Attracts Italian Legal Eagles

July 25, 2008

PaidContent.org’s Dianne See Morrison reported on July 25, 2008, that Google has spoiled the data risotto in Italy. The Italians are picky about recipes, and the GOOG is alleged to be “failing to adequately [sic] monitor third-party content posted to their [sic] Web site.” You can read Ms. Morrison’s interesting article here. The issue has been simmering for two years. The content seems to be video. The Italian issue joins similar actions in France and Spain. The Wall Street Journal’s Alessandra Galloni filed a news item about the Italian tussle here. By the time you read my comments, the WSJ’s link may be dead.

I’m no attorney. I don’t even have a hankering to spend time with my own attorney. I can point to my 2005 study The Google Legacy here. In that study, researched and written in late 2003 and 2004, I compiled a list of the vulnerabilities Google faced at that time. In the top three were legal actions.

My research provided me with quotes and publicly accessible documents that revealed the following items:

  • Google delegates functions and asks that those making decisions analyze data and use those data to back up their decisions. This method is different from more political or social procedures used by some other organizations. For example, at Lycos in 1994, face to face discussions took place and many decisions were a collaborative effort, not a data driven effort.
  • Google’s founders are logical, maybe to a fault. If a statement says X, then it “means” X. Google, therefore, looks at rules and guidelines and reasons that these documents mean what they say. Anyone with any experience in the halls of Congress or Parliament knows that what a word “means” is more slippery than a 10 gram blob of mercury. However, once logic locks in, the logic dictates the argument. Google executives appear to me to believe that the company is complying and making an effort to comply with rules, laws, and guidelines.
  • Google’s engineers have come up with a number of patent documents addressing content related issues. The company is focusing resources on the problem of content that finds its way on to the Google system that may be problematic.

I processed these items surfaced by my research and drew the conclusions I set forth in The Google Legacy. First, Google is a disruptive force of significant proportions. The culture of Google exists in the eye of a hurricane. Inside Google, it’s calm. Outside of Google tempests rage. Lawyers thrive because those with alleged grievances don’t know how to “get their way” with Google. The logic of the mathematician does not mesh smoothly with the logic of the lawyer.

I am also reasonably confident that Google believes that it is behaving within the letter and spirit of the law as Google understands those promulgations. I know this may sound crazy because legal actions are coming fast and furious.

I know that Google values mathematics and clever solutions. Google is chock full of really smart people who can look at this mathematical expression and resolve it without hesitation:

(a, b) x (c, d) = (a x c – b x d, a x d + b x c)
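For readers who do not resolve it without hesitation: the expression is the rule for multiplying two complex numbers written as ordered pairs, with a and c as the real parts and b and d as the imaginary parts. A quick check, using a helper name and sample values I made up, compares the pair rule with Python’s built-in complex type:

```python
def pair_multiply(p, q):
    """Multiply two ordered pairs using (a, b) x (c, d) = (ac - bd, ad + bc)."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)


product = pair_multiply((1, 2), (3, 4))

# The same multiplication using Python's complex numbers: (1 + 2i)(3 + 4i).
as_complex = complex(1, 2) * complex(3, 4)
print(product, as_complex)
```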

Individuals who can bring mathematical reasoning and deep technical knowledge to bear on a “problem” often arrive at solutions that are inscrutable to the average bear. Google makes engineering and business decisions with this type of insight or cleverness. It is not surprising to me that people see Google as an arrogant bunch of gear heads, indifferent to the needs of other businesses. The notion of “getting it” works within Google; it does not work too well in other organizations.

The result?

Lawsuits. Lots and lots of litigation. An infinity of legal eagles, no matter how light weight, can settle on Googzilla and slow it down, knock it over, and maybe pluck out its innards.

I want to see how Italy’s legal eagles react to the Google risotto.

Stephen Arnold, July 25, 2008

Baidu Is No Dodo: Growth Takes Wing

July 25, 2008

Pete Barlas wrote “China Search Leader Baidu Still Reporting Strong Growth”. You can read the full text here. Baidu, not Google, holds sway in the booming Chinese market. In the most recent quarter, Baidu’s revenue doubled to $117 million. The company projects revenue of more than $130 million in the next quarter. The company generates revenue from paid search ads. The company reports that it has more than 180,000 advertisers, up from 161,000 12 months ago. Can Google close the gap? I will be watching.

Stephen Arnold, July 25, 2008

Taxonomies: 24 Karat or Fool’s Gold

July 25, 2008

I have been bedeviled by taxonomies in the last two weeks. Vendors want to demo their systems. Clients want to find out how to make their taxonomies improve search. Even an entrepreneur showed up, gave me money, and outlined his taxonomy scheme for world domination.

Fancy tools are not needed to create a useful taxonomy.

Yikes!

The purpose of this feature is to provide some basic places to seek taxonomy lists, services, and functions. The list is not complete, and I will add to it over time.

  • Dow Jones Factiva. You can get librarians to give you a hand and license software too. Click here for traditional media’s taxonomy resource.
  • Interse. A Microsoft SharePoint-centric system. Click here.
  • SV Technologies, now part of Sydney Plus. Legal taxonomy. Click here.
  • Taxonomy Warehouse. This is the place to start. Click here to start your quest.
  • WAND. Software, services, and term lists based on business units. Click here.
  • Wordmap. The grandpappy of many whippersnappers’ word lists. Click here.

In my April 2008 Beyond Search study, I provide in depth analyses of the Access Innovations and SchemaLogic taxonomy management systems. You can get information about the for-fee profiles here.

This is not a complete list. If you wish to add companies, please use the comment form for this Web log.

Stephen Arnold, July 25, 2008

Google Israel: Beavering Away

July 24, 2008

Google Israel remains my pick for the smartest Google operation outside of Mountain View. My opinion may rile the whiz kids working near Seattle and annoy the heck out of the wizards in Beijing, but I’m entitled to my view.

Noa Parag’s “Google as a Start Up?” is an important essay, and I suggest you click this link to Globes Online and read his English language article here. It’s one part interview with Googler Meir Brand who is pretty good at math and one part business analysis. Don’t delay. Globes Online doesn’t claim to be an online archive, but it does claim bragging rights to its coverage of Israel’s business affairs.

So, what did I learn?

  1. Google Israel is operated like a start up. The company has 100 employees in two R&D centers, one in Haifa and one in Tel Aviv.
  2. Google Israel developed Google Trends and the overlay technology that puts text content on video clips available on YouTube.com.
  3. Mobile advertising is a significant opportunity.

Noa Parag’s write up underscores two points about Google. First, the company delegates and relies on email, Google’s internal online system, to keep Google Israel “down the hall”. Second, Google, despite its size, is allowing Google Israel to run with the start up ball.

Stephen Arnold, July 24, 2008

A David Outperforming Two Goliaths: Factiva, Lexis, Silobreaker

July 24, 2008

A thoughtful reader sent me a screen shot of a Compete.com report. This is the metrics company that says, “Track your rivals. Then eat their lunch.” As you may know, I don’t get too excited by third party analytics. The data have to show me a big jump, or most of the market share information is a statistical fuzz ball. When I saw this chart, I took notice.

The time period is a 12 month span, ending on June 30, 2008. The first company on the chart is Dow Jones’s “other” online service, Dow Jones Factiva. You can read more about this outfit here. This online service is so adept that its Google ad today (July 24, 2008) returns a 404 error or “File Not Found”. I clicked on the ad eight or nine times to see if it was traditional media latency or just carelessness. Answer: carelessness.

The second company’s data charted by Compete is Lexis Nexis, one of the two monopolies in legal information. I love the Lexis tag line: “Lead with Confidence. Work with Confidence. Grow with Confidence.” Unfortunately this Compete.com chart shows Lexis following, not something to inspire confidence or trigger growth. Lexis Nexis sells online information to lawyers, but not surprisingly, lawyers have been finding out that their clients expect the legal eagles to use publicly accessible services, not the high priced services. Accordingly, Lexis Nexis has been working overtime to make Lexis spin more money. Nexis has been paddling upstream for years, and the brand has less visibility than the hair product (Nexxus) in my opinion. Lexis tried to get the hair product company to change its name. Didn’t work. Tough to confuse a sagging online service with shampoo and conditioners in my opinion.

Now, the third company is co-founded by the former McKinsey manager and intelligence officer, Mats Bjore, and the CEO Kristofer Mansson. His company, Silobreaker, is the one with the soaring line of the chart. When a third party generates an upward curve that rises steeply, I take notice. The absolute numbers are less important than the fact that the third party’s sampling process notes a significant change. You can read my interview with Mr. Bjore here.

What’s this chart tell me?

First, Silobreaker is gaining attention at the expense of Factiva and LexisNexis. You can see that in the up and down red and green lines.

Second, Silobreaker’s upward ascent tells me that the company is getting new customers, not just sucking oxygen from the bigger guys’ base.

Third, whatever goosed Silobreaker to rapid growth took place early in 2008, and the momentum appears to be holding up. There will be a tail off in the summer when information junkies head for the beach or a trout stream.

But the useful piece of data is that the combined “people” score for Silobreaker.com is only slightly less than the combined “people” score of Factiva.com and Nexis.com.

Silobreaker may be a David. The two Goliaths, owned by traditional media companies with a track record of throwing money and people at a “problem”, are not out of the game. But if I were the product manager for either of these two companies, I would be considering one of these actions:

[a] Killing Silobreaker.com with a price war or carpet bomb marketing campaign

[b] Polishing my résumé because I am getting humiliated by a company in Sweden, which has a GDP smaller than my employer’s annual revenue

[c] Buying Silobreaker.com and taking credit for the company’s rapid growth, nifty technology, and developers

[d] Deleting my Silobreaker.com bookmark and pretending that the company does not exist.

Since I worked for the world’s smartest publisher, William Ziff, I would go for [c]. Why pretend that a giant traditional publishing company can make a product people want, that’s sexy, and has lift? Buy it, issue a news release, and collect that bonus.

Will Factiva and Lexis wake up? I will keep you posted.

Stephen Arnold, July 24, 2008

Microsoft: What Now for Search?

July 24, 2008

Googzilla twitches its tail and Microsoft goes into convulsions. When I was in the management consulting game, my boss, Dr. William Sommers, talked about “hyper-actions”. The idea was that a single event or a minor event would trigger excessive reactions.

Brain scan of a person undergoing excessive “excitement” and “over reaction”.

When I read the flows-like-water prose of Kara Swisher’s “Microsoft’s Latest Web Stumble: Kevin Johnson Out” and then her brief introduction to Mr. Steve Ballmer’s “Full Memo to the Troops about New Reorg”, I thought about Dr. Sommers’s “hyper-action” neologism. In my opinion, we are watching the twitch in Mountain View triggering via management string theory the convulsions in Redmond.

First, let me identify for you the points that jumped from screen to neurons in Ms. Swisher’s write ups.

  1. Ms. Swisher reports that Mr. Kevin Johnson was the architect behind the Yahoo buy out. I thought that the idea was cooked in Mr. Chris Liddell’s lamb-roasting pit. Obviously my sources were off base. Mr. Johnson moves to Juniper and Mr. Liddell continues to get a Microsoft paycheck. Mr. Liddell’s remarks at the March 2008 Morgan Stanley Technology Conference left me with the impression that he was being “systematic” in his analysis. Here’s one take on his remarks.
  2. Ms. Swisher’s run down of Microsoft’s actions so far in 2008 is excellent, and she reminded me that Microsoft bought aQuantive, a fact which had slipped off my radar. What has happened to aQuantive, for which Microsoft paid $6 billion, more than what Microsoft paid for Fast Search & Transfer and Powerset combined? Her mentioning aQuantive reminded me of those wealthy car collectors on the Speed Channel’s exotic automobile auctions. What do you do with a $1.2 million Corvette? You put it in a garage. You don’t run down to the Speedway in Harrods Creek, Kentucky, to buy a pack of chewing tobacco.
  3. Ms. Swisher turns a great phrase; specifically, “Microsoft has succeeded in burnishing its image as a Web also-ran and still has an uncertain path to change that.” I quite like the notion that a large company takes one action and succeeds in producing an opposite reaction. I think the Google folks would peg that as one of the Laws of Google Dynamics applied to Microsoft. For every action, there is a greater, opposite reaction that persists through time. (Ms. Swisher’s statement that Yahoo looks stable brought a smile to my face as well.)

Next, let me comment on the Mr. Steve Ballmer reorg memo, which will be a classic in business schools for years to come. The opening line will probably read, “Mr. Steve Ballmer, firmly in control of Microsoft, sat at his desk and looked across the Microsoft campus. He knew a bold strategic action was needed to deal with the increasing threat of Google, etc. etc.”

After the razzle dazzle about goals, the memo gets down to business:

We will out-innovate Google in key areas—we’re already seeing this in our maps and news search. Third, we are going to reinvent the search category through user experience and business model innovation. We’ll introduce new approaches that move beyond a white page with 10 blue links to provide customers with a customized view of their world. This is a long-term battle for our company—and it’s one we’ll continue to fight with persistence and tenacity.

Read more

MicroStrategy: TSA Swims through Data with PIMS

July 24, 2008

Government information technology makes me perspire. When a government news item renders in my news reader, I don’t pause. I want to make an exception. MicroStrategy is a very intriguing company. The fact that the firm has ramped its services to a law enforcement agency is interesting. MicroStrategy has been working with TSA since 2004. The deal signed in 2006 has saved TSA more than $100 million. The sentence that caught my attention was:

The TSA is a metrics-based organization… We [the TSA] use metrics every day to drive our decision making and quantify security effectiveness, operational efficiency and workforce management.

An example of this metrics focus is that since 2004, the TSA has used PIMS to run one million reports per year. TSA has about 12,000 users of the system. Each user prints about two reports a week. TSA is right in line with the Office of Management & Budget’s guidelines for managers to make decisions based on hard data, not hunches.
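A back-of-the-envelope check shows the per-user and per-year figures hang together. This assumes a 52 week year and treats the “about” numbers above as exact, so it is a rough sketch and not TSA’s own accounting.

```python
users = 12_000                  # "about 12,000 users" from the write up
reports_per_user_per_week = 2   # "about two reports a week"
weeks_per_year = 52             # assumption: every week of the year counts

reports_per_year = users * reports_per_user_per_week * weeks_per_year
print(f"{reports_per_year:,} reports per year")
```

The result lands a bit above one million, the same ballpark as the reports-per-year figure.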

MicroStrategy, as you may know, popped in and out of the news in the 2000-2002 period. One of the items I recalled reading is here. Some former MicroStrategy professionals founded Clarabridge, a company focused on the overlap between business intelligence and content processing. You can find information about that company here.

I want to pay closer attention to MicroStrategy. Companies that can help Federal agencies save $100 million are the taxpayers’ best friends. I am interested in the MicroStrategy – Clarabridge alignment as well. Off to the library in the morning to find what I can find.

Stephen Arnold, July 24, 2008

Googzilla Swallows Telegraph Media Group

July 24, 2008

Traditional media has been my favorite whipping boy for a long time. The Telegraph Media Group may force me to rethink my critical view of companies who write stuff, print on dead trees, and employ folks at near starvation wages to get the messy artifacts to a declining readership. Silicon.com reported here that the publisher of The Daily Telegraph, Sunday Telegraph and Weekly Telegraph, as well as the telegraph.co.uk Web site, will standardize on Google Apps–word processing, mail, collaboration. The whole shooting match.

My reading of the announcement suggests that TMG did the math and calculated that it could save a bundle. More significantly, TMG lets Google worry about software, presumably so the newspapers can worry about selling adverts. The most interesting statement in the Silicon.com write up is this remark attributed to one of TMG’s managers:

We see the levels of innovation happening in the consumer space…you can actually take advantage of within the enterprise space.

Microsoft, among other traditional software companies, is going to learn first hand how fissionable material goes critical. A few things happen, then a few more things, and then the game changes. Is Google Apps ready to go critical?

My view: yes.

Stephen Arnold, July 24, 2008
