Google Does Yahoo-Style Math
July 23, 2008
Not long ago, a Yahoo guru opined that building a Web search system cost about $300 million. I made a feeble attempt to point out that if that were indeed true, Yahoo would have accomplished the task and not collected search engines the way my mother adds to her collection of knickknacks. Similarly, Microsoft would not have bought Powerset, Fast Search & Transfer, and football-field-sized data centers. I think the Yahoo math in that essay, which you can read here, was 1 + 1 = 3 bazillion. A bazillion is a technical term favored by the mathematically challenged. You can read about Yahoo math here.
Now, Fortune, sponsor of the Brainstorm Tech conference that featured the Beavis and Butt-head analysis I noted yesterday, offers another number. This time the number is $100 million and the person doing the calculation is Google’s employee #12, Marissa Mayer, high wizardette of the Googleplex.
You can read Fortune’s own take on this calculation here. I cannot do justice to the Fortune writer’s discourse. I can remark that Google News was a project that showed off some flashy Google technology circa 2001. Google News’s developer, based on what I learned in my research, was Krishna Bharat. In 2006, Google News became official. In the last seven years, Google News has expanded, making it easy for me to see what’s shakin’ in France or a score of other countries.
Google’s technology spiders a subset of important news sources. The system “discovers” the important stories. The front page or splash page is automatically generated with follow-on stories appearing in a cluster of related links. There have been some hiccups. These have included major media outlets reminding Google that a newspaper accepting a feed from the mothership should not appear at the top of the news stack for a story. Google fixed this with some “human intervention”, which is now a key distinction of Google’s intelligent software; that is, a human makes sure the numerical recipe doesn’t add too much sugar and not enough salt to the output. The service also provided me with a good example to tell traditional publishers that unless some rethinking of their news operations took place, digital news would erode the traditional news base. I started yapping about this in 2001, but then and now, traditional publishers prefer to talk with my partners, not me. I guess I’m too blunt for the white shoes and panama hat set on sultry summer days. Gee, the truth is the truth and the earnings of Time Warner (Fortune’s owner) support my 2001 insights, I suppose.
Now to the magic number.
Google News is a demo; it is not a revenue producer. I provide some information about the technology Google uses for this service in The Google Legacy and Google Version 2.0. If you want to know more, click here. The technology in Google News is darned impressive and generally unappreciated by users and competitors alike. By the way, do you wonder what Mr. Bharat has been doing since 2003, the most recent technical paper on his Google official biography page? (He’s been busy, but he is remiss in updating his research activities. For a peek, check out US20080097833.)
How do I know this? Do you see any ads on this page? As my Wall Street pals tell me, “Google’s revenue comes from ads.” So, no ads means no revenue on public facing Google pages. There’s not even a link to Google Enterprise that I could find.
“Why,” I ask, “are there zero ads on Google News?”
If my research is correct (which it may not be), the reason may tie back to those traditional publishers. Not even Google can figure out how to divvy up a $0.83 click among the contentious, cantankerous publishers whose headlines are presented on the Google News page. A misstep can trigger another $1 billion lawsuit. Not even Google wants more of these old media new media face offs. I read here that Google even has a deal with the Associated Press, a forward looking outfit if there ever was one. Too many lawyers undermine one’s ability to do math, I have heard.
After seven or eight years, Google News presents an ad free face to me. Your mileage may differ, of course. And, according to the Fortune write up, Google wants to handle consumer health records in the same way; that is, traffic generation; to wit:
Mayer said that’s the way Google thinks about monetizing digital consumer health records. The company is one of many working to make it convenient for people to store and access their medical records online, a move that proponents say will improve health care by empowering consumers. But Mayer said that after some internal discussions, Google brass decided not to put ads on health record pages.
I think the strategy is for Google News and the other Google services to pull traffic to Google’s information amusement park. Overall traffic is a net benefit as long as Google can manage the costs associated with scaling to handle the hordes of visitors, buyers, and gawkers.
What strikes me as weird is why Google feels compelled to use Yahoo math–that is, making up a number–to justify its 10-year-old business strategy. The top line revenue and net profit reveal the underlying math and the wisdom of Google’s approach. What’s $100 million to Google? I can’t even get my Bowmar calculator to calculate the percentage because the zeros overflow the display.
Yahoo math from Google I don’t need. Agree? Disagree?
Stephen Arnold, July 23, 2008
Privacy Flash Point
July 23, 2008
When I speak with professional groups, I dance around the issue of “smart software”. The idea is that scripts do more than handle situations as a zero or one, white or black, on or off. The computers are binary, but programmers have numerous methods for helping a script deal with ambiguity.
One of the ways is to know what a single user or a group of users who share characteristics actually do. Looking at what a person does nine times out of ten times makes it easy to tell a script, “When this person takes this action, you take that action.”
The key to making this type of “smart software” work is data. The more data one has about an individual or a group of like-acting individuals, then the easier it is to cook up simple rules. The script runs the actions. When a decision is needed, the script looks at the usage data and makes a decision.
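The “nine times out of ten” idea can be sketched in a few lines of Python. This is only an illustration of the general approach, not any vendor’s implementation; the class name `UsageRules`, the 0.9 threshold, the minimum of ten observations, and the sample user and action names are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class UsageRules:
    """Derive a simple rule from observed behavior (hypothetical sketch)."""

    def __init__(self, threshold=0.9, min_observations=10):
        self.threshold = threshold                # e.g. nine times out of ten
        self.min_observations = min_observations  # no rule without enough data
        self.history = defaultdict(Counter)       # (user, trigger) -> follow-up counts

    def record(self, user, trigger, response):
        """Log that `user` followed `trigger` with `response`."""
        self.history[(user, trigger)][response] += 1

    def decide(self, user, trigger):
        """Return the action to take automatically, or None if the data is too thin or too mixed."""
        counts = self.history.get((user, trigger))
        if not counts:
            return None
        total = sum(counts.values())
        if total < self.min_observations:
            return None
        response, n = counts.most_common(1)[0]
        return response if n / total >= self.threshold else None


rules = UsageRules()
for _ in range(9):
    rules.record("alice", "open_report", "export_pdf")
rules.record("alice", "open_report", "print")
print(rules.decide("alice", "open_report"))
```

The more usage data the script has logged, the more (user, action) pairs clear the threshold, which is the point of the paragraph above: the rule is simple, and the data does the work.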
Endeca can integrate saved queries into a work flow. When the sales person reaches a particular point in a selling script, the Endeca system runs the query and displays the information based on a combination of rules and looking at some data about what sells, what product returns the largest commission, or some other factor.
Again, the key is rules and data.
The rules are tedious to set up and test. But once in place, the real nourishment for smart software is data. But most users are themselves unaware of what actions they take when using a computer. If I remind a user that email can be analyzed for syntactical fingerprints, friends, and insight into the preferences of the user, people are shocked. This amazes me.
Closed doors–that is, privacy–are tough to live behind in an online world.
I was thinking about this issue and privacy because the current issue of KMWorld, a tabloid published by Information Today, arrived via snail mail this afternoon. My monthly column was no more. In the July August 2008 issue, my column had become a feature story, “Cloud Computing and the Issue of Privacy”, pages 14, 15, and 22. The highlight of the story is a graphic from one of Google’s patent documents showing an exemplary data model for usage information about an individual or a group of users. The idea is that when a person can be assigned to a cluster based on some discovered similarity, probability methods make it trivial to “predict” what most members of the group will next do or prefer. This is not magic, but it is complicated and requires a honking big computer to work when there are lots of people and many groups.
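The cluster-then-predict idea can be shown with a toy example. To be clear, this is not the data model or method in the Google patent document; it is a minimal sketch that assumes cluster membership has already been worked out and that estimates the most likely next action from simple frequency counts. The function name, user IDs, and action labels are made up for illustration.

```python
from collections import Counter


def predict_for_cluster(cluster_members, action_log):
    """Return (most common action, estimated probability) for a cluster.

    cluster_members: set of user IDs already grouped by some similarity.
    action_log: iterable of (user_id, action) pairs from usage data.
    """
    counts = Counter(action for user, action in action_log if user in cluster_members)
    total = sum(counts.values())
    if total == 0:
        return None  # no usage data for this cluster, so no prediction
    action, n = counts.most_common(1)[0]
    return action, n / total


log = [("u1", "news"), ("u2", "news"), ("u3", "sports"), ("u9", "finance")]
print(predict_for_cluster({"u1", "u2", "u3"}, log))
```

With three people the arithmetic is trivial. The honking big computer comes in when the log holds billions of rows and the clusters themselves have to be discovered from the data.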
To prepare for the one or two emails I get when my for-fee articles appear, I thought it might be a good idea to see what’s online. I know a little about Google but I don’t know much beyond my little area of expertise that I hone against the whetstone of Kentucky culture.
Update to Vista Search
July 23, 2008
TechWhack reported that Windows Search 4.0 will be released through the Windows auto update mechanism. You can read the original article here. To be upfront, I had lost track of the “internal search” version number in XP and Vista. I have an even tougher time figuring out search in Outlook Express, Outlook, SharePoint, Dynamics CRM and its X++ language, SQL Server, and Microsoft’s search acquisitions. My inability to keep Microsoft search systems clear in my mind is reflective of my age and my addled goose status here in rural Kentucky.
TechWhack reported: “This new version gives more control to enterprise IT administrators over users’ access to and use of search.”
What will thrill my chief technical officer is that “this new edition would require complete re-indexing of all the data on the computer.” The good news is that the reindexing will suck up fewer resources.
Stephen Arnold, July 23, 2008
Semantra Snags $3 Million in Additional Funding
July 23, 2008
The economy is uneven. Semantra, however, has obtained $3 million in funding from CPMG, a unit of Cardinal Investment Company. The “C” means Cardinal and the “PMG” means Public Market Group. No matter, the CPMG investment in Semantra totals about $9 million.
What’s a Semantra?
The company is a leader in “conversational analytics.” This buzzword means that a user of a Microsoft Dynamics CRM system can ask a question in plain English. Semantra converts the question to a syntax Microsoft Dynamics understands. Semantra then displays the answer. The company says that it:
…is a pioneer in Natural Language and Semantics that is applied in a search and information access context that enables enterprises to quickly and easily retrieve precise, critical information from complex corporate databases through inquiries in the language of a user’s business. With an understanding of linguistics, conceptual modeling and relational theory, Semantra built its software to empower business users with real time, common language commands and requests unavailable through traditional BI or enterprise search solutions. Semantra significantly improves the value of any enterprise business application. Semantra’s headquarters are located in Dallas, Texas.
A typical interface looks like this:
The company received an infusion of cash from CPMG about one year ago. The company plans a product roll out later in 2008. You can learn more about the company here.
Despite the challenges some text and content processing companies face with their sources of funding, Semantra appears to have few problems.
Stephen Arnold, July 23, 2008
CNet Uses the M Word
July 23, 2008
Charles Cooper’s “So When Do We Get over with It and Declare Google a Monopoly?” is a benchmark for Googzilla. You can read the full text of the essay here. Put aside the summer of transparency with Googlers explaining how to innovate, run cloud services, and motivate math wizards.
Writing in CNet News.com, Mr. Cooper quotes noted legal eagle Richard Schmalensee, laborer in the knowledge vineyard at MIT, probing Mr. Schmalensee about when a company becomes a monopoly:
There’s no magic threshold but with high share levels, you get to be concerned… On the other hand, monopolists are allowed to compete. The question is whether the arrangement would stifle competition.
My thought is that Google is far ahead of Microsoft and Yahoo in plumbing. Search and advertising pay the bills but these are applications. Google is poised to move into other sectors.
Google is not a monopoly; Google is a 21st century East India Company. If you recall your history, the EIC operated as a nation state but, as constructs do, the EIC faded.
Stephen Arnold, July 23, 2008
Intellectual Riches from the Fortune Brainstorming Tech Conference
July 22, 2008
Harrods Creek is a long way from Half Moon Bay, California. Thanks to the modern technology available here in the hills among the possums and the rabbits, I have been able to follow some of the action at the Fortune Brainstorming Tech Conference. Fortune, as you know, is the Batman of business magazines, and it uses its glittering reputation to corral big thinkers to brainstorm.
One of the most interesting articles about this conference is Stefanie Olsen’s “Viacom CEO: Great Content Is King”. I hope this discussion among Viacom, Verizon Communications, and Google finds its way to YouTube. Please, read Ms. Olsen’s full text report here.
What stopped me in my tracks was this quote from the brainstorming cyclone from Time Warner, owner of Fortune Brainstorming Tech Conference:
We [Viacom] have vast libraries of content, and we are able to find new audiences thanks to emerging distribution. People in Asia are discovering Beavis and Butt-head and it hasn’t been in the United States for seven years…For us, it’s about finding more and more places to put it.
Google’s Vint Cerf asked a Googley question about whether “content and distribution of content will be separate going forward?” This is a darn good question. The Viacom and Verizon executives’ answers, in my opinion, were muddled. I don’t think either the Viacom or the Verizon executive knew what Mr. Cerf meant. His question was a lot clearer to me than the answers given to Mr. Cerf.
As I understand Viacom’s answer, creating and distributing are different but the two are joined by an economic interest. And Verizon emphasized that Verizon is all about the network.
Okay. I must admit I don’t know what these two executives are trying to tell me. Viacom’s intellectual riches include Beavis and Butt-head, who have earning power in China. The Verizon person talks about network, and my last dealings with the company involved a charge for data services in Canada where the data service was explicitly not supposed to work. No one cared about that $300 charge. I just paid the bill and concluded that Verizon is about charging me for services that are not supposed to work outside the US. I paid a price for my curiosity.
I am still laughing about the reference to Beavis and Butt-head in the context of the Fortune Brainstorming Tech Conference. If this is tech from Fortune, I am glad I live in rural Kentucky, and I am delighted that I dropped my Fortune subscription. I wonder if there are any reruns of Beavis and Butt-head on my Apple TV?
Stephen Arnold, July 22, 2008
Autonomy Nails Another Laurel to Its Crown
July 22, 2008
Autonomy follows its analyst-crushing financial results with the “highest Socha-Gelbmann rankings”. The story appeared in the highly regarded MarketWatch online news service via the PRNewswire via FirstCall via Comtex. I am thrilled that the news reached me quickly. You can read the full story here. If you have been living in a hollow in rural Kentucky, you may ask, “What’s a Socha-Gelbmann Ranking?” Well, let me fill you in.
Socha-Gelbmann
Socha Consulting LLC, operated by George J. Socha, Jr., Esquire, does surveys and delivers services in eDiscovery and automated litigation support activities. Socha Consulting focuses on the eDiscovery market. The acronym means “electronic discovery”, a buzzword much loved by attorneys and consultants involved in figuring out what’s in the terabytes of electronic information delivered by the legal discovery process.
Mr. Socha Jr., Esquire is the principal in Socha Consulting, LLC, a firm which provides expert advice to consumers with respect to effective electronic discovery strategies, and to providers with respect to the development of e-discovery services, software and strategy. Prior to forming Socha Consulting, Mr. Socha worked in private practice where he helped establish litigation support departments at 250-attorney and 50-attorney firms. Mr. Socha is a graduate of the University of Wisconsin (B.A.) and Cornell Law School. You can read this bio here. Mr. Socha’s offices are also in Minnesota, in St. Paul, a lovely city.
Tom Gelbmann is the other half of the research report’s team. Information about him is located at Gelbmann.biz here. Mr. Gelbmann runs a consulting practice focused on helping law firms and corporate law departments maximize value from investments in technology. He has held a CIO position at two major law firms, and he has also conducted several market research projects on behalf of information and technology service providers to the legal sector. Prior to his work with the legal technology community, Tom served as a Director of Computer Security Consulting for a global consulting organization. You can read his full bio here. His office is in Minnesota. Details are here.
eDiscovery
In a nutshell, eDiscovery indexes documents. The very best systems provide useful tools to the lucky souls who are billable during this tedious process of ferreting for evidence, facts, and supporting material. For example, some eDiscovery systems include billing functions to make it painless for the hard-charging attorney to tally the minutes, hours, days, weeks, and months required to “read” lots of email, memos, reports, and files with text in them. Other systems take an item–say, for example, the name of a person–and generate a list of related documents or people. Other systems chew through terabytes of text and generate a visual display of who is related to whom or what is related to what. I have seen systems using cartoon figures and lines to connect individuals, events, cash transfers, and other life actions. Most of these systems allow the legal eagle to enter a word or phrase, see a results list, browse a list of related topics, and perform other activities which can then be saved in a “case audit” file. The idea is that another lawyer can come along and recreate the exact finding process, identify the specific document with the needed “fact”, and print out the audit trail for a cowering opponent whose argument has been trashed with the brilliance of the legal argument, silver bullet fact, and solid research.
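For readers who want to see the plumbing, here is a minimal sketch of the two ideas in that description: an index that maps words to documents, and a “case audit” log that records each query with its results so another lawyer can retrace the steps. It is a toy under stated assumptions, not how any commercial eDiscovery product works; the class name `CaseIndex`, the document IDs, and the sample text are invented for the example.

```python
import re
from collections import defaultdict


class CaseIndex:
    """Toy inverted index with a query audit trail (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.audit_trail = []             # list of (query, hits), in order

    def add(self, doc_id, text):
        """Index every word in the document."""
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return documents containing every query term, and log the search."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if terms:
            sets = [self.postings.get(t, set()) for t in terms]
            hits = sorted(set.intersection(*sets))
        else:
            hits = []
        self.audit_trail.append((query, hits))
        return hits


index = CaseIndex()
index.add("memo-1", "Transfer the cash to Smith")
index.add("email-7", "Smith approved the report")
print(index.search("smith"))
print(index.search("cash smith"))
print(index.audit_trail)
```

Real systems add relevance ranking, entity extraction, relationship graphs, and billing hooks on top, but the saved query log is what makes the finding process repeatable.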
So, the most recent study by George J. Socha, Jr., Esquire is described here. The current report looks over the previous five years of Socha-Gelbmann results and the output is the 2008 Socha-Gelbmann 6th Annual Electronic Discovery Survey.
Findings
The new report is available now. The big news is that Autonomy has been, according to the aforementioned news story:
named a Top 5 Electronic Discovery Provider in the 2008 Socha-Gelbmann Electronic Discovery Survey Report for its ZANTAZ’ e-Discovery software and service. Autonomy was named a Top 5 Provider in nine software and service categories, including preservation, collection, analysis, production, presentation, and law firm rankings. This marks the fourth consecutive year that the company has been ranked as a Top 5 service provider in the report.
You can get a small nibble of the approach in this series of questions about the 2007 study here.
Autonomy provides, according to the news story:
end-to-end eDiscovery for the largest and most complex legal and regulatory matters, supported by 6,000 servers across five data centers. This comprehensive technology and services solution provides data preparation, analytics for Early Case Assessment (ECA), legal hold, full EDD processing, advanced review and production, all on a powerful platform. Through automatic processing of all electronically stored information (ESI), whether email, audio or video, Autonomy enforces legal hold policies and enables eDiscovery across the organization based on the meaning and relevance of information to litigation.
Kudos to Autonomy for this excellent showing. And, to George J. Socha, Jr., Esquire and Tom Gelbmann, “Keep up the good work.” A happy quack to the Autonomy team as well. With video, fraud detection, and eDiscovery, I may have to recategorize Autonomy from enterprise search vendor to enterprise information application solution provider. If I do this, the search sector will lose a luminary. Plus ça change, plus c’est la même chose!
Stephen Arnold, July 22, 2008
Krugle Enterprise 2.3 Appliance Arrives
July 22, 2008
ThomasNet, the Industrial NewsRoom [sic], reported on July 21, 2008, that Krugle has rolled out a search appliance. You can read the full news story here. Krugle is a search and metadata processing vendor specializing in code. The Enterprise 2.3 system can now handle over 10 billion lines of source code per appliance. You can learn more here. Krugle is a vertical search engine, if you are looking for an example of this type of niche strategy.
Appliances are now available from a number of vendors to make deployment of search less painful. Krugle joins Exegy, Clearwell Systems, Google, Index Engines, and Thunderstone, among others, in the appliance sector.
Why?
The costly and time consuming track record some search vendors have compiled is a great marketing program for search appliances. A search disaster causes the phone to ring at an appliance vendor’s office. Plug it in and go is often preferable to paying for a phalanx of vendor engineers drinking coffee in the cafeteria to fill out their time sheets and talk about multi-core processors.
Stephen Arnold, July 22, 2008
Google: More Insight into Its Cloud Play
July 22, 2008
The summer of transparency continues. The lucky beneficiary of the Google summer of openness is Doug Henschen, the capable journalist at Intelligent Enterprise. Mr. Henschen does a good job of capturing the thoughts of Rishi Chandra. If you are a Google watcher, you will want to read the full text of this interview here.
Several points stuck in my addled goose brain after reading this interview. May I share these with you? Then I will offer a handful of observations from the hills of rural Kentucky. Now the main points for me:
- The Salesforce.com tie up with Google “is at the application layer.” The idea is that Salesforce.com can tap Google and–the interesting bit–Google can tap into Salesforce.com. Google also pats Salesforce.com on the head. Mr. Chandra says, “…They helped us improve those APIs so they [Salesforce.com] can do a lot more than third parties could do previously.”
- App Engine is designed or engineered to “build onto Google infrastructure and gain the same capabilities and scalability.”
- Mr. Chandra notes, “We built our infrastructure so it’s particularly tuned for Web applications.”
- Mr. Chandra reveals, “Our approach will follow the model … to go from consumers to the enterprise.”
I find these points significant because Google transparency talks “run a game plan.” Google is not the slap dash group of beer drinking math club members that some in the media see when scrutinizing Google. My research suggests that Google is calculating. In fact, the subtitle of my 2007 study is “the Calculating Predator.”
Let me offer some observations about these four points. I will reference some of the research data in my Google Version 2.0, so if you want the nitty gritty, you will need to track down that 250 page, dull as dishwater report here.
Observations from the Addled Goose
First, Salesforce.com wants to do a lot more than date Google, according to my research. Google has technology that can cure the scaling and performance issues that make Salesforce.com brittle. As with eBay, making wholesale changes to the core architecture is expensive, complicated, and risky if the fix doesn’t work. Google, for its part, is cheerleading Salesforce.com and providing some coaching. What’s Google get from the deal? Well, it’s not just exposure to the ways of enterprise cloud computing. Google is able to learn from the Salesforce.com efforts. I think of the Google-Salesforce.com tie up like a lab experiment. Salesforce.com also has some tasty multi-tenant technology. This invention virtualizes virtualized infrastructures. So, you have an application. You run it so it is stable, scalable, and reasonably reliable. Then you allow different companies to use the application as if that application was dedicated to the one company. Everything is partitioned. Very cool stuff indeed.
Second, the reference to the “same infrastructure” is one of those phrases that carry a truck load of meaning. I remember a teacher in a required literature class making us talk in depth about William Carlos Williams’ phrase “red wheelbarrow”. The phrase “same infrastructure” means that the goodies running on the Google infrastructure can–at some point in time of Google’s choosing–be made available to an enterprise. Want to crunch 20 years’ American Express credit card data for the week before Mother’s Day in New York City? Today AMEX cannot do this without some work. With the Google data management infrastructure, the query is not only trivial. The query will return results in a fraction of the time American Express’ MBAs now wait. This is–believe me–a very big deal.
Third, Mr. Chandra says clearly that what Google built to do Web search is now, after a decade–that’s 10 years of Internet time–tuned for Web applications. I suppose Google could buy a billboard on Times Square, but the company is crystal clear in telling Mr. Henschen and his readers that Google is an application platform. You can read more about how a Web search system works like a big distributed laptop in my 2005 study The Google Legacy here. Note: this is also dull and long, stuffed with technical diagrams that explain why Microsoft and Yahoo are lagging Messrs. Brin and Page in online services.
And, finally (actually fourthly), Mr. Chandra lays out the strategic thrust of Google. Google has been using this phrase for a number of years. I think I have a note somewhere that the first use of the phrase was in 2001. Google is not a wacky math club. Google is a supra-national enterprise that competitors, regulators, and mavens are just now–after a decade in Internet time–beginning to perceive in its broad outlines. It’s thrilling that ComScore and NetRatings report that Google has a two thirds share of the search market, but that take makes it very tough to look at Act II at Google. My research suggests that Act II will be a compelling event to watch from the sidelines. I don’t want to be under one of Googzilla’s paws when the beast heads for the Fortune 100 in earnest. Keep in mind that as Google “hooks” young folks, these little Googlers will pull Google into the enterprise. Hence, the consumer becomes the enterprise customer. An ad in eWeek or CIO won’t stop this type of demographic play.
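A footnote on the multi-tenant idea from the first observation: one running application, partitioned so that each company sees only its own data. The sketch below shows the partitioning concept only. It is not Salesforce.com’s architecture, and the class name, tenant names, and records are hypothetical.

```python
class MultiTenantApp:
    """One shared application instance; data is partitioned by tenant (toy sketch)."""

    def __init__(self):
        self._data = {}  # tenant name -> that tenant's private key/value store

    def put(self, tenant, key, value):
        """Store a record inside the tenant's own partition."""
        self._data.setdefault(tenant, {})[key] = value

    def get(self, tenant, key):
        """Read a record; a tenant can never reach another tenant's partition."""
        return self._data.get(tenant, {}).get(key)


app = MultiTenantApp()
app.put("acme", "lead-1", "Call Tuesday")
app.put("globex", "lead-1", "Send quote")
print(app.get("acme", "lead-1"))
print(app.get("globex", "lead-1"))
```

Both companies use the same key and the same code, yet each gets back only its own record. Doing that reliably at scale, on top of already virtualized infrastructure, is the hard part.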
Net net: Kudos to Doug Henschen for getting a Googler to talk. I love to watch the Jedi knights at Google run the game plan. The tactic is one that an NFL owner can understand. Senior managers at IBM and Oracle? I’m not sure.
Stephen Arnold, July 22, 2008
Google Patents May Be Worthless
July 22, 2008
Thank goodness I am not an attorney. The Patent Law Blog ran an essay today (July 21, 2008) called “The Death of Google’s Patents”. You may want to read the full text of the essay here. I went through the write up two times and still came away wondering about the implications of John F. Duffy’s article in “Patently: Patent Law Blog.”
The key point in the essay was a court ruling that says, “You cannot patent a software process.” There may be some exceptions, but if a company like Google has a patent on a software system or method, well, those patents are toast. (You can see why I am not legal eagle material.)
Once Mr. Duffy drops this information bomb, he moves on to the subject of Google’s PageRank invention. If the decisions reviewed in this Web log post are ones with bite, Google could lose patent protection. Keep in mind that the PageRank invention is not Google’s. The patent proudly announces that Stanford University is the assignee, but I may be misreading the PDF in front of me. My recollection is that the patent carries a note that some of the work was performed under a National Science Foundation grant. Does that mean that some or all of the PageRank “invention” is usable by other companies?
The rulings summarized in this thought provoking essay could (will?) undermine Google’s legal position for some, maybe all, of the firm’s 350 patent documents.
Could Google’s patent documents end up in the garbage dump?
Does Google Have a Patent Strategy?
At the Boston Search Engine Meeting in April 2008, a young wizard and former Microsoftie asked in the Q&A session after my speech, “Just because a company has a patent, does that mean the company will use the disclosed invention?”