Google Metaweb Deal Points to Possible Engineering Issue

July 19, 2010

Years ago, I wrote a Bear Stearns white paper, “Google’s Semantic Web: the Radical Change Coming to Search and the Profound Implications to Yahoo & Microsoft,” May 16, 2007, about the work of Epinions’ founder, Dr. Ramanathan Guha. Dr. Guha bounced from big outfit to big outfit, landing at Google after a stint at IBM Almaden. My Bear Stearns report focused on an interesting series of patent applications filed in February 2007. The five patent applications were published on the same day. These are now popping out of the ever-efficient USPTO as granted patents.

A close reading of the Guha February 2007 patent applications and other Google technical papers makes clear that Google had a keen interest in semantic methods. The company’s acquisition of Transformics at about the same time as Dr. Guha’s jump to the Google was another out-of-spectrum signal for most Google watchers.

With Dr. Guha’s Programmable Search Engine inventions and Dr. Alon Halevy’s dataspace methods, Google seemed poised to take over the floundering semantic Web movement. I recall seeing Google classification methods applied in a recipe demo, a headache demo, and a real estate demo. Some of these demos made use of entities; for example, “skin cancer” and “chicken soup”.

Has Google become a one trick pony? The buy-technology trick? Can the Google pony learn the diversify and grow new revenue tricks before it’s time for the glue factory?

In 2006, signals I saw flashed green, and it sure looked as if Google could speed down the Information Highway 101 in its semantic supercar.

Is Metaweb a Turning Point for Google Technology?

What happened?

We know from the cartwheels Web wizards are turning that Google purchased computer Zen master Danny Hillis’ Metaweb business. Metaweb, known mostly to the information retrieval and semantic Web crowd, produced a giant controlled term list of people, places, and things. The Freebase knowledgebase is a next generation open source term list. You can get some useful technical details from the 2007 “On Danny Hillis, eLearning, Freebase, Metaweb, Semantic Web and Web 3.0” and from the Wikipedia Metaweb entry.

What has been missing in the extensive commentary available to me in my Overflight service is some thinking about what went right or wrong with Google’s investments and research in closely adjacent technologies. Please, keep in mind that the addled goose is offering his observations based on his research for his three Google monographs, The Google Legacy, Google Version 2.0, and Google: the Digital Gutenberg. If you want to honk back, use the comments section of this Web log.

First, Google should be in a position to tap its existing metadata and classification systems such as the Guha context server and the Halevy dataspace method for entities. Failing these methods, Google has its user input methods like Knol and its hugely informative search query usage logs to generate a list of entities. Heck, there is even the disambiguation system to make sense of misspellings of people like Britney Spears. I heard a Googler give a talk that included the factoid that hundreds of variants of Ms. Spears’s name were “known” to the Google system and properly substituted automagically when the user goofed. The fact that Google bought Metaweb makes clear that something is still missing.
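
The variant-substitution idea the Googler described can be illustrated with a toy. What follows is a minimal sketch in Python, not Google’s method: it assumes a tiny hand-built entity list (KNOWN_ENTITIES) and a made-up helper (normalize_query), and it leans on the standard library’s difflib fuzzy matching as a stand-in for whatever Google actually runs against its query logs.

```python
from difflib import get_close_matches

# Hypothetical entity list for illustration only. A production system would
# presumably learn its entities and their variants from query logs.
KNOWN_ENTITIES = ["britney spears", "danny hillis", "chicken soup", "skin cancer"]


def normalize_query(query, cutoff=0.8):
    """Return the closest known entity, or the cleaned query if nothing is close."""
    # Lowercase and collapse whitespace so matching is not thrown off by formatting.
    cleaned = " ".join(query.lower().split())
    # get_close_matches ranks candidates by a similarity ratio between 0 and 1
    # and drops anything scoring below the cutoff.
    matches = get_close_matches(cleaned, KNOWN_ENTITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


if __name__ == "__main__":
    for typed in ["Brittany Spears", "britny spears", "metaweb"]:
        print(typed, "->", normalize_query(typed))
```

The point of the toy is the dependency it exposes: substitution only works for names already on the list, which may be one reason a big, curated entity list like Freebase looks attractive.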

Maybe Google wanted the open source “magnet” Freebase? Maybe Google wanted Danny Hillis, super Yoda to Google’s Jedi knights? Maybe Google needed better technology? Whatever the reason, the deal cost a lot of money. Metaweb had attracted money from Benchmark Capital and other outfits. The total invested may be close to $60 million. Figuring a 10X return for the VC crowd, the acquisition price tag could be a hefty $600 million, maybe even more? Google has $30 billion in loose change, so Google can buy what it wants without cutting more day care. What does Metaweb have that Google needs? That’s a good question, and it may underscore that whatever Google had developed, bought, or downloaded lacked the firepower to fight tomorrow’s search wars. That’s important to me.

Second, the open-source angle is interesting. I am becoming more supportive of open source solutions. I think that many commercial enterprises use open source the way Madison Avenue uses the word “new” on laundry detergent. The word carries no freight of meaning. The fact that Metaweb has its knowledgebase in open source means that anyone, including a Google engineer, could download the term list or access it through the Metaweb API. Why pay? If the knowledgebase is open source, maybe Metaweb has some other asset such as people, technology, a mailing list, or technology that actually works? Following this line of reasoning, the addled goose asked himself, “Why don’t Google’s existing technology, engineering, and knowledgebases deliver what’s needed?” Has Google’s vaunted engineering fallen short in what is a pivotal technical capability? The goose can speculate that Google’s existing technology is like a former Tour de France champion struggling to keep pace with younger, more agile competitors. I will just toss this lance at you and allow you to answer the questions.

Third, the deal – which seems to be part of a mini-boom in semantic acquisitions – could be a “let’s buy it now” play. Microsoft snapped up Powerset. Evri (Paul Allen) bought Radar Networks. Expert System (Italy) is finding itself besieged by requests to go to the prom. A promising outfit in Madrid (www.bitext.com) finds itself almost as popular as Spain’s World Cup team player Andrés Iniesta. Maybe Google is taking a defensive position? Maybe Google feels pressure and thinks that a Cold War strategy is one way to deal with Facebook, Twitter, and the social shift in search.

Hypotheses for July 2010: The Run Up to 2011

For me, these questions point to three hypotheses about Google 2010. Feel free to disagree, just bring along some factoids or references to Google products, services, technical papers, or patent documents, please.

Google’s internal engineering solutions don’t work in the morphing social search world.

This is a statement that strikes at Google’s aggressive investment in its own engineers’ systems and methods. In the semantic space, Google has some very heavy artillery in position and firing. Are these existing solutions like the defenses in the Maginot Line? They work like a champ as long as the enemy stands directly in the line of fire. When the enemy just goes around the emplacements, panic sets in. Maybe the acquisition of Metaweb is an example of an outfit in a defensive position looking for new weaponry to meet a different type of enemy tactic. That Zuckerberg or Biz guy may still be a problem. The fix is coming in late 2010 or 2011. The fix, in my opinion, was needed when Orkut ran aground years ago.

2006 to 2010: Now vulnerable?

Second, after Google’s miracle year of 2006, when everything went Google’s way, 2010 is a different story. Stock’s down. Legal hassles up. Competition swarming. Google’s responses have been ineffectual and sometimes wacky like the “tell China it has to change” and then the “China is really sort of okay”. The invulnerable Google sure looks a lot like a company struggling from front to front, battle to battle, skirmish to skirmish. “Call in the technical reinforcements” echoes in my addled brain. Is Google suffering from Social War Syndrome?

New problem, wrong solution?

Google has been an organization keen on solving problems with logic. If Facebook is an annoyance with its pesky Xooglers, technology will solve the problem, according to Google methodological principles. If Twitter is a pain, then stuffing Twitter into another Google service will solve the problem, right? My view is that the solutions to Google’s problems are less technological and more managerial. With Google’s diffusion across different market sectors, the founders have lost the ability to apply laser focus to what are non-search problems. Nothing is more illogical than social behavior. If logic worked, then Google would not find itself bouncing from problem to problem across publishing, mapping, social networking, and other hot spots. Solving a focus problem with more technology or even duplicative technology won’t work in my opinion.

Wrap Up

The Metaweb acquisition may add exactly what’s needed to the Google Molotov cocktail. My research suggests that Google needs to buy companies that equip it for battles that Google is in danger of losing right now. One example is rich media. Amazon, Apple, Hulu, and others are generating revenue from rich media. Google’s flagship YouTube.com is still in port dealing with the Customs House staff about value.

There are other markets where Google is not making the type of headway that investors, even competitors, once expected. Examples range from enterprise search to online shopping, from online games to streaming music. This acquisition calls attention to the need for Google to shift from hoarding technology to using it to generate significant new revenues from new markets. The recent earnings report makes clear that Google is spending more and possibly struggling with cost controls in its core operations.

This is one more indication of the shift in the company’s management capabilities between 2006 and 2010. Metaweb can become a great acquisition when Google uses the company’s talent and technology to build ad revenue and disprove the “one trick pony” saddle that Google seems to be sporting these days.

Google has changed. The Metaweb deal may be one more piece of evidence of the nature of the shift in the last 48 months.

Stephen E Arnold, July 18, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, Editorial opinion, Feature, Google, Mobile, News, Semantic, Technology, Text analytics, Text processing

Comments

3 Responses to “Google Metaweb Deal Points to Possible Engineering Issue”

Serge on July 19th, 2010 7:02 pm

Lots of good points. I definitely agree there’s a void within G’s new media strategies and this could be one way of dealing with it.

Antonio Valderrabanos on July 22nd, 2010 5:47 am

Stephen, thanks for the reference, Iniesta is quite happy with it too.

On the technical side: I have a feeling that if you are right about the engineering issue and the issue indeed exists, the issue will stay there for a while and Metaweb may not bring the solution.

This acquisition means making progress on the “semantic web” path, whatever that means. Strange though it may seem, this is not the same as making progress on the “semantics” path (even if this may be the intended path). Metaweb is about organizing (labeling, merging, synching…) information sources that have STRUCTURE. Metaweb is NOT about discovering structure in unstructured sources, like running text. Freebase wasn’t built from texts, but from databases. This capability, structuring the unstructured, is the key to travelling the “semantics” path or, to be more precise, the “linguistics” path. And, in my view, this is the relevant path.

Why is the “semantic web” path not enough (although certainly valuable)? Freebase may contain tons and tons of entities and relationships; as big as it may be, it is limited, static and dependent on sources built manually. The “linguistics” path makes it possible to identify entities and relationships DYNAMICALLY, the way entities and relationships appear in real life (newspapers, blogs, etc.), because it allows machines to structure unstructured information (to learn the “aboutness” of a sentence or a text). In other words, the “linguistic” path is the one that frees you from static structured data (always scarce) and allows you to exploit (ever growing and free) existing text. Besides, exploiting all the text available on the Web makes it possible to target the entity and relationship issue – the one that Metaweb is targeting – at the web scale. The “semantic web” path will not free you from the dependency on pre-existing data, hence it will not be fit for a web-scale problem.

And this is why, Stephen, you may be writing on this topic for a while, until the “semantic web” path reaches its dead end and other paths are looked into with more insight.

Anonymous on July 30th, 2010 3:19 pm

I wouldn’t be so sure that the Metaweb acquisition cost much money. Your math is based on successful companies being purchased, and those are a small minority. If Google really spent 600MM to acquire Metaweb, don’t you think that the terms would have had to be announced to their shareholders? Not to mention that Metaweb has been actively laying off employees for the last year or more since they haven’t really made any money. Valuations these days are based on earnings and earnings potential, and all Metaweb has is whatever engineers are left and some clever algorithms. I think it’s quite likely that this was a fire sale and Google got them for a song.

And I wouldn’t put too much faith into Dr Guha’s work. He was basically ousted from Epinions after 1 year because he was all theory with nothing that was practical or workable.


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.