Entity Extraction from Google and Yahoo
August 19, 2009
I found the announcement reported on the Programmable Web a harbinger. “Yahoo Quietly Axes Two Search APIs” disclosed that Yahoo is out of the entity extraction business *before* the Microsoft deal closes. I can see nuking this type of service once the deal closes, but killing off a service quietly is more troubling to me. Programmable Web provides a link to “Being Optimistic at the Deathbed of Yahoo Search API”. I am not so optimistic. You can read Yahoo’s announcement on the YDN.
What is the impact? Yahoo offers no information. What about those BOSS fans? Yahoo offers no information.
To my surprise, Mashable reported that Yahoo has not killed its term extraction API.
What’s ironic is that at about the same time as Yahoo’s entity flip flop flip was taking place, Google’s patent document US20090204592 was published. Entity extraction is no big deal any longer. You can poke around and find open source routines or you can click on Google ads for Teragram’s solution. I find these Google patent documents interesting because they suggest to me that the Google is cognizant of the functions that search vendors such as Autonomy and Endeca have been including in their upscale systems. With the Google nosing into these functions, I have a hunch that Google will be looking to add some new zing to its Google Search Appliance and its enterprise applications.
You can read the patent document using the wonderful USPTO system here. The abstract for the document filed on April 9, 2009, complements other Google text processing patent documents. (You can explore these via the Perfect Search / ArnoldIT.com service at http://arnoldit.perfectsearchcorp.com/.)
A system receives a search query, determines whether the received search query includes an entity name, and determines whether the entity name is associated with a common word or phrase. When the entity name is associated with a common word or phrase, the system generates a link to a rewritten query, performs a search based on the received search query to obtain first search results, and provides the first search results and the link to the rewritten query. When the entity name is not associated with a common word or phrase, the system rewrites the received search query to include a restrict identifier associated with the entity name, generates a link to the received search query, performs a search based on the rewritten search query to obtain second search results, and provides the second search results and the link to the received search query.
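Here is how I read the branching in that abstract, as a minimal Python sketch. The lookup tables, the `entity:` restrict syntax, and every function name below are my own assumptions for illustration; the abstract does not say how Google recognizes an entity name, decides that it is also a common word, or formats a restrict identifier.

```python
from dataclasses import dataclass

# Hypothetical lookup tables. The abstract does not disclose how entity
# names or common words are actually identified.
ENTITY_RESTRICTS = {"apple": "entity:apple_inc", "endeca": "entity:endeca"}
COMMON_WORDS = {"apple"}


@dataclass
class Response:
    searched_query: str  # the query whose results are shown
    alternate_link: str  # the query offered to the user as a link


def find_entity(query):
    """Return the first known entity name in the query, or None."""
    for token in query.lower().split():
        if token in ENTITY_RESTRICTS:
            return token
    return None


def rewrite(query, entity):
    """Add the restrict identifier associated with the entity name."""
    return f"{query} {ENTITY_RESTRICTS[entity]}"


def handle_query(query):
    entity = find_entity(query)
    if entity is None:
        # Case not covered by the abstract; assume an ordinary search.
        return Response(query, "")
    rewritten = rewrite(query, entity)
    if entity in COMMON_WORDS:
        # Ambiguous name: search the query as received, link to the rewrite.
        return Response(query, rewritten)
    # Unambiguous name: search the rewrite, link back to the received query.
    return Response(rewritten, query)
```

Under these assumptions, a query containing "apple" is searched as typed, with the restricted version offered as a link, while a query containing "Endeca" is restricted automatically, with the original offered as the fallback link.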
Yahoo waffles (thrashes in confusion?) and the Google discloses an entity function. I noted with interest that one of the Google entity extraction inventors was Marissa Mayer along with several colleagues. Tell me. Which company seems to be on the upswing? Which company is pointed toward the sunset?
Stephen Arnold, August 19, 2009