Do Google and Microsoft Really Understand the Web?

June 22, 2012

We humans are difficult for search engines to understand. For example, try this query: “terminal”. Okay, which is it: an airport terminal, a bus terminal, or a computer terminal? You get the idea. Ars Technica explains “How Google and Microsoft Taught Search to Understand the Web.”

Journalist Sean Gallagher and his associates picked the brains of the teams behind two of the Web’s biggest search engine projects, Google’s Knowledge Graph and Microsoft’s Satori. Both are efforts to move search from matching strings of text to connecting the dots of meaning. The result is an in-depth explanation that any search professional should become familiar with. The article informs us:

“The efforts are in part a fruition of ideas put forward by a team from Yahoo Research in a 2009 paper called ‘A Web of Concepts,’ in which the researchers outlined an approach to extracting conceptual information from the wider Web to create a more knowledge-driven approach to search. They defined three key elements to creating a true ‘web of concepts’:

  • Information extraction: pulling structured data (addresses, phone numbers, prices, stock numbers and such) out of Web documents and associating it with an entity
  • Linking: mapping the relationships between entities (connecting an actor to films he’s starred in and to other actors he has worked with)
  • Analysis: discovering categorizing information about an entity from the content (such as the type of food a restaurant serves) or from sentiment data (such as whether the restaurant has positive reviews).”

These ideas are still mostly unrealized, but Google and Microsoft are both beginning to make progress. Entity extraction itself is not new, but the database scale and relationship building of the current approaches are. Both companies’ entity databases are non-traditional: they are graph databases that map relationships between entities, much as Facebook’s Open Graph maps relationships between users and activities.
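
To make the extraction and linking ideas concrete, here is a minimal Python sketch. The sample text, the regular expressions, and the adjacency-list “graph” are illustrative assumptions for this post, not how Knowledge Graph or Satori actually work under the hood.

```python
import re
from collections import defaultdict

# Hypothetical sketch: pull structured facts out of free text, attach them to
# a named entity, then record typed relationships in a tiny adjacency-list
# "graph". Purely illustrative; this is not Google's or Microsoft's code.

PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")

def extract_facts(text):
    """Information extraction: structured data pulled from a Web document."""
    return {"phones": PHONE.findall(text), "prices": PRICE.findall(text)}

# Linking: relationships between entities, stored as labeled edges.
graph = defaultdict(list)

def link(subject, relation, obj):
    graph[subject].append((relation, obj))

doc = "Luigi's Pizzeria, (555) 123-4567, large pie $18.50"
entity = {"name": "Luigi's Pizzeria", **extract_facts(doc)}

link("Luigi's Pizzeria", "serves", "pizza")            # analysis: category info
link("Luigi's Pizzeria", "located_in", "Springfield")  # linking: related entity

print(entity)
print(dict(graph))
```

Trivial as it is, the sketch follows the three steps from the Yahoo paper: extract structured data, link related entities, and record categorizing information about each one.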

“Entities” have become complicated bundles of information. Each contains a unique identifier; a collection of properties based on the attributes of the real-world topic; links representing the topic’s relationship to other entities; and things a user searching for that topic might want to do. The article compares and contrasts how each company collects and manages these dossiers. One difference lies in each system’s UI—Google’s seems more about answering questions, while Bing’s new front end appears to facilitate taking actions.
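
As a rough illustration of what such a dossier might look like, the sketch below models an entity with an identifier, properties, links, and candidate actions. The field names and the sample record are assumptions made for illustration; they are not the actual schema of either system.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A hypothetical entity 'dossier'. Field names are illustrative only."""
    entity_id: str                                   # unique identifier
    properties: dict = field(default_factory=dict)   # attributes of the real-world topic
    links: list = field(default_factory=list)        # (relation, other entity id) pairs
    actions: list = field(default_factory=list)      # things a searcher might want to do

casablanca = Entity(
    entity_id="film/casablanca_1942",
    properties={"title": "Casablanca", "released": 1942},
    links=[("starred", "person/humphrey_bogart"),
           ("starred", "person/ingrid_bergman")],
    actions=["watch trailer", "buy tickets", "add to watchlist"],
)

# Answering a question (who starred in it?) vs. offering an action to take.
print([other for relation, other in casablanca.links if relation == "starred"])
print(casablanca.actions)
```

The last two lines hint at the UI difference noted above: the links support question-answering, while the actions list supports doing something with the result.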

Both Knowledge Graph and Satori give the user ways to help from the first results list, by narrowing the search or pointing the engine down the correct path. This sort of direction is still essential, since neither company is anywhere close to making seamless, accurate semantic search a reality. Both engines still have holes, and both are already fighting lag from their growing databases. And that is while working with just English! The article concludes:

“When other languages are added to the entity extraction language processing of the search engines, the number of entities and relationships they have to manage is bound to explode, both in terms of number and complexity. To truly ‘understand’ the Web, Knowledge Graph and Satori are going to have to get a lot smarter. And they’re bound to push the bounds of semantic processing and computing forward in the process, as bigger and bigger graphs of knowledge are shoved into memory.”

It seems that natural language search worthy of a futurist’s dreams is still years away. This article is a great window into the baby steps being made right now by two of the Web’s biggest crawlers.

Cynthia Murrell, June 22, 2012

Sponsored by PolySpot
