Google Dabbles in Big Data and Works Toward Disambiguation

March 20, 2013

The ambiguity of language is a subtlety that, so far, only the human mind grasps. Computers cannot disambiguate the mumbo jumbo that is human language. Mercury is mercury to a computer; it makes no difference whether it is Freddie, the planet, the car, the element, or any of a plethora of other possibilities.

In “Learning From Big Data: 40 Million Entities in Context,” Google (go figure, the company is becoming the proverbial duct tape of the technology world) hopes to change all of that.

“To provide that help, we are releasing the Wikilinks Corpus: 40 million total disambiguated mentions within over 10 million web pages…The mentions are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If we think of each page on Wikipedia as an entity (an idea we’ve discussed before), then the anchor text can be thought of as a mention of the corresponding entity.”
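The mining step described above is straightforward to picture in code. The sketch below is a minimal, hypothetical approximation (not Google's actual pipeline): it scans a page for links into Wikipedia and keeps the anchor text as an entity mention when it closely matches the linked page's title, with "closely matches" approximated here by simple substring containment.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

class WikiAnchorParser(HTMLParser):
    """Collect (anchor_text, wikipedia_title) pairs from <a href> links.

    A toy illustration of the Wikilinks idea: each Wikipedia page is
    treated as an entity, and anchor text linking to it as a mention.
    """

    # Matches English Wikipedia article URLs and captures the title part.
    WIKI = re.compile(r"https?://en\.wikipedia\.org/wiki/([^#?]+)")

    def __init__(self):
        super().__init__()
        self._target = None   # title of the Wikipedia page being linked
        self._text = []       # text fragments inside the current <a>
        self.mentions = []    # list of (mention text, entity title)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            m = self.WIKI.match(href)
            if m:
                # "Mercury_(planet)" -> "Mercury (planet)"
                self._target = unquote(m.group(1)).replace("_", " ")
                self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            anchor = "".join(self._text).strip()
            # Keep only anchors that closely match the target title;
            # here that is approximated by substring containment.
            if anchor and anchor.lower() in self._target.lower():
                self.mentions.append((anchor, self._target))
            self._target = None

page = ('<p>The planet <a href="https://en.wikipedia.org/wiki/'
        'Mercury_(planet)">Mercury</a> is the smallest.</p>')
parser = WikiAnchorParser()
parser.feed(page)
print(parser.mentions)  # [('Mercury', 'Mercury (planet)')]
```

Because the Wikipedia title names a specific entity, every mention harvested this way arrives pre-disambiguated: the same surface string "Mercury" ends up tied to different entities depending on which article it links to.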

The corpus will allow researchers to resolve co-references, search across documents that are similar in nature, and work on subsets of the data. Let’s face it: if it works, a lot of time and money will be saved and a lot of headaches will go away. If it doesn’t, Google may want to rethink its foray into the Big Data arena.

Leslie Radcliff, March 20, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
