Need an Open Source Semantic Web Crawler?

December 17, 2015

If you do, the beleaguered Yahoo has some open source goodies for you. Navigate to “Yahoo Open Sources Anthelion Web Crawler for Parsing Structured Data on HTML Pages.” The software, states the write up, is “designed for parsing structured data from HTML pages under an open source license.”

There is a statement I found darned interesting:

“To the best of our knowledge, we are first to introduce the idea of a crawler focusing on semantic data, embedded in HTML pages using markup languages as microdata, microformats or RDFa,” wrote authors Peter Mika and Roi Blanco of Yahoo Labs and Robert Meusel of Germany’s University of Mannheim.

My immediate thought was, “Why don’t these folks take a look at the 2007 patent documents penned by Ramanathan Guha. Those documents present a rather thorough description of a semantic component which hooks into the Google crawlers. Now the Google has not open sourced these systems and methods.

My reaction is, “Yahoo may want to ask the former Yahooligans who are now working at Yahoo how novel the Yahoo approach really is.”

Failing that, Yahoo may want to poke around in the literature, including patent documents, to see which outfits have trundled down the semantic crawling Web thing before. Would it have been more economical and efficient to license the Siderean Software crawler and build on that?

Stephen E Arnold, December 17, 2015

Comments

2 Responses to “Need an Open Source Semantic Web Crawler?”

  1. serwis on December 17th, 2015 5:46 pm

    serwis

    Need an Open Source Semantic Web Crawler? : Stephen E. Arnold @ Beyond Search

  2. camera an ninh hcm on January 9th, 2016 4:46 am

    Great delivery. Great arguments. Keep up the good spirit.

  • Archives

  • Recent Posts

  • Meta