Patterns in Web Content

March 10, 2011

Data mining refers to applications that seek common themes or patterns in specific pools of information. Its popularity is rooted in the scientific community, though the technology is increasingly being applied across the commercial sector.

The exponential growth of the Web has made it necessary to trace and scrutinize the relationships within such collections of information.

The Computational Linguistics & Psycholinguistics Research Center (CLiPS) in Belgium has just released Pattern, a web mining module for the Python programming language. The Pattern Web site says:

“It [Pattern] bundles tools for data retrieval (Google + Twitter + Wikipedia API, Web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).”

Follow the link above to access the release directly, and check the specifications for compatibility.
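For readers curious about the "tf-idf + cosine similarity" metrics in the feature list quoted above, here is a minimal plain-Python sketch of the textbook technique. It does not use Pattern's own API. The regex tokenizer, the tf × log(N/df) weighting, and the sample sentences are my assumptions for illustration, and Pattern's implementation may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Assumption: tf = count / document length; other tf-idf variants exist.
        vectors.append({
            term: (c / total) * math.log(n / df[term])
            for term, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for this example.
docs = [
    "the candidate spoke about taxes and jobs",
    "taxes and jobs dominated the debate",
    "the weather in Antwerp was rainy",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: shares "taxes", "and", "jobs"
print(cosine(vecs[0], vecs[2]))  # 0.0
```

Because "the" appears in every sample document, its idf is log(3/3) = 0, so the first and third documents score 0.0 even though they share that word. The first two score higher because they share rarer terms.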

I was interested to discover that the designers, in a trial of their creation, used the software to track the progress of local politicians in their home country's 2010 elections. Pattern scanned thousands of tweets in two languages, updating the data pool daily. The results were fascinating. You can read a detailed description of the experiment here.

Micheal Cory, March 10, 2011

Written by Stephen E. Arnold · Filed Under News, Semantic, Text processing

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.