Metadata Extraction

February 8, 2009

A happy quack to the reader who sent me a link to “Automate Metadata Extraction for Corporate Search and Mashups” by Dan McCreary here. The write up focuses on the UIMA framework and the increasing interest in semantics, not just keyword indexing. I found the inclusion of code snippets useful. The goslings here at Beyond Search are urged to copy, cut and paste before writing original scripts. Why reinvent the wheel? The snippets may not be the exact solution one needs, but a quick web-footed waddle through them revealed some useful items. Mr. McCreary has added a section about classification, and he used the phrase “faceted search”, which may agitate the boffins at Endeca and other firms where facets are as valuable as a double eagle silver dollar. I was less enthusiastic about the discussion of Eclipse, but you may find it just what you need to chop down some software costs.

The write up is in several parts. Here are the links to each section: Part 1, Part 2, and Part 3. I marked this article for future reference. Quite useful if a bit pro-IBM.

Stephen Arnold, February 6, 2009

Written by Stephen E. Arnold · Filed Under Enterprise, News, Online (general), Semantic, Technology

Comments

4 Responses to “Metadata Extraction”

Daniel Tunkelang on February 8th, 2009 8:31 am

No agitation at seeing others proclaim the merits of faceted search–they might even drive my book sales! But I’m not sure about mixing current British and early American slang.

Stephen E. Arnold on February 8th, 2009 3:46 pm

Daniel Tunkelang,

As an addled goose, I mix quacks with English. The variants don’t resonate with a fowl.

Stephen Arnold, February 8, 2009

Dan McCreary on February 8th, 2009 4:21 pm

Thanks for the nice review! Did I use the expression “faceted search” incorrectly? Do you have another definition?

Regarding the pro-IBM slant, I actually was asked by the editors to take most of the references to the history of the CAS architecture out since most of it was done internally within IBM.

Any feedback on the overall architecture of UIMA/CAS vs. other more I/O intensive pipeline architectures? Are there any real alternatives to UIMA/CAS for open standards of text-mining annotators?

Stephen E. Arnold on February 8th, 2009 6:59 pm

Dan McCreary,

Thanks for posting and knowing about my Web log. Some writers get their feathers plucked in Beyond Search. You are welcome in Harrod’s Creek pond anytime.

Stephen Arnold, February 8, 2009

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.