Metadata Extraction
February 8, 2009
A happy quack to the reader who sent me a link to “Automate Metadata Extraction for Corporate Search and Mashups” by Dan McCreary here. The write up focuses on the UIMA framework and the increasing interest in semantics, not just key word indexing. I found the inclusion of code snippets useful. The goslings here at Beyond Search are urged to copy, cut and paste before writing original scripts. Why reinvent the wheel? The snippets may not be the exact solution one needs, but a quick web footed waddle through them revealed some useful items. Mr. McCreary has added a section about classification and he used the phrase “faceted search” which may agitate the boffins at Endeca and other firms where facets are as valuable as a double eagle silver dollar. I was less enthusiastic about the discussion of Eclipse, but you may find it just what you need to chop down some software costs.
The write up is in several parts. Here are the links to each section: Part 1, Part 2, and Part 3. I marked this article for future reference. Quite useful if a bit pro-IBM.
Stephen Arnold, February 6, 2009
Comments
4 Responses to “Metadata Extraction”
No agitation at seeing others proclaim the merits of faceted search–they might even drive my book sales! But I’m not sure about mixing current British and early American slang.
Daniel Tunkelang,
As an addled goose, I mix quacks with English. The variants don’t resonate with a fowl.
Stephen Arnold, February 8, 2009
Thanks for the nice review! Did I use the expression “faceted search” incorrectly? Do you have another definition?
Regarding the pro-IBM slant, I actually was asked by the editors to take most of the references to the history of the CAS architecture out since most of it was done internally within IBM.
Any feedback on the overall architecture of UIMA/CAS vs. other more I/O intensive pipeline architectures? Are there any real alternatives to UIMA/CAS for open standards of text-mining annotators?
Dan McCreary,
Thanks for posting and knowing about my Web log. Some writers get their feathers plucked in Beyond Search. You are welcome in Harrod’s Creek pond anytime.
Stephen Arnold, February 8, 2009