Entity Extraction with Solr

July 11, 2013

Entity extraction is a feature that many enterprise users want to build into their architecture. Solr 4 includes components that allow a workaround, or “poor man’s,” entity extraction. Erik Hatcher, one of the founders of LucidWorks, explains how in his SearchHub blog entry, “Poor Man’s ‘Entity’ Extraction with Solr.”

The instructions begin:

“Entity extraction, as defined on Wikipedia, ‘seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.’ When drilling down into the specifics of the requirements from our customers, it turns out that many of them have straightforward solutions using built-in (Solr 4.x) components, such as: Acronyms as facets; Key words or phrases, from a fixed list, as facets; Lat/long mentions as geospatial points.”
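To make the three cases in the quote concrete, here is a minimal sketch in plain Python of what each one boils down to. It is not Hatcher's method, which relies on Solr's built-in components, and it does not call Solr at all. The regular expressions, the keyword list, and the function name are all assumptions made up for illustration.

```python
import re

# Hypothetical patterns and keyword list, chosen only for illustration;
# they are not taken from Hatcher's post or from any Solr configuration.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")
LATLON_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")
KEYWORDS = ["entity extraction", "solr"]


def extract_entities(text):
    """Return acronyms, fixed-list keywords, and lat/long pairs found in text."""
    # Acronyms: runs of two or more capital letters, de-duplicated and sorted.
    acronyms = sorted(set(ACRONYM_RE.findall(text)))

    # Key words or phrases: case-insensitive lookup against the fixed list.
    lowered = text.lower()
    keywords = [kw for kw in KEYWORDS if kw in lowered]

    # Lat/long mentions: "number, number" pairs within valid coordinate ranges.
    points = []
    for lat_str, lon_str in LATLON_RE.findall(text):
        lat, lon = float(lat_str), float(lon_str)
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lat, lon))

    return {"acronyms": acronyms, "keywords": keywords, "points": points}


if __name__ == "__main__":
    sample = "The CIA and NASA met near 38.03, -78.48 to discuss Solr."
    print(extract_entities(sample))
```

In a Solr setup, values like these would end up in facet fields and a geospatial field rather than in a Python dictionary; the sketch only shows the kind of pattern matching involved.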

SearchHub is one of several channels through which LucidWorks provides support and training to Apache Lucene/Solr developers as well as to its own customers. LucidWorks users find that both LucidWorks Big Data and LucidWorks Search are ready to go out of the box, yet allow the kind of customization and scaling that Hatcher demonstrates in his post.

Emily Rae Aldridge, July 11, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Written by Stephen E. Arnold · Filed Under Entity extraction, News
