HP and Its New IDOL Categorizer
January 1, 2014
I read “Analytics for Human Information: Optimize Information Categorization with HP IDOL.” I noticed that HP did not reference the original reference to the 1998 categorization technology in its write up. From my point of view, news about something developed 15 years ago and referenced in subsequent Autonomy collateral is not something fresh to me. In fact, presenting the categorizer as something “amazing” suggests a superficial grasp of the history of IDOL technology which dates from the late 1980s and early 1990s. It is fascinating how some “experts” in content processing reinvent the wheel and display their intellectual process in such an amusing way. Is it possible to fool oneself and others? Remarkable.
Update, January 1, 2014, 11 am Eastern:
Hewlett Packard is publicizing IDOL’s automatic categorization capability. As a point of fact, this function has been available for 15 years. Here’s a description from a 2001 Autonomy IDOL Server Technical Brief, 2001.
DOL server can automatically categorize data with no requirement for manual input whatsoever. The flexibility of Autonomy’s Categorization feature allows you to precisely derive categories using concepts found within unstructured text. This ensures that all data is classified in the correct context with the utmost accuracy. Autonomy’s Categorization feature is a completely scalable solution capable of handling
high volumes of information with extreme accuracy and total consistency. Rather than relying on rigid rule based category definitions such as Legacy Keyword and Boolean Operators, Autonomy’s infrastructure relies on an elegant pattern matching process based on concepts to categorize documents and automatically insert tag data sets, route content or alert users to highly relevant information pertinent to the users profile. This highly efficient process means that Autonomy is able to categorize upwards of four million documents in 24 hours per CPU instance, that’s approximately one document, every 25 milliseconds. Autonomy hooks into virtually all repositories and data formats respecting all security and access entitlements, delivering complete reliability. IDOL server accepts a category or piece of content and returns categories ranked by conceptual similarity. This determines for which categories the piece of content is most appropriate, so that the piece of content can subsequently be tagged, routed or filed accordingly.
Stephen E Arnold, January 1, 2014