Attivio Highlights Content Intake Issues

November 4, 2014

I read “Digesting Ingestion.” The write-up is important because it illustrates how vendors with roots in traditional information retrieval, such as Attivio, are responding to changing market demands.

The article discusses the software required to connect a source, such as a Web page or a dynamic information source, to a content processing and search system. Most vendors provide software widgets that handle frequently encountered file types; for example, Microsoft Word files, HTML Web pages, and Adobe PDF documents. When a less common content type must be processed, however, a specialized widget may be needed.
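To make the idea of stock widgets plus specialized add-ons concrete, here is a minimal sketch of a handler registry keyed by file extension. Everything in it is hypothetical and not Attivio's design: the names `REGISTRY`, `register`, `extract`, and `NeedsCustomConnector` are illustrative, the HTML handler is a crude tag stripper, and Word and PDF handlers are omitted because real ones depend on third-party parsing libraries.

```python
import re
from pathlib import Path
from typing import Callable, Dict

# Hypothetical handler type: raw bytes in, extracted text out.
Parser = Callable[[bytes], str]


def parse_text(data: bytes) -> str:
    """Decode plain text, replacing any undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def parse_html(data: bytes) -> str:
    """Crude tag stripper; a production filter would use a real HTML parser."""
    text = re.sub(r"<[^>]+>", " ", parse_text(data))
    return " ".join(text.split())


# "Stock" handlers for frequently encountered types (illustrative only).
REGISTRY: Dict[str, Parser] = {
    ".txt": parse_text,
    ".html": parse_html,
    ".htm": parse_html,
}


class NeedsCustomConnector(Exception):
    """Raised when no stock handler covers a content type."""


def register(ext: str, parser: Parser) -> None:
    """Plug in a specialized handler for a less common format."""
    REGISTRY[ext.lower()] = parser


def extract(path: str, data: bytes) -> str:
    """Dispatch on the file extension; fail loudly if nothing handles it."""
    ext = Path(path).suffix.lower()
    parser = REGISTRY.get(ext)
    if parser is None:
        raise NeedsCustomConnector(f"no handler for {ext or '(no extension)'}")
    return parser(data)
```

For example, `extract("page.html", b"<p>Hello <b>world</b></p>")` returns `"Hello world"`, while `extract("model.dwg", b"...")` raises `NeedsCustomConnector` until someone calls `register(".dwg", ...)` with a specialized handler. Real connectors typically sniff content rather than trusting extensions; the extension lookup just keeps the sketch short.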

Attivio states:

There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:

· The source software version, including patches, optional modules, and configuration
· Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration
· Hardware and operating system version, including patches, optional modules, and configuration
· Throughput/capacity of the repository APIs
· Throughput/capacity and ability to operate in parallel.

This is useful information. Drawing on a real-world example, Attivio reports that a number of other factors can come into play. These range from inadequate computing resources, to corrupt data that connectors send to the exception folder, to my favorite: Big Data.
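The exception folder Attivio mentions is a common pattern: rather than letting one corrupt document halt an ingestion run, the connector quarantines it for later review. The sketch below shows that pattern in generic form. The `ingest` function, the `.bin`/`.err.json` naming, and the assumption that document IDs are safe to use as filenames are all hypothetical, not taken from Attivio's product.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def ingest(
    docs: Iterable[Tuple[str, bytes]],
    parse: Callable[[bytes], str],
    exception_dir: Path,
) -> List[Tuple[str, str]]:
    """Parse each (doc_id, raw) pair; quarantine failures instead of halting.

    Assumes doc_id is safe to use as a filename (hypothetical simplification).
    """
    exception_dir.mkdir(parents=True, exist_ok=True)
    indexed = []
    for doc_id, raw in docs:
        try:
            indexed.append((doc_id, parse(raw)))
        except Exception as exc:  # corrupt or unparseable input
            # Keep the original bytes plus a small error record for triage.
            (exception_dir / f"{doc_id}.bin").write_bytes(raw)
            record = {"doc_id": doc_id, "error": repr(exc)}
            (exception_dir / f"{doc_id}.err.json").write_text(json.dumps(record))
    return indexed


# Usage: the second document is not valid UTF-8, so it is quarantined.
docs = [("a", b"ok"), ("b", b"\xff\xfe bad")]
with tempfile.TemporaryDirectory() as tmp:
    exc_dir = Path(tmp) / "exceptions"
    result = ingest(docs, lambda b: b.decode("utf-8"), exc_dir)
    quarantined = sorted(p.name for p in exc_dir.iterdir())

print(result)       # [('a', 'ok')]
print(quarantined)  # ['b.bin', 'b.err.json']
```

Catching a broad `Exception` is deliberate here: in a crawl, the goal is to keep going and let a person inspect the quarantined items afterward.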

Attivio deserves credit for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out, however, that a number of companies have leapfrogged search-centric approaches to high-volume content intake.

These new players, not the well-known companies providing search solutions, are the next generation of information access solutions. Watch for more information about automated collection and analysis of Internet-accessible information and about the firms redefining information access.

Stephen E Arnold, November 4, 2014
