New IBM Redbook: IBM Watson Enterprise Search and Analytics
October 12, 2014
The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics. Discovering Actionable Insight from Your Content.”
The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was know as IBM Content Analytics with Enterprise Search or (ICAwES).
The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.
ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi automatic classification system, Cognos, and interface tools.
Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.
The new information in the July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have added to the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”
I urge you to work through this volume because it provides a useful yardstick against which to measure the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?
The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content will output data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples range from video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.
The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies’ presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON centric system for content.
ICAwES Redbook, Copyright IBM 2014.
The illustration shows a record viewed by tags; for example categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index terms. The illustration shows the ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.
IBM provides information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. Like Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
The provided user interface looks like this:
ICAwES Redbook, Copyright IBM 2014.
With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google? I want to type in a three words and scan the results.”
Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.
The system delivers key word search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.
After working through the Redbook, I had one question, “Where’s Watson?”
Stephen E Arnold, October 12, 2014