Text Analytics Summit Summary Sparks UIMA Thoughts

June 22, 2008

Seth Grimes posted a useful series of links about the Text Analytics Summit, held in Boston the week of June 16, 2008. You can read his take on the conference here. I was not at the conference. I was on the other side of the country at the Gilbane shindig. To make up for my nonattendance, I have been reading about the summit.

From what I can deduce from the Web log posts, the conference attracted the Babe Ruths and Ty Cobbs of text analysis, a market that nestles between enterprise search and business intelligence. I am not too certain about the boundaries of either of these markets, but text analytics is polymorphic and can appear searchy or business intelligency depending upon the context.

I clicked through the links Mr. Grimes provides, and I recommend that you spend a few minutes with each of the presentations. I learned a great deal. Please, review his short essay.

One point stuck in my mind. The purpose of this essay is to call your attention to this comment and offer several observations about its implications for those who want to move beyond key word retrieval. Keep in mind that I am offering my opinion.

Here’s the comment. Mr. Grimes writes:

I’ll conclude with one disappointing surprise on the technical front, that UIMA — the Unstructured Information Management Architecture, an integration framework created by IBM and released several years ago as open source to the Apache — has not been more broadly accepted. IBM software architect Thomas Hampp spoke about his company’s use of the framework in the OmniFind Analytics edition, but Technology Panel participants said that their companies — Attensity (David Bean), Business Objects (Claire Thomas), Clarabridge (Justin Langseth), Jodange (Larry Levy), and SPSS (Olivier Jouve) — simply do not perceive user demand for the interoperability that UIMA can offer.

My understanding of this statement, and of the supporting evidence in the form of high profile industry executives, is that an open standard developed by IBM has little, if any, market traction. In short, if the UIMA standard were gasoline, your automobile would not run or would just sputter along.

Let us assume that this lack of UIMA demand is accurate. Now I know this is a big assumption, and I am confident that an IBM wizard will tell me that I am wrong. Nevertheless, I want to follow this assumption in the next part of the essay.

Possible Causes

[Please, keep in mind that I am offering my opinion in a free Web log. If you have not read the editorial policy for this Web log, click on the About link on any page of Beyond Search. Some readers forget that I am using this Web log as a journal and a container for the information that does not appear in my for fee reports and my paid writings such as my monthly column in KMWorld. Some folks are reading my musings and ignoring or forgetting what I am trying to capture for myself in these posts. Check out the disclaimer here.]

What might be causing the lack of interest in UIMA, which as you know is an open source framework to allow different software gizmos to talk to one another? For a more precise definition of UIMA, you can give the IBM search engine a whirl or click this Wikipedia link, http://en.wikipedia.org/wiki/UIMA.

Here is my short list of the causes for the UIMA excitement void. I am not annoyed with IBM. I own IBM servers, but I want to pick up Mr. Grimes’s statement and perform a thought experiment. If this type of writing troubles you, please, click away from Beyond Search. Also, I am reacting to a comment about IBM, but I want to use IBM as an example of any large company’s standards or open source initiative.

First, IBM is IBM. IBM has an obligation to its shareholders to deliver growth. Therefore, IBM’s promulgating a standard is in some way large or small a way to sell IBM products and services. Maybe potential UIMA users are not interested in the potential upsell that may follow.

Second, open source and standards have proven to be incredibly useful. Maybe IBM needs to put more effort into educating partners, vendors, and customers about UIMA? Maybe IBM has invested in UIMA and found that marketing did not produce the expected results, so IBM has moved on.

Third, maybe today IBM lacks clout in the search and content processing sector. In 1960, IBM could dictate what was hot and what was not. UIMA’s underwhelming penetration might be evidence that the IBM of today lacks the moxie the company enjoyed almost a half century ago.

A fourth possibility is that no one really wants to embrace UIMA. Enterprise software is not a level playing field. The vendor wants to own the customer, locking out any other vendor who might suck dollars from the company owning a customer. IBM and other enterprise vendors want to build walls, not create open doors.

I have several other thoughts on my list, but these four provide insight into my preliminary thinking.

Observations

Now let’s consider the implications of these four points, assuming, of course, that I am correct.

Big companies and standards do not blend as well as peanut butter and jelly. The two ingredients may not yet be fully in harmony. Big companies want money, and open standards do not have the revenue-to-risk ratio that makes financial officers comfortable.
Open source is hard to control. Vendors and buyers want control. Vendors want to control the technology. Buyers want to control risk. Open source may reduce the vendor’s control over a system, and buyers lose control over the risk a particular open source system introduces into an enterprise.
Open source appeals to those willing to break with traditional information technology behavior. IBM, despite its sporty standards garb, is a traditional vendor selling traditional solutions. Open source is making headway, but it is most successful when youthful blood flows through the enterprise. Maybe UIMA needs more time for the old cows to leave the stock pen?

What is your view? Is your organization ready to embrace UIMA, big company standards, and open source? Agree? Disagree? Let me know.

Stephen Arnold, June 22, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Search, Semantic, Text processing

Comments

One Response to “Text Analytics Summit Summary Sparks UIMA Thoughts”

David Ferrucci on July 11th, 2008 9:43 am

UIMA is about interoperability of text (and multi-modal, e.g., speech and video) analytics. It’s not clear to me at least, that the majority or even the typical text analysis vendor is focused on getting their analytics to interoperate with other vendors’ analytics. Where UIMA adds value is in projects or solutions that require the scalable and robust integration and deployment of analytics coming from different producers.

The underlying assumption of UIMA’s value proposition is that to accelerate the application of unstructured information in the market place, solutions will require this sort of integration because the high-value, highly-specialized analytics needed to solve different parts of a broader problem, will likely be independently produced by folks who do not specialize in frameworks, standards or infrastructure.

Integrated applications sold by text analysis vendors are often differentiated by their own analytics built on their own internal interfaces or platforms. I think an open question with regard to the broader adoption of UIMA is whether or not the industry will mature to a point where analytics become highly sophisticated and increasingly specialized such that the analytic component alone brings enough value that analytic producers do not want/need to spend their time on infrastructure, standards, deployment, frameworks, interfaces and tooling.

We have already seen this phenomenon in the academic, research and government arenas. If analytic producers provide enough independent value and something like UIMA is available, they will just simply want to “plug-in” and not invest in the whole application. If this spreads to the broader market, then integration and customization to build aggregate solutions will become an important revenue generator that competes aggressively with niche applications. Besides being a potential platform for any text analysis application builder, UIMA has been and is well positioned to provide a very significant value proposition to the integrator of text and multi-modal analysis solutions.

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.