Sentiment Analysis: The Progeny of Big Data?

June 9, 2015

I read “Text Analytics: The Next Generation of Big Data.” The article provides a straightforward explanation of Big Data, embraces unstructured information like blog posts in various languages, email, and similar types of content, and then leaps to the notion of text analytics. The conclusion to the article is that we are experiencing “The Coming of Age of Text Analytics—The Next Generation of Big Data.”

The idea is good news for the vendors of text analytics aimed squarely at commercial enterprises, advertisers, and marketers. I am not sure the future will match up to the needs of the folks at the law enforcement and intelligence conference I had just left.

There are three reasons:

First, text analytics are not new, and the various systems and methods have been in use for decades. Notable examples are BAE Systems' use of its home-brew tools and Autonomy's technology in the 1990s, and i2 (pre-IBM) and its efforts even earlier.

Second, the challenges of figuring out what structured and unstructured data mean require more than determining if a statement is positive or negative. Text analytics is, based on my experience, blind to such useful data as real time geospatial inputs and video streamed from mobile devices and surveillance devices. Text analytics, like keyword search, makes a contribution, but it plays a supporting role. It is not the Beyoncé of content processing.

Third, the future points to the use of technologies like predictive analytics. Text analytics are components in these more robust systems, which are designed to produce probability-based outputs from a range of input sources.
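To make the "component in a larger system" idea concrete, here is a minimal sketch in which a text score is just one of several inputs to a probability-based output. Everything in it is an assumption for illustration: the signal names, the weights, the bias, and the choice of a simple logistic combination are invented and do not describe any vendor's system.

```python
import math

def fused_probability(signals, weights, bias=0.0):
    """Combine named signals (each scaled to 0..1) into one probability.

    A plain logistic combination is assumed here; real predictive
    systems may use very different models.
    """
    z = bias + sum(weights[name] * value for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights: text analytics is one input among several.
weights = {"text_score": 1.0, "geo_proximity": 2.0, "video_match": 2.0}

# The same strong text signal, first alone and then with other sources.
text_only = fused_probability(
    {"text_score": 0.9, "geo_proximity": 0.0, "video_match": 0.0},
    weights,
    bias=-3.0,
)
all_sources = fused_probability(
    {"text_score": 0.9, "geo_proximity": 0.8, "video_match": 0.7},
    weights,
    bias=-3.0,
)

print(f"text only:   {text_only:.2f}")
print(f"all sources: {all_sources:.2f}")
```

With these made-up numbers, a strong text signal on its own yields a low probability, and the estimate rises only when the other sources agree. That is the supporting role described above.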

A year or so ago, I spoke with a team involved with text analytics at a major telecommunications company. There was considerable consternation. The grousing was that the outputs of the system did not make sense and that those reviewing the outputs found it difficult to figure out what the data meant.

At the LE/intel conference, the focus was on systems which provide actionable information in real time. My point is that vendors have a tendency to see the solutions in terms of what is often a limited or supporting technology.

Sentiment analysis is a good example. Blog posts urging readers to join ISIS are positive to some readers and negative to others. The point of view of the reader determines whether a message is positive or negative.
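A toy sketch shows how reader dependence plays out in a simple lexicon-based scorer. The two lexicons and the sample message are invented for illustration, and a word-counting scorer like this is only one of many approaches to sentiment analysis.

```python
import re

def score(text, lexicon):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def label(total):
    """Map a numeric score to a coarse sentiment label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

# Two hypothetical readers assign opposite polarity to the same words.
supporter_lexicon = {"join": 1, "victory": 1}
opponent_lexicon = {"join": -1, "victory": -1}

message = "Join us and share in the victory"

supporter_view = label(score(message, supporter_lexicon))
opponent_view = label(score(message, opponent_lexicon))

print("supporter reads it as:", supporter_view)
print("opponent reads it as: ", opponent_view)
```

The text is identical in both calls. Only the assumed reader changes, and the label flips with it.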

The only way to move beyond this type of superficial and often misleading analysis is to deal with context, audio, video, intercept data, geolocation data, and other types of content. Text analytics is one component in a larger system, not the solution to the types of problems explored at the LE/intel conference in early June 2015. Marketing often clouds reality. In some businesses, no one knows that the outputs are not helpful. In other endeavors, the outputs have far higher import. Applying sentiment scoring to a recruiting video with a moving nasheed underscoring the good guys dispatching the bad guys is off kilter. Is it important to know that the video is happy or sad? In fact, it is silly to approach the content in this manner.

Stephen E Arnold, June 9, 2015
