Sentiment Analysis: Bubbling Up as the Economy Tanks
January 20, 2008
Sentiment analysis is a sub-discipline of text mining. Text mining, as most of you know, refers to processing unstructured information and text blocks in a database to wheedle useful information from sentences, paragraphs, and entire documents. Text mining looks for entities, linguistic clues, and statistically significant high points.
The processing approach varies from vendor to vendor. Some vendors use statistics; others use semantic techniques. More and more vendors mix and match procedures to get the best of each approach. The idea is that software “reads” or “understands” text. None of the more than 100 vendors offering text mining systems and utilities does as well as a human, but the systems are improving. When properly configured, some systems outperform a human indexer. (Most people think humans are the best indexers, but for some applications, software can do a better job.) Humans are needed to resolve “exceptions” when automated systems stumble. But software differs from the human indexer, who often memorizes a number of terms and sometimes uses these without seeking a more appropriate term from the controlled vocabulary. Also, human indexers can get tired, and fatigue affects indexing performance. Software indexing is the only way to deal with the large volumes of information in digital form today.
Sentiment analysis “reads” and “understands” text to determine whether a document is positive or negative. About eight years ago, my team did a sentiment analysis for a major investment fund’s startup. The startup’s engineers were heads down on another technical matter, and the sentiment analysis job came to ArnoldIT.com.
We took some shortcuts because time was limited. After looking at various open source tools and the code snippets in ArnoldIT’s repository, we generated a list of words and phrases that were generally positive and generally negative. We had several collections of text, mostly from customer support projects. We used these and applied some ArnoldIT “magic”. We were able to process unstructured information and assign a positive or negative score to documents based on our ArnoldIT “magic” and the dictionary. We assigned a red icon to results that our system identified as negative. Without much originality, we used a green icon to flag positive comments. The investment bank moved on, and I don’t know what the fate of our early sentiment analysis system was. I do recall that it was useful in pinpointing negative emails about products and services.
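For readers who want to see the general shape of a dictionary approach, here is a minimal sketch in Python. It is not the ArnoldIT system. The word lists, the function names, and the “neutral” label for ties are all assumptions made for illustration, and a production system would need a far larger dictionary plus handling for negation and context.

```python
import re

# Hypothetical dictionaries for illustration only; the lists used in the
# project described above are not published.
POSITIVE_TERMS = ["great", "helpful", "thank you", "works well"]
NEGATIVE_TERMS = ["broken", "terrible", "refund", "does not work"]


def count_hits(text, terms):
    """Count whole-word occurrences of each word or phrase in the text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )


def score(text):
    """Positive hits minus negative hits; above zero leans positive."""
    return count_hits(text, POSITIVE_TERMS) - count_hits(text, NEGATIVE_TERMS)


def flag(text):
    """Map a score to an icon color; 'neutral' for ties is an assumption."""
    s = score(text)
    if s > 0:
        return "green"
    if s < 0:
        return "red"
    return "neutral"


if __name__ == "__main__":
    for message in [
        "Thank you, the new release works well.",
        "The update is broken and I want a refund.",
    ]:
        print(flag(message), "-", message)
```

Counting raw hits is the crudest possible scoring. Weighting terms, or normalizing by document length, would be the obvious next refinements.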
A number of companies offer sentiment analysis as a text mining function. Vendors include Autonomy, Corpora Software, and Fast Search & Transfer, among others. A number of companies offer sentiment analysis as a hosted service with the work more sharply focused on marketing and brands. Buzzmetrics (a unit of AC Nielsen), Summize, and Andiamo Systems compete in the consumer segment. ClearForest, before it was subsumed into Reuters (which was then bought by the Thomson Corporation), had tools that performed a range of sentiment functions.
The news that triggered my thinking about sentiment was the announcement by statistics and business intelligence giant SPSS that it had enhanced the sentiment analysis functions of its Clementine content processing system. According to ITWire, Clementine has added “automated modeling to identify the best analytic models, as well as combining multiple predictions for the most accurate results.” You can read more about SPSS’s Clementine technology here. SPSS acquired LexiQuest, an early player in rich content processing, in 2002. SPSS has integrated its own text mining technology with the LexiQuest technology. SAS followed suit but licensed Inxight Software technology and combined that with SAS’s home-grown content processing tools.
There’s growing interest in analyzing call center, customer support, and Web log content for sentiment about people, places, and things. I will be watching for more announcements from other vendors. In the behind-the-firewall search and content processing sectors, there’s a strong tendency to do “me too” announcements. The challenge is to figure out which system does what. Figuring out the differences (often very modest) between and among different vendors’ solutions is a tough job.
Will 2008 be the year for sentiment analysis? We’ll know in a few months if SPSS competitors jump on this bandwagon.
Stephen E. Arnold, January 20, 2008.
Comments
9 Responses to “Sentiment Analysis: Bubbling Up as the Economy Tanks”
SPSS can already provide out-of-the-box sentiment analysis capabilities in 5 languages!
Great article, Stephen, and thanks for the mention of Andiamo Systems. I’m the Founder & CEO. Semantic analysis is the most challenging part of our business, and I believe that for the foreseeable future, even with great strides in technology automation, a layer of human analysis is required to confirm the automated ratings, especially where those ratings fall outside of an accepted confidence level.
That’s the approach we are taking. Nathan Gilliatt would call this ‘human-assisted software analysis’.
[…] contributions on this subject can be found from bloggers Stephen E. Arnold, Nathan Gilliatt, and Seth Grimes. […]
Well, there are also a few open source products, maybe of somewhat lower quality, but on the other hand with all that open source beauty:
http://alias-i.com/lingpipe/ and http://www.jane16.com are both good enough to be used in production systems. Anyway, good point to bring this subject to light a little more.
Open Source Software for Automated Sentiment Analysis: RapidMiner with its text mining and web mining plugin:
http://www.rapid-i.com/
generic sentiment analysis technology for all languages and all domains (product reviews, discussion forums, web blogs, political sentiments, financial sentiments, economic sentiments, etc.).
Frank Xavier,
Good catch.
Stephen Arnold, September 4, 2008
Take a look at the Twitter sentiment analysis tool at http://smm.streamcrab.com. It’s written in Python and uses a Naive Bayes classifier with semi-supervised machine learning.