Quick and Dirty Sentiment Analysis
September 14, 2010
I thought “Most Common Words Unique to 1 Star and 5 Star App Store Reviews” provided some insight into how certain sentiment analysis systems work. The article said:
I wrote a script to crawl U.S. App Store customer reviews for the top 100 apps from every category (minus duplicates) and compute the most common words in 1-star and 5-star reviews, excluding words that were also common in 3-star reviews.
Frequency count against a “field”. Here are the results for positive apps:
awesome, worth, thanks, amazing, simple, perfect, price, everything, ever, must, iPod, before, found, store, never, recommend, done, take, always, touch
How do you know a loser?
waste, money, crashes, tried, useless, nothing, paid, open, deleted, downloaded, didn’t, says, stupid, anything, actually, account, bought, apple, already
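The quoted article does not include its script, so what follows is only a minimal sketch of the idea: count words in one group of reviews and drop any word that is also common in a baseline group, such as 3-star reviews. The function names, the cutoff parameters and the sample reviews are all made up for illustration; crawling the App Store is left out.

```python
from collections import Counter
import re


def tokenize(text):
    # Lowercase the text and keep runs of letters (straight or curly apostrophes allowed).
    return re.findall(r"[a-z'’]+", text.lower())


def top_words(reviews, n):
    # Return the n most frequent words across a list of review strings.
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return [word for word, _ in counts.most_common(n)]


def distinctive_words(target_reviews, baseline_reviews, n=20, baseline_n=50):
    # Words common in the baseline (the "field") are excluded from the count.
    baseline = set(top_words(baseline_reviews, baseline_n))
    counts = Counter()
    for review in target_reviews:
        counts.update(w for w in tokenize(review) if w not in baseline)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical sample data, not real App Store reviews.
five_star = ["Awesome app, worth the price", "Awesome and simple, this app is perfect"]
three_star = ["This app is okay", "The app is fine and the price is fair"]

print(distinctive_words(five_star, three_star, n=3))
```

With this toy data, words such as “app” and “price” drop out because they also appear in the 3-star baseline, and “awesome” rises to the top. A real run would need far more reviews and a considered choice of the baseline cutoff.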
“Sentiment” can be discerned by looking for certain words and keeping count. So much for rocket science of “understanding unstructured text.”
Stephen E Arnold, September 14, 2010
Freebie
Comments
5 Responses to “Quick and Dirty Sentiment Analysis”
I wonder how it works with sarcasm and cultural slang, where “sick” may mean “good”.
And the word “useless” is interesting: as Marco says, when users are scathing in their desire for a new feature, they may be overall positive about the product.
It would be interesting to compare using these key words vs. more sophisticated sentiment analysis. I suspect the simple solutions may be fairly accurate 80% of the time, but that last 20% would be much harder.
Stephen,
In my opinion, sentiment analysis is never really “rocket science”, yet it is a bit more sophisticated than simply counting words.
Good sentiment analysis systems will analyze sentences much more deeply in order to “understand” them the right way. For instance, if the sentence contains “the service was not outstanding”, obviously, you cannot rely only on the word “outstanding” as a positive cue to detect the sentiment properly.
This is a very simple case, but if the sentence is: “I used to say that their service is outstanding, but that was until a couple months ago”, the problem is more challenging.
I can assure you that sophisticated systems are trying to analyze accurately even the most complex sentences like the last one. Only the customer reviews with an obvious and distinct positive or negative tone could be guessed properly with word counts.
To answer Avi, many people don’t get sarcasm, so I would not expect an automatic algorithm to be very accurate with irony and sarcasm. Training the system to capture the language subtleties of cultural groups or of the problem domain can do a lot to boost accuracy; it is almost a must.
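To make the two example sentences above concrete, here is a rough sketch that compares plain word counting with a crude negation rule. It is not how any particular commercial system works. The tiny lexicon, the list of negators and the two-word look-back window are assumptions chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon; a real system would use a much larger, domain-tuned one.
POSITIVE = {"outstanding", "awesome", "amazing", "perfect"}
NEGATIVE = {"useless", "stupid", "waste", "crashes"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    # Lowercase the text and keep runs of letters (apostrophes allowed).
    return re.findall(r"[a-z'’]+", text.lower())


def count_score(text):
    # Plain counting: +1 per positive word, -1 per negative word.
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def negation_aware_score(text, window=2):
    # Same counting, but flip a cue word's polarity when a negator
    # appears within `window` tokens before it.
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return score


simple = "The service was not outstanding"
harder = ("I used to say that their service is outstanding, "
          "but that was until a couple months ago")

print(count_score(simple), negation_aware_score(simple))
print(count_score(harder), negation_aware_score(harder))
```

On the first sentence, plain counting scores +1 while the negation rule scores -1. On the second, both score +1, because nothing in a fixed look-back window captures “used to ... until a couple months ago”. That gap is the part that calls for deeper analysis.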
The folks at Google are using exactly this kind of counting. Only they’re doing it on a much larger scale and for many more domains using slightly more sophisticated models than counting. But nothing so intensive as parsing whole sentences.
Check out Ryan McDonald’s paper on inferring a sentiment lexicon of phrases (not just words):
The Viability of Web-derived Polarity Lexicons
L. Velikovich, S. Blair-Goldensohn, K. Hannan and R. McDonald
North American Chapter of the Association for Computational Linguistics (NAACL), 2010.
http://www.ryanmcd.com/papers/web_polarity_lexiconsNAACL2010.pdf
And how they’re tackling problems like negation:
Learning to Classify the Scope of Negation for Improved Sentiment Analysis
I. Councill, R. McDonald and L. Velikovich
Negation and Speculation in Natural Language Processing (NeSp-NLP), 2010
http://www.ryanmcd.com/papers/NeSp-NLP10.pdf
They’re doing the same kind of things for other applications, like filling in categorical lists (like lists of doctors, lawyers and baseball players).
As far as I know, no one is trying to apply complex parsing to extract sentiment on a web scale. The problem is that the value added by these bleeding edge techniques is negligible at current levels of accuracy compared to the computational costs.
Bob Carpenter,
Thanks for taking the time to comment. Interested in doing a Search Wizards Speak interview? It’s a freebie I offer some folks.
Stephen E Arnold, September 15, 2010