Suicide Sentiment Analysis

November 21, 2014

Short honk: The notion of figuring out something about the emotional payload of a message is interesting. If you are following developments in sentiment analysis, you may find “Emotion Detection in Suicide Notes Using Maximum Entropy Classification” interesting. Now what might be done to pipe the output of this analysis into a predictive analytics engine with access to deep user data?

Stephen E Arnold, November 21, 2014

Attensity Ups Its Presence in Hackathons

October 28, 2014

I found the Attensity blog post “Attensity Takes Utah Tech Week” quite interesting. I cannot recall when mainstream content processing companies embraced hackathons so fiercely.

The blog post explains:

A hackathon, for the uninitiated, is exactly what it sounds like: a hybrid of computer hacking and a marathon in a grueling, caffeine-fueled, 12-hour time period. Groups comprised of mostly engineers and IT whizzes compete against the clock and other teams to create a project to present at the end of the day to a panel of judges.

What did Attensity’s engineers build to showcase the company’s sentiment analysis and analytics technologies? Here’s the Attensity description:

With the Twitter API up and running, Team Attensity used Raspberry Pi to process tweets using #obama and #utahtechweek. Simultaneously, the team used Arduino to code sentiments from the tweets using a red light for negative sentiments, blue for positive sentiments, and yellow for neutral sentiments.

Attensity was pleased with the outcome in Utah. More hackathons are in the firm’s future. I wonder if one can deploy IBM Watson using a Raspberry Pi or showcase HP Autonomy with an Arduino.

How will hackathons generate revenue? I am not sure. The effort seems like a cost hole to me.

Stephen E Arnold, October 28, 2014

On the Value of Customized Sentiment Analysis

August 26, 2014

One of the most-discussed business functions of natural language processing is sentiment analysis. Over at the SmartData Collective, Lexalytics’ Scott Van Boeyen tells us “Why Sentiment Analysis Engines Need Customization.” The short answer: slang. The write-up explains:

The problem with sentiment analysis is sometimes it’s wrong.[…]

“Oh man, that was nasty!” Is this sentence positive or negative? Surely, it must be negative. “Nasty” is a negative word, and everything else in this sentence is neutral. Final answer, negative! Drum roll…. Wrong! It’s positive.

The person who said this used the American slang definition of nasty, which has positive sentiment. There is absolutely no way to know by reading the sentence. So, if you (a human) were just tricked by reading this article, how is a machine supposed to figure it out? Answer: Tell the engine what’s positive and what’s negative.

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word.
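A minimal sketch of the customization step described in the quote, assuming a toy word-score lexicon. The lexicon, its scores, and the `score` function are hypothetical illustrations, not the API of Lexalytics or any other engine:

```python
import re

# Hypothetical default lexicon; a real engine ships a far larger one.
DEFAULT_LEXICON = {"nasty": -1.0, "awful": -1.0, "great": 1.0, "love": 1.0}

def score(text, overrides=None):
    """Sum per-word scores, letting caller-supplied overrides win."""
    lexicon = {**DEFAULT_LEXICON, **(overrides or {})}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)

sentence = "Oh man, that was nasty!"
default_score = score(sentence)                         # "nasty" counts as negative
slang_score = score(sentence, overrides={"nasty": 1.0})  # slang reading
print(default_score, slang_score)
```

The override flips the sign of the result for the same sentence, which is the whole point of the quoted advice. It also shows the limit: the engine still cannot tell which reading a given writer intended.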

The man has a point. Still, we are left with a few questions: How much more should one expect to pay for a customization feature? Also, how long does it take to teach an NLP platform comprehensive alternate vocabulary? How does one decide what slang to include—has anyone developed a list of suggestions? Perhaps one could start by consulting the Urban Dictionary.

Cynthia Murrell, August 26, 2014

Sponsored by, developer of Augmentext

Attensity Leverages Biz360 Invention

August 4, 2014

In 2010, Attensity purchased Biz360. Beyond Search commented on this deal. One of the goslings reminded me that I had not instructed a writer to tackle Attensity’s July 2014 announcement “Attensity Adds to Patent Portfolio for Unstructured Data Analysis Technology.” PR-type “stories” can disappear, but for now a description of the announcement is still available online.

My researcher showed me a hard copy of 8,645,395, and I scanned the abstract and claims. The abstract, like many search and content processing inventions, seemed somewhat similar to other text parsing systems and methods. The invention was filed in April 2008, two years before Attensity purchased Biz360, a social media monitoring company. Attensity, as you may know, is a text analysis company founded by Dr. David Bean. Dr. Bean employed various “deep” analytic processes to figure out the meaning of words, phrases, and documents. My limited understanding of Attensity’s methods suggested to me that Attensity’s Bean-centric technology could process text to achieve a similar result. I had a phone call from AT&T regarding the utility of certain Attensity outputs. I assume that the Bean methods required some reinforcement to keep pace with customers’ expectations about Attensity’s Bean-centric system. Neither the goslings nor I are patent attorneys. So after you download 395, seek out a patent attorney and get him/her to explain its mysteries to you.

The abstract states:

A system for evaluating a review having unstructured text comprises a segment splitter for separating at least a portion of the unstructured text into one or more segments, each segment comprising one or more words; a segment parser coupled to the segment splitter for assigning one or more lexical categories to one or more of the one or more words of each segment; an information extractor coupled to the segment parser for identifying a feature word and an opinion word contained in the one or more segments; and a sentiment rating engine coupled to the information extractor for calculating an opinion score based upon an opinion grouping, the opinion grouping including at least the feature word and the opinion word identified by the information extractor.
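The four components named in the abstract can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the lexicons, the scoring rule, and every function name below are hypothetical, and the patent's actual methods and databases are not reproduced here.

```python
import re

# Hypothetical toy lexicons; the patent describes databases, not these values.
LEXICAL_CATEGORIES = {"battery": "noun", "screen": "noun",
                      "great": "adjective", "dim": "adjective"}
OPINION_SCORES = {"great": 1, "dim": -1}

def split_segments(text):
    """Segment splitter: break unstructured text into lists of words."""
    pieces = re.split(r"[.!?;]+", text.lower())
    return [re.findall(r"[a-z]+", piece) for piece in pieces if piece.strip()]

def parse_segment(words):
    """Segment parser: attach a lexical category to each known word."""
    return [(word, LEXICAL_CATEGORIES.get(word)) for word in words]

def extract_groupings(tagged):
    """Information extractor: pair feature words (nouns) with opinion words."""
    features = [word for word, cat in tagged if cat == "noun"]
    opinions = [word for word, cat in tagged if cat == "adjective"]
    return [(feature, opinion) for feature in features for opinion in opinions]

def rate(text):
    """Sentiment rating engine: score each (feature, opinion) grouping."""
    ratings = {}
    for words in split_segments(text):
        for feature, opinion in extract_groupings(parse_segment(words)):
            ratings[feature] = ratings.get(feature, 0) + OPINION_SCORES.get(opinion, 0)
    return ratings

print(rate("The battery is great. The screen is dim."))
```

Even in this toy form, the dependence on maintained lexicons is obvious: a word missing from either dictionary contributes nothing to the opinion score.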

This invention tackles the Mean Joe Greene of content processing from the point of view of a quite specific type of content: a review. Amazon has quite a few reviews, but the notion of a “shaped” review is a thorny one. The invention’s approach identifies words with different roles; some words are “opinion words” and others are “feature words.” By hooking a “sentiment engine” to this indexing operation, the Biz360 invention can generate an “opinion score.” The system uses item, language, training model, feature, opinion, and rating modifier databases. These, I assume, are either maintained by subject matter experts (expensive), smart software working automatically (often evidencing “drift” so results may not be on point), or a hybrid approach (humans cost money).


The Attensity/Biz360 system relies on a number of knowledge bases. How are these updated? What is the latency between identifying new content and updating the knowledge bases to make the new content available to the user or a software process generating an alert or another type of report?

The 20 claims embrace the components working as a well oiled content analyzer. The claim I noted is that the system’s opinion score uses a positive and negative range. I worked on a sentiment system that made use of a stop light metaphor: red for negative sentiment and green for positive sentiment. When our system could not figure out whether the text was positive or negative we used a yellow light.


The approach, used for a US government project a decade ago, relied on a very simple metaphor to communicate a situation without scores, values, and scales.
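The stop light metaphor can be sketched as a mapping from a signed score to three colors. The score range of -1 to 1 and the width of the "cannot tell" band are assumptions made for illustration; they are not details of the government system mentioned above.

```python
def stoplight(score, uncertain_band=0.2):
    """Map a score in [-1, 1] to a light; the band width is an assumption."""
    if score > uncertain_band:
        return "green"   # clearly positive
    if score < -uncertain_band:
        return "red"     # clearly negative
    return "yellow"      # too close to call

for value in (0.8, -0.5, 0.1):
    print(value, stoplight(value))
```

Widening `uncertain_band` trades fewer wrong red or green calls for more yellow lights, which is the tuning decision a score-free display hides from the user.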

Attensity said, according to the news story cited above:

By splitting the unstructured text into one or more segments, lexical categories can be created and a sentiment-rating engine coupled to the information can now evaluate the opinions for products, services and entities.

Okay, but I think that the splitting of text into segments was a function of iPhrase and search vendors converting unstructured text into XML and then indexing the outputs.

Jonathan Schwartz, General Counsel at Attensity, is quoted in the news story as asserting:

“The issuance of this patent further validates the years of research and affirms our innovative leadership. We expect additional patent issuances, which will further strengthen our broad IP portfolio.”

Okay, this sounds good, but the invention took place prior to Attensity’s owning Biz360. Attensity, therefore, purchased the invention of folks who did not work at Attensity in the period prior to the filing in 2008. I understand that companies buy other companies to get technology and people. I find it interesting that Attensity’s work “validates” Attensity’s research and “affirms” Attensity’s “innovative leadership.”

I would word what the patent delivers and Attensity’s contributions differently. I am no legal eagle or sentiment expert. I do like less marketing razzle dazzle, but I am in the minority on this point.

Net net: Attensity is an interesting company. Will it be able to deliver products that make the licensees’ sentiment score move in a direction that leads to sustaining revenue and generous profits? With the $90 million in funding the company received in 2014, the 14-year-old company will have some work to do to deliver a healthy return to its stakeholders. Expert System, Lexalytics, and others are racing down the same quarter mile drag strip. Which firm will be the winner? Which will blow an engine?

Stephen E Arnold, August 4, 2014

From Search to Sentiment

July 28, 2014

Attivio has placed itself in the news again, this time for scoring a new patent. Virtual-Strategy Magazine declares, “Attivio Awarded Breakthrough Patent for Big Data Sentiment Analysis.” I’m not sure “breakthrough” is completely accurate, but that’s the language of press releases for you. Still, any advance can provide an advantage. The write-up explains that the company:

“… announced it was awarded U.S. Patent No. 8725494 for entity-level sentiment analysis. The patent addresses the market’s need to more accurately analyze, assign and understand customer sentiment within unstructured content where multiple brands and people are referenced and discussed. Most sentiment analysis today is conducted on a broad level to determine, for example, if a review is positive, negative or neutral. The entire entry or document is assigned sentiment uniformly, regardless of whether the feedback contains multiple comments that express a combination of brand and product sentiment.”
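The difference between document-level and entity-level scoring in the quoted passage can be illustrated with a toy example. The brand names, opinion words, and clause-splitting rule are all hypothetical; this is not Attivio's patented method, only a sketch of why one score per document can hide mixed opinions.

```python
import re

# Hypothetical brands and opinion words, for illustration only.
ENTITIES = {"acme", "globex"}
OPINION_SCORES = {"love": 1, "great": 1, "terrible": -1, "slow": -1}

def clause_words(text):
    """Yield the lowercase words of each clause (split on punctuation and 'but')."""
    for clause in re.split(r"[,;.!?]|\bbut\b", text.lower()):
        words = re.findall(r"[a-z]+", clause)
        if words:
            yield words

def document_sentiment(text):
    """Document-level: one score for the whole entry."""
    return sum(OPINION_SCORES.get(word, 0)
               for words in clause_words(text) for word in words)

def entity_sentiment(text):
    """Entity-level: credit each clause's score to the entities it names."""
    scores = {}
    for words in clause_words(text):
        clause_score = sum(OPINION_SCORES.get(word, 0) for word in words)
        for word in words:
            if word in ENTITIES:
                scores[word] = scores.get(word, 0) + clause_score
    return scores

review = "I love my Acme phone, but Globex support is terrible."
print(document_sentiment(review))
print(entity_sentiment(review))
```

The document-level score for this review nets out to neutral, while the entity-level view separates praise for one brand from a complaint about the other.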

I can see how picking up on nuances can lead to a more accurate measurement of market sentiment, though it does seem more like an incremental step than a leap forward. Still, the patent is evidence of Attivio’s continued ascent. Founded in 2007 and headquartered in Massachusetts, Attivio maintains offices around the world. The company’s award-winning Active Intelligence Engine integrates structured and unstructured data, facilitating the translation of that data into useful business insights.

Cynthia Murrell, July 28, 2014


Search, Not Just Sentiment Analysis, Needs Customization

July 11, 2014

One of the most widespread misperceptions in enterprise search and content processing is “install and search.” Anyone who has tried to get a desktop search system like X1 or dtSearch to do what the user wants with his or her files and network shares knows that fiddling is part of the desktop search game. Even a basic system like Sow Soft’s Effective File Search requires configuring the targets to query for every search in multi-drive systems. The workarounds are not for the casual user. Just try making a Google Search Appliance walk, talk, and roll over without the ministrations of an expert like Adhere Solutions. Don’t take my word for it. Get your hands dirty with information processing’s moving parts.

Does it not make sense that a search system destined for serving a Fortune 1000 company requires some additional effort? How much more time and money will an enterprise class information retrieval and content processing system require than a desktop system or a plug-and-play appliance?

How much effort is required for these tasks? There is work to get the access controls working as the ever alert security manager expects. Then there is the work needed to get the system to access, normalize, and process content for the basic index. Then there is work for getting the system to recognize, acquire, index, and allow a user to access the old, new, and changed content. Then one has to figure out what to tell management about rich media, content for which additional connectors are required, the method for locating versions of PowerPoints, Excels, and Word files. Then one has to deal with latencies, flawed indexes, and dependencies among the various subsystems that a search and content processing system includes. There are other tasks as well like interfaces, work flow for alerts, yadda yadda. You get the idea of the almost unending stream of dependent, serial “thens.”

When I read “Why Sentiment Analysis Engines need Customization”, I felt sad for licensees fooled by marketers of search and content processing systems. Yep, sad as in sorrow.

Is it not obvious that enterprise search and content processing is primarily about customization?

Many of the so called experts, advisors, and vendors illustrate these common search blind spots:

ITEM: Consulting firms that sell my information under another person’s name assuring that clients are likely to get a wild and wooly view of reality. Example: Check out IDC’s $3,500 version of information based on my team’s work. Here’s the link for those who find that big outfits help themselves to expertise and then identify a person with a fascinating employment and educational history as the AUTHOR.



In this example, notice that my work is priced at seven times that of a former IDC professional. Presumably Mr. Schubmehl recognized that my value was greater than that of an IDC sole author and priced my work accordingly. Fascinating because I do not have a signed agreement giving IDC, Mr. Schubmehl, or IDC’s parent company the right to sell my work on Amazon.

This screen shot makes it clear that my work is identified as that of a former IDC professional, a fellow from upstate New York, an MLS on my team, and a Ph.D. on my team.



I assume that IDC’s expertise embraces the level of expertise evident in the TechRadar article. Should I trust a company that sells my content without a formal contract? Oh, maybe I should ask this question, “Should you trust a high profile consulting firm that vends another person’s work as its own?” Keep that $3,500 price in mind, please.

ITEM: The TechRadar article is written by a vendor of sentiment analysis software. His employer is Lexalytics / Semantria (once a unit of Infonics). He writes:

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word. The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization. Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

When a vendor describes “natural language processing” with the phrase “high quality” I laugh. NLP is a work in progress. But the stunning statement in this quoted passage is:

Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

Amazing, a vendor wrote this sentence. Unless a licensee of a “high quality” NLP system invests in customizing, the system will “never get accurate results.” I quite like that categorical never.

ITEM: Sentiment analysis is a single, usually complex component of a search or content processing system. A person on the LinkedIn enterprise search group asked the few hundred “experts” in the discussion group for examples of successful enterprise search systems. If you are a member in good standing of LinkedIn, you can view the original query at this link. [If the link won’t work, talk to LinkedIn. I have no idea how to make references to my content on the system work consistently over time.] I pointed out that enterprise search success stories are harder to find than reports of failures. Whether the flop is at the scale of the HP/Autonomy acquisition or a more modest termination like Overstock’s dumping of a big name system, the “customizing” issue is often present. Enterprise search and content processing is usually:

  • A box of puzzle pieces that requires time, expertise, and money to assemble in a way that attracts and satisfies users and the CFO
  • A work in progress to make work so users are happy and in a manner that does not force another search procurement cycle, the firing of the person responsible for the search and content processing system, and the legal fees related to the invoices submitted by the vendor whose system does not work. (Slow or no payment of licensee and consulting fees to a search vendor can be fatal to the search firm’s health.)
  • A source of friction among those contending for infrastructure resources. What I am driving at is that a misconfigured search system makes some computing work S-L-O-W. Note: the performance issue must be addressed for appliance-based, cloud, or on premises enterprise search.
  • Money. Don’t forget money, please. Remember the CFO’s birthday. Take her to lunch. Be really nice. The cost overruns that plague enterprise search and content processing deployments and operations will need all the goodwill you can generate.

If sentiment analysis requires customizing and money, take out your pencil and estimate how much it will cost to make NLP and sentiment work. Now do the same calculation for relevancy tuning, index tuning, optimizing indexing and query processing, etc.

The point is that folks who get a basic key word search and retrieval system working pile on the features and functions. Vendors whip up some wrapper code that makes it possible to do a demo of customer support search, eCommerce search, voice search, and predictive search. Once the licensee inks the deal, the fun begins. The reason one major Norwegian search vendor crashed and burned is that licensees balked at paying bills for a next generation system that was not what the PowerPoint slides described. Why has IBM embraced open source search? Is one reason to trim the cost of keeping the basic plumbing working reasonably well? Why are search vendors embracing every buzzword that comes along? I think that search as an enterprise function has become a very difficult thing to sell, make work, and turn into an evergreen revenue stream.

The TechRadar article underscores the danger for licensees of over hyped systems. The consultants often surf on the expertise of others. The vendors dance around the costs and complexities of their systems. The buzzwords obfuscate.

What makes this article by the Lexalytics professional almost as painful as IDC’s unauthorized sale of my search content is this statement:

You’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

I agree with this statement.

Stephen E Arnold, July 11, 2014

Sentiment Analysis: A Breakthrough

May 12, 2014

Short honk. I have some questions about the efficacy of search vendors who pitch sentiment analysis. The jargon blizzard obscures some of the methods. I talk about some of these hyperboles in my video about search jargon. The article “Turning the Frown Upside Down: Kraft’s Jell-O Plans Twitter Mood Monitor” explains one of the secrets of the sentiment analysis wizards. Big Data? Nah, counting smiley faces. What Dark Arts do other sentiment analysis mavens conjure?
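For the curious, counting smiley faces takes only a few lines. The emoticon patterns and sample tweets below are assumptions made for illustration; the article does not publish the patterns Jell-O's monitor actually uses.

```python
import re

# Hypothetical emoticon patterns; a real monitor would cover many more forms.
HAPPY = re.compile(r"[:;]-?[)D]")
SAD = re.compile(r":-?\(")

def mood(tweets):
    """Return (happy_count, sad_count) across a list of tweet texts."""
    happy = sum(len(HAPPY.findall(tweet)) for tweet in tweets)
    sad = sum(len(SAD.findall(tweet)) for tweet in tweets)
    return happy, sad

tweets = ["Monday again :(", "Free Jell-O :)", "so tired :-(", "weekend! :D"]
print(mood(tweets))
```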

Stephen E Arnold, May 12, 2014



Attensity Analyze 6.3: Signs of Life Evident

March 11, 2014

Attensity has been a quiet sentiment, analytics, text processing vendor for some months. The company has now released a new version of its flagship product, Analyze, now at version 6.3. The headline feature is “enhanced analytics.”

According to a company news release, Attensity is “the leading provider of integrated, real-time solutions that blend multi-channel Voice of the Customer analytics and social engagement for enterprise listening needs.” Okay.

The new version of Analyze delivers to licensees real time information about what is trending. The system provides “multi dimensional visualization that immediately identifies performance outliers in the business that can impact the brand both positively and negatively.” Okay.

The system processes over 150 million blogs and forums, Facebook, and Twitter. Okay.

As memorable as these features are, here’s the passage that I noted:

Attensity 6.3 is powered by the Attensity Semantic Annotation Server (ASAS) and patented natural language processing (NLP) technology. Attensity’s unique ASAS platform provides unmatched deep sentiment analysis, entity identification, statistical assignment and exhaustive extraction, enabling organizations to define relationships between people, places and things without using pre-defined keywords or queries. It’s this proprietary technology that allows Attensity to make the unknown known.

“To make the unknown known” is a bold assertion. Okay.

I have heard that sentiment analysis companies are running into some friction. The expectations of some licensees have been a bit high. Perhaps Analyze 6.3 will suck up customers of other systems who are dissatisfied with their sentiment, semantic, analytics systems. Making the “unknown known” should cause the world to beat a path to Attensity’s door. Okay.

Stephen E Arnold, March 11, 2014

Incorporating Twitter Sentiment Analysis into Trading Software

February 21, 2014

Thomson Reuters has added Twitter sentiment analysis to its Eikon subscription trading platform. Sorting tweets into positive and negative messages based on proprietary language-processing technology, the feature meets the demands of a growing number of traders.


According to Matthew Finnegan’s story “Thomson Reuters Adds Twitter Sentiment Analysis to Eikon Trading Terminal” for Computerworld UK, the analytics tool will show users the volume of both positive and negative messaging relating to specific companies on an hourly basis. Thomson Reuters’ Chief Technology Officer Philip Brittan stressed that the information will be used primarily for research, not a basis for trading decisions.
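The hourly positive and negative volumes described in the story amount to a grouped count. The sketch below assumes messages have already been classified and arrive as (company, timestamp, label) tuples; the tuple layout, the company name, and the function name are hypothetical, and the proprietary Eikon language processing is not modeled.

```python
from collections import Counter
from datetime import datetime

def hourly_volumes(messages):
    """Count pre-classified messages per (company, hour, label)."""
    counts = Counter()
    for company, timestamp, label in messages:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(company, hour, label)] += 1
    return counts

sample = [
    ("ACME", datetime(2014, 2, 21, 9, 5), "positive"),
    ("ACME", datetime(2014, 2, 21, 9, 40), "negative"),
    ("ACME", datetime(2014, 2, 21, 9, 55), "positive"),
    ("ACME", datetime(2014, 2, 21, 10, 2), "negative"),
]
volumes = hourly_volumes(sample)
for (company, hour, label), count in sorted(volumes.items()):
    print(company, hour.strftime("%H:00"), label, count)
```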


Since there have been instances of fake Tweets influencing markets, the caution is probably justified. But the power of social media’s unstructured data cannot be denied, and Eikon is attempting to harness it for subscribers:


“…the Eikon sentiment analysis aims to also make it easier for humans to quickly make sense of masses of social media information currently available, with tens of thousands of tweets about major companies each day.”


It’s one more way we see social media emerging as the dominant media force of the 21st century.


Laura Abrahamsen, February 21, 2014



Guide to Sentiment Analysis Application

January 17, 2014

The Lexalytics blog article “Tagging, Taxonomies, Categorization with Salience” provides a guide to using Salience to get the most out of data. The first step, Discovery, involves features like Themes, which extracts proper noun phrases to give a summary of what the content contains. Step 2 uses Concept Topics, which use an ontology built from Wikipedia’s semantic knowledge to relate one word to another.

The article explains how this works:

“Salience will use the relationship between the category samples to tag your data. So every time the word “lion” pops up in your data, that entry will be categorized as “cats”. Every time the word “cheetah” appears, salience will know that this animal belongs to the cat family, and will tag the document as “cats”. This method of categorization is awesome because you do not need to list every single member of the cat family to create this category.”

Step 3 is another way of classifying data: creating a query topic. You input all words associated with your topic after consulting Wikipedia and a thesaurus, then limit the search with more information, including how close one word must be to another for it to be relevant.
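The proximity limit in a query topic can be sketched as a word-distance check. This is a generic illustration, not Salience's query syntax; the function name and parameters are hypothetical.

```python
import re

def matches_query_topic(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z]+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == term_a]
    positions_b = [i for i, word in enumerate(words) if word == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

sentence = "The lion slept near the tall grass"
print(matches_query_topic(sentence, "lion", "grass", 5))
print(matches_query_topic(sentence, "lion", "grass", 3))
```

Here "lion" and "grass" sit five words apart, so the first check passes and the tighter one fails. Picking that distance is one more setting someone has to tune.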

Chelsea Kerwin, January 17, 2014

