From Search to Sentiment
July 28, 2014
Attivio has placed itself in the news again, this time for scoring a new patent. Virtual-Strategy Magazine declares, “Attivio Awarded Breakthrough Patent for Big Data Sentiment Analysis.” I’m not sure “breakthrough” is completely accurate, but that’s the language of press releases for you. Still, any advance can provide an advantage. The write-up explains that the company:
“… announced it was awarded U.S. Patent No. 8725494 for entity-level sentiment analysis. The patent addresses the market’s need to more accurately analyze, assign and understand customer sentiment within unstructured content where multiple brands and people are referenced and discussed. Most sentiment analysis today is conducted on a broad level to determine, for example, if a review is positive, negative or neutral. The entire entry or document is assigned sentiment uniformly, regardless of whether the feedback contains multiple comments that express a combination of brand and product sentiment.”
I can see how picking up on nuances can lead to a more accurate measurement of market sentiment, though it does seem more like an incremental step than a leap forward. Still, the patent is evidence of Attivio’s continued ascent. Founded in 2007 and headquartered in Massachusetts, Attivio maintains offices around the world. The company’s award-winning Active Intelligence Engine integrates structured and unstructured data, facilitating the translation of that data into useful business insights.
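To make the document-level versus entity-level distinction concrete, here is a minimal sketch. It is not Attivio's patented method: the tiny word list, the brand names (Acme, Globex), and the sentence-by-sentence attribution rule are all assumptions made for illustration.

```python
import re

# Toy lexicon: an assumption for illustration, not Attivio's patented method.
LEXICON = {"love": 1, "great": 1, "fast": 1, "hate": -1, "awful": -1, "slow": -1}


def score(text):
    """Sum the lexicon scores of the words in a piece of text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


def document_sentiment(text):
    """One score for the whole document, whatever brands it mentions."""
    return score(text)


def entity_sentiment(text, entities):
    """Score each entity using only the sentences that mention it."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    result = {}
    for entity in entities:
        mentions = [s for s in sentences if entity.lower() in s.lower()]
        result[entity] = sum(score(s) for s in mentions)
    return result


# Hypothetical review that praises one brand and pans another.
review = "I love my Acme phone. The Globex charger is awful and slow."

print(document_sentiment(review))
print(entity_sentiment(review, ["Acme", "Globex"]))
```

The single document score blends the praise and the complaint into one number, while the per-entity view keeps them apart. A real system would need proper entity recognition and would have to cope with sentences that mention several brands at once.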
Cynthia Murrell, July 28, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Search, Not Just Sentiment Analysis, Needs Customization
July 11, 2014
One of the most widespread misperceptions in enterprise search and content processing is “install and search.” Anyone who has tried to get a desktop search system like X1 or dtSearch to do what the user wants with his or her files and network shares knows that fiddling is part of the desktop search game. Even a basic system like Sow Soft’s Effective File Search requires configuring the targets to query for every search in multi-drive systems. The workarounds are not for the casual user. Just try making a Google Search Appliance walk, talk, and roll over without the ministrations of an expert like Adhere Solutions. Don’t take my word for it. Get your hands dirty with information processing’s moving parts.
Does it not make sense that a search system destined for serving a Fortune 1000 company requires some additional effort? How much more time and money will an enterprise class information retrieval and content processing system require than a desktop system or a plug-and-play appliance?
How much effort is required for these tasks? There is work to get the access controls working as the ever alert security manager expects. Then there is the work needed to get the system to access, normalize, and process content for the basic index. Then there is work for getting the system to recognize, acquire, index, and allow a user to access the old, new, and changed content. Then one has to figure out what to tell management about rich media, content for which additional connectors are required, the method for locating versions of PowerPoints, Excels, and Word files. Then one has to deal with latencies, flawed indexes, and dependencies among the various subsystems that a search and content processing system includes. There are other tasks as well like interfaces, work flow for alerts, yadda yadda. You get the idea of the almost unending stream of dependent, serial “thens.”
When I read “Why Sentiment Analysis Engines need Customization”, I felt sad for licensees fooled by marketers of search and content processing systems. Yep, sad as in sorrow.
Is it not obvious that enterprise search and content processing is primarily about customization?
Many of the so called experts, advisors, and vendors illustrate these common search blind spots:
ITEM: Consulting firms that sell my information under another person’s name assuring that clients are likely to get a wild and wooly view of reality. Example: Check out IDC’s $3,500 version of information based on my team’s work. Here’s the link for those who find that big outfits help themselves to expertise and then identify a person with a fascinating employment and educational history as the AUTHOR.
See http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio
In this example from http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio, notice that my work is priced at seven times that of a former IDC professional. Presumably Mr. Schubmehl recognized that my value was greater than that of an IDC sole author and priced my work accordingly. Fascinating because I do not have a signed agreement giving IDC, Mr. Schubmehl, or IDC’s parent company the right to sell my work on Amazon.
This screen shot makes it clear that my work is identified as that of a former IDC professional, a fellow from upstate New York, an MLS on my team, and a Ph.D. on my team.
See http://amzn.to/1ner8mG.
I assume that IDC’s expertise embraces the level of expertise evident in the TechRadar article. Should I trust a company that sells my content without a formal contract? Oh, maybe I should ask this question, “Should you trust a high profile consulting firm that vends another person’s work as its own?” Keep that $3,500 price in mind, please.
ITEM: The TechRadar article is written by a vendor of sentiment analysis software. His employer is Lexalytics / Semantria (once a unit of Infonics). He writes:
High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word. The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization. Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.
When a vendor describes “natural language processing” with the phrase “high quality” I laugh. NLP is a work in progress. But the stunning statement in this quoted passage is:
Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.
Amazing, a vendor wrote this sentence. Unless a licensee of a “high quality” NLP system invests in customizing, the system will “never get accurate results.” I quite like that categorical never.
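What the vendor means by “customization” can be sketched in a few lines. This is a toy, not the Lexalytics or Semantria API: the default word scores, the `score` function, and the `overrides` parameter are hypothetical names invented for the example.

```python
import re

# Hypothetical default scores; a commercial engine ships a far larger lexicon.
DEFAULT_LEXICON = {"nasty": -1.0, "great": 1.0, "terrible": -1.0}


def score(text, overrides=None):
    """Score text with the default lexicon, letting the caller override entries."""
    lexicon = {**DEFAULT_LEXICON, **(overrides or {})}
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


tweet = "That guitar solo was nasty"

default_score = score(tweet)                          # "nasty" counts as negative
custom_score = score(tweet, overrides={"nasty": 1.0}) # slang: "nasty" is praise

print(default_score, custom_score)
```

The override flips the result for the slang usage. Note that it also flips every literal use of “nasty” in the same corpus, which hints at why the tuning work tends to keep going.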
ITEM: Sentiment analysis is a single, usually complex component of a search or content processing system. A person on the LinkedIn enterprise search group asked the few hundred “experts” in the discussion group for examples of successful enterprise search systems. If you are a member in good standing of LinkedIn, you can view the original query at this link. [If the link won’t work, talk to LinkedIn. I have no idea how to make references to my content on the system work consistently over time.] I pointed out that enterprise search success stories are harder to find than reports of failures. Whether the flop is at the scale of the HP/Autonomy acquisition or a more modest termination like Overstock’s dumping of a big name system, the “customizing” issue is often present. Enterprise search and content processing is usually:
- A box of puzzle pieces that requires time, expertise, and money to assemble in a way that attracts and satisfies users and the CFO
- A work in progress that has to be made to work so users are happy, and in a manner that does not force another search procurement cycle, the firing of the person responsible for the search and content processing system, and the legal fees related to the invoices submitted by the vendor whose system does not work. (Slow or no payment of license and consulting fees to a search vendor can be fatal to the search firm’s health.)
- A source of friction among those contending for infrastructure resources. What I am driving at is that a misconfigured search system makes some computing work S-L-O-W. Note: the performance issue must be addressed for appliance-based, cloud, or on-premises enterprise search.
- Money. Don’t forget money, please. Remember the CFO’s birthday. Take her to lunch. Be really nice. The cost overruns that plague enterprise search and content processing deployments and operations will need all the goodwill you can generate.
If sentiment analysis requires customizing and money, take out your pencil and estimate how much it will cost to make NLP and sentiment work. Now do the same calculation for relevancy tuning, index tuning, optimizing indexing and query processing, etc.
The point is that folks who get a basic key word search and retrieval system working pile on the features and functions. Vendors whip up some wrapper code that makes it possible to do a demo of customer support search, eCommerce search, voice search, and predictive search. Once the licensee inks the deal, the fun begins. The reason one major Norwegian search vendor crashed and burned is that licensees balked at paying bills for a next generation system that was not what the PowerPoint slides described. Why has IBM embraced open source search? Is one reason to trim the cost of keeping the basic plumbing working reasonably well? Why are search vendors embracing every buzzword that comes along? I think that search as an enterprise function has become a very difficult thing to sell, make work, and turn into an evergreen revenue stream.
The TechRadar article underscores the danger for licensees of over hyped systems. The consultants often surf on the expertise of others. The vendors dance around the costs and complexities of their systems. The buzzwords obfuscate.
What makes this article by the Lexalytics professional almost as painful as IDC’s unauthorized sale of my search content is this statement:
You’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.
I agree with this statement.
Stephen E Arnold, July 11, 2014
Sentiment Analysis: A Breakthrough
May 12, 2014
Short honk. I have some questions about the efficacy of search vendors who pitch sentiment analysis. The jargon blizzard obscures some of the methods. I talk about some of these hyperboles in my video about search jargon. The article “Turning the Frown Upside Down: Kraft’s Jell-O Plans Twitter Mood Monitor” explains one of the secrets of the sentiment analysis wizards. Big Data? Nah, counting smiley faces. What Dark Arts do other sentiment analysis mavens conjure?
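For the curious, counting smiley faces takes about a dozen lines. The sketch below is my own guess at the general idea, not Kraft's actual implementation; the emoticon patterns and the sample tweets are assumptions.

```python
import re

# Assumed emoticon patterns: ":)", ":-)", ";)", ":D" count as happy; ":(" and ":-(" as sad.
HAPPY = re.compile(r"[:;]-?[)D]")
SAD = re.compile(r":-?\(")


def mood(tweets):
    """Return (happy count, sad count, happy share) for a batch of tweets."""
    happy = sum(len(HAPPY.findall(t)) for t in tweets)
    sad = sum(len(SAD.findall(t)) for t in tweets)
    total = happy + sad
    share = happy / total if total else None  # None when no emoticons were seen
    return happy, sad, share


# Made-up sample tweets.
sample = [
    "Monday again :(",
    "Free dessert :)",
    "Best day ever :-D :)",
    "No emoticon here",
]

print(mood(sample))
```

No Big Data required. Tweets without emoticons are simply ignored, which says something about how much of the conversation this kind of mood monitor actually measures.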
Stephen E Arnold, May 12, 2014
Attensity Analyze 6.3: Signs of Life Evident
March 11, 2014
Attensity has been a quiet sentiment, analytics, text processing vendor for some months. The company has now released a new version of its flagship product, Analyze, now at version 6.3. The headline feature is “enhanced analytics.”
According to a company news release, Attensity is “the leading provider of integrated, real-time solutions that blend multi-channel Voice of the Customer analytics and social engagement for enterprise listening needs.” Okay.
The new version of Analyze delivers to licensees real time information about what is trending. The system provides “multi dimensional visualization that immediately identifies performance outliers in the business that can impact the brand both positively and negatively.” Okay.
The system processes over 150 million blogs and forums, Facebook, and Twitter. Okay.
As memorable as these features are, here’s the passage that I noted:
Attensity 6.3 is powered by the Attensity Semantic Annotation Server (ASAS) and patented natural language processing (NLP) technology. Attensity’s unique ASAS platform provides unmatched deep sentiment analysis, entity identification, statistical assignment and exhaustive extraction, enabling organizations to define relationships between people, places and things without using pre-defined keywords or queries. It’s this proprietary technology that allows Attensity to make the unknown known.
“To make the unknown known” is a bold assertion. Okay.
I have heard that sentiment analysis companies are running into some friction. The expectations of some licensees have been a bit high. Perhaps Analyze 6.3 will suck up customers of other systems who are dissatisfied with their sentiment, semantic, analytics systems. Making the “unknown known” should cause the world to beat a path to Attensity’s door. Okay.
Stephen E Arnold, March 11, 2014
Incorporating Twitter Sentiment Analysis into Trading Software
February 21, 2014
Thomson Reuters has added Twitter sentiment analysis to its Eikon subscription trading platform. Sorting tweets into positive and negative messages based on proprietary language-processing technology, the feature meets the demands of a growing number of traders.
According to Matthew Finnegan’s story “Thomson Reuters Adds Twitter Sentiment Analysis to Eikon Trading Terminal” for Computerworld UK, the analytics tool will show users the volume of both positive and negative messaging relating to specific companies on an hourly basis. Thomson Reuters’ Chief Technology Officer Philip Brittan stressed that the information will be used primarily for research, not as a basis for trading decisions.
Since there have been instances of fake Tweets influencing markets, the caution is probably justified. But the power of social media’s unstructured data cannot be denied, and Eikon is attempting to harness it for subscribers:
“…the Eikon sentiment analysis aims to also make it easier for humans to quickly make sense of masses of social media information currently available, with tens of thousands of tweets about major companies each day.”
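The hourly tally of positive and negative messages per company is simple to sketch once the tweets have been classified. The example below is not how Eikon works internally, which Thomson Reuters keeps proprietary; the pre-labeled tweets, the ACME ticker, and the `hourly_volume` function are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already classified tweets: (ISO timestamp, company, label).
tweets = [
    ("2014-02-21T09:05:00", "ACME", "positive"),
    ("2014-02-21T09:40:00", "ACME", "negative"),
    ("2014-02-21T09:55:00", "ACME", "positive"),
    ("2014-02-21T10:10:00", "ACME", "negative"),
]


def hourly_volume(rows):
    """Count messages per (company, hour, label)."""
    counts = Counter()
    for stamp, company, label in rows:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        counts[(company, hour.isoformat(), label)] += 1
    return counts


volume = hourly_volume(tweets)
for key in sorted(volume):
    print(key, volume[key])
```

The counting is the easy part. The hard part, deciding whether each tweet is positive or negative and whether it is genuine, happens before any row reaches a table like this.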
It’s one more way we see social media emerging as the dominant media force of the 21st century.
Laura Abrahamsen, February 21, 2014
Guide to Sentiment Analysis Application
January 17, 2014
The article on the Lexalytics blog titled “Tagging, Taxonomies, Categorization with Salience” provides a guide to using Salience to get the most out of data. The first step, Discovery, involves features like Themes, which extracts proper noun phrases to summarize what the content contains. Step 2 uses Concept Topics, which relies on an ontology built from Wikipedia’s semantic knowledge to relate one word to another.
The article explains how this works:
“Salience will use the relationship between the category samples to tag your data. So every time the word “lion” pops up in your data, that entry will be categorized as “cats”. Every time the word “cheetah” appears, salience will know that this animal belongs to the cat family, and will tag the document as “cats”. This method of categorization is awesome because you do not need to list every single member of the cat family to create this category.”
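A rough sketch of the tagging behavior described in the quote follows. It is not the Salience API. Salience is said to infer related terms from a Wikipedia-derived ontology; this example has no ontology, so a hand-written `RELATED` table stands in for it, and that table and the `tag` function are assumptions.

```python
import re

# Hand-built stand-in for the Wikipedia-derived ontology the article describes.
# In Salience the members would be inferred; here they are listed by hand.
RELATED = {
    "cats": {"cat", "lion", "cheetah", "tiger"},
    "dogs": {"dog", "wolf", "poodle"},
}


def tag(text):
    """Return the sorted list of categories whose related terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, members in RELATED.items() if words & members)


print(tag("A cheetah outran the jeep"))
print(tag("The wolf watched the lion"))
print(tag("Quarterly earnings rose"))
```

The sketch shows the output side only: an entry that mentions “cheetah” is tagged “cats” even though the category name never appears in the text. The part that makes the vendor's approach interesting, deriving the related terms automatically, is exactly what this toy leaves out.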
Step 3 is another way of classifying data: creating a query topic. You input all the words associated with your topic after consulting Wikipedia and a thesaurus, then narrow the search with more information, including how close one word must be to another for a match to be relevant.
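The proximity idea behind a query topic can be illustrated with a small word-distance check. This is not Salience's query syntax, which the write-up does not show; the `near` function and its `max_gap` parameter are hypothetical.

```python
import re


def near(text, first, second, max_gap):
    """True if the two words occur within max_gap word positions of each other."""
    words = re.findall(r"[a-z]+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


sentence = "The battery life on this phone is poor"

print(near(sentence, "battery", "poor", 3))  # words are six positions apart
print(near(sentence, "battery", "poor", 6))
```

Tightening or loosening the distance changes which documents match, so the setting becomes one more dial someone has to tune.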
Chelsea Kerwin, January 17, 2014
Clarabridge: An IPO in 2014?
January 3, 2014
Clarabridge is a company involved in CEM or customer experience management. I am not exactly sure what the concept means. My experiences with customer support at T-Mobile or Holland America leave me in a quandary. For companies that want to help me, the companies are doing everything in their power to drive me elsewhere. I assume that Clarabridge has a solution to this problem. No, not customers, but the costs a company incurs when customers contact them.
I noted that Clarabridge, according to In the Capital, raised $80 million in September 2013. In the Capital asserted that the company may be moving toward an initial public offering. The write up continued:
CEO Sid Banerjee said he hopes his company will soon follow in the footsteps of Cvent, another Northern Virginia company that recently went public.
Interesting stuff. The money and the alleged 2014 IPO, not CEM. With rumors of some push back for some sentiment centric analytics, Clarabridge may have cracked the code. It does have an additional $80 million.
Stephen E Arnold, January 3, 2014
Semantria and Diffbot: Clever Way to Forge a Tie Up
December 12, 2013
Short honk. I came across an interesting marketing concept in “Diffbot and Semantria Join to Find and Parse the Important Text on the ‘Net (Exclusive).”
Semantria (a company that offers sentiment analysis as a service) participated in a hackathon in San Francisco. The article explains:
To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.
Semantria sponsored a prize for the best app. Diffbot won:
A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.
I like the “sponsor”, “winner” and “team up” approach. The pay off, according to the article, is “While Semantria and Diffbot technologies continue to be available separately, they can now be used together.”
Sentiment analysis is one of the search submarkets that caught fire and then, based on the churning at some firms like Attensity, may be losing some momentum. Marketing innovation may be a goal for other firms offering this functionality in 2014.
Stephen E Arnold, December 12, 2013
Try TextBlob for Sentiment Analysis
November 19, 2013
Sad to say, we have heard rumblings about severe disappointment with Attensity-type and Lexalytics-type sentiment applications. If you want to kick some tires in this interesting search niche, look instead to the open source application TextBlob. OpenShift points out this resource in “Day 9: TextBlob—Finding Sentiments in Text.” The article is one in an ambitious series by writer Shekhar Gulati, who challenged himself to master one technology a day for a month. Very admirable, sir!
Gulati begins with his experience with sentiment analysis:
“My interest in sentiment analysis is few years old when I wanted to write an application which will process a stream of tweets about a movie, and then output the overall sentiment about the movie. Having this information would help me decide if I wanted to watch a particular movie or not.
“I googled around, and found that Naive Bayes classifier can be used to solve this problem. The only programming language that I knew at the time was Java, so I wrote a custom implementation and used the application for some time. I was lazy to commit the code, so when my machine crashed, I lost the code and application. Now I commit all my code to github, and I have close to 200 public repositories 🙂
“In this blog, I will talk about a Python package called TextBlob which can help developers solve this problem. We will first cover some basics, and then we will develop a simple Flask application which will use the TextBlob API.”
The post does indeed cover the basics, including the installation of Python and virtualenv before we can get going with TextBlob. It then takes us through writing an example application and deploying to the cloud. As he notes above, Gulati has his code safe and sound at Github; the code for this example is posted here, and the js and css files can be found here.
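For readers wondering what the Naive Bayes classifier Gulati mentions looks like, here is a minimal sketch using only the Python standard library. It is neither Gulati's lost Java code nor TextBlob's implementation; the `NaiveBayes` class, the add-one smoothing choice, and the four training sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, samples):
        for text, label in samples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set; a real one would hold thousands of labeled tweets.
training = [
    ("loved the movie, great acting", "pos"),
    ("a great and moving film", "pos"),
    ("boring plot and awful acting", "neg"),
    ("awful, boring, a waste of time", "neg"),
]

model = NaiveBayes()
model.train(training)

print(model.classify("great film"))
print(model.classify("boring and awful"))
```

With four training sentences this is a demonstration, not a movie critic. The quality of the output depends almost entirely on the labeled examples fed in, which is the customization point made elsewhere on this page.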
Cynthia Murrell, November 19, 2013
Invest in Social Media Sentiment Analysis to Avoid Brand Damage
November 2, 2013
Simon Creasey from Computer Weekly recently reported on the outcome of the latest Twitter firestorm in the article “Failure to Invest in Sentiment Analytics Could Lead to Brand Damage.”
According to the article, a disgruntled British Airways passenger decided to use a paid-for promoted tweet to blast his complaints to thousands of Twitter followers. As you can imagine, the tweet went viral and was shared and re-shared until it received global coverage. While PR disasters are often unavoidable, businesses are developing social media sentiment analysis software to contain them.
The article concludes:
“Monitoring what people are saying about your products and industry can help you design your products and propositions for the future and in that sense Twitter acts as a great market research tool as well as a lead-generation tool,” says Sinclair.
“Similarly, if you monitor what people are saying about your brand it can also help you with customer service and PR. There are many examples of companies who have found themselves under social media attack. Failure to invest in these kinds of tools could easily result in significant damage to a company’s reputation and brand.”
These days, social media is ever expanding and it is impossible to keep track of everything being said about your company’s brand, products, and employees. In order to avoid PR disasters like the one that happened to British Airways, companies should invest in the latest sentiment analysis technologies.
Jasmine Ashton, November 02, 2013
Sponsored by ArnoldIT.com, Developer of Beyond Search