Search Cheerleader Seeks Text Analytics Unicorns

June 12, 2015

The article on Venture Beat whimsically titled “Where Are the Text Analytics Unicorns” provides yet another cheerleader for search. The article uses Aileen Lee’s “unicorn” concept of a company begun since 2003 and valued at over a billion dollars. (“Super unicorns” are companies valued at over a hundred billion dollars, like Facebook.) The article asks why no text analytics companies have joined this exclusive club. Candidates include Clarabridge, NetBase and Medallia.

“In the end, the answer is a very basic one. Contrast the text analytics sector with unicorns that include Uber — Travis Kalanick’s company — and Airbnb, Evernote, Flipkart, Square, Pinterest, and their ilk. They play to mass markets — they’re a magic mix of revenue, data, platform, and pizazz — in ways that text analytics doesn’t. The tech companies on the unicorn list — Cloudera, MongoDB, Pivotal — provide or support essential infrastructure that covers a broad set of needs.”

Before coming to this conclusion, the article posits other possible reasons as well, such as the sheer number of companies competing in the field, or even competition from massive companies like IBM and Google. But these are dismissed for the more optimistic end note that essentially suggests we give the text analytics unicorns a year. Caution advised.

Chelsea Kerwin, June 12, 2015

Sponsored by, publisher of the CyberOSINT monograph


Lexalytics: GUI and Wizard

June 12, 2015

What is one way to improve a user’s software navigational experience? One of the best ways is to add a graphical user interface (GUI). Software Development @ IT Business Net shares a press release about “Lexalytics Unveils Industry’s First Wizard For Text Mining And Sentiment Analysis.” Lexalytics is one of the leading companies that provide sentiment and analytics solutions and, as the article’s title explains, it has made an industry first by releasing a GUI and wizard for the Semantria SaaS platform and Excel plug-in. The wizard and GUI (SWIZ) are part of the Semantria Online Configurator, SWEB 1.3, which also included functionality updates and layout changes.

“‘In order to get the most value out of text and sentiment analysis technologies, customers need to be able to tune the service to match their content and business needs,’ said Jeff Catlin, CEO, Lexalytics. ‘Just like Apple changed the game for consumers with its first Macintosh in 1984, making personal computing easy and fun through an innovative GUI, we want to improve the job of data analysts by making it just as fun, easy and intuitive with SWIZ.’”

Lexalytics is dedicated to helping its clients enjoy an easier experience when it comes to data analytics. The company wants its clients to get the answers they need by providing the tools to get them without having to overthink the retrieval process. While Lexalytics already provides robust and flexible solutions, the SWIZ release continues to prove it has the most tunable and configurable text mining technology.

Whitney Grace, June 12, 2015

Sentiment Analysis: The Progeny of Big Data?

June 9, 2015

I read “Text Analytics: The Next Generation of Big Data.” The article provides a straightforward explanation of Big Data, embraces unstructured information like blog posts in various languages, email, and similar types of content, and then leaps to the notion of text analytics. The conclusion to the article is that we are experiencing “The Coming of Age of Text Analytics—The Next Generation of Big Data.”

The idea is good news for the vendors of text analytics aimed squarely at commercial enterprises, advertisers, and marketers. I am not sure the future will match up to the needs of the folks at the law enforcement and intelligence conference I had just left.

There are three reasons:

First, text analytics are not new, and the various systems and methods have been in use for decades. One notable example is BAE Systems’ use of its home brew tools and Autonomy’s technology in the 1990s and i2 (pre IBM) and its efforts even earlier.

Second, the challenges of figuring out what structured and unstructured data mean require more than determining if a statement is positive or negative. Text analytics is, based on my experience, blind to such useful data as real time geospatial inputs and video streamed from mobile devices and surveillance devices. Text analytics, like key word search, makes a contribution, but it is in a supporting role, not the Beyoncé of content processing.

Third, the future points to the use of technologies like predictive analytics. Text analytics are components in these more robust systems whose outputs are designed to provide probability-based outputs from a range of input sources.

There was considerable consternation a year or so ago. I spoke with a team involved with text analytics at a major telecommunications company. The grousing was that the outputs of the system did not make sense and it was difficult for those reviewing the outputs to figure out what the data meant.

At the LE/intel conference, the focus was on systems which provide actionable information in real time. My point is that vendors have a tendency to see the solutions in terms of what is often a limited or supporting technology.

Sentiment analysis is a good example. Blog posts urging readers to join ISIS are positive to some readers and negative to others. The point of view of the reader determines whether a message is positive or negative.

The only way to move beyond this type of superficial and often misleading analysis is to deal with context, audio, video, intercept data, geolocation data, and other types of content. Text analytics is one component in a larger system, not the solution to the types of problems explored at the LE/intel conference in early June 2015. Marketing often clouds reality. In some businesses, no one knows that the outputs are not helpful. In other endeavors, the outputs have far higher import. Knowing that a recruiting video with a moving nasheed underscoring the good guys dispatching the bad guys is off kilter. Is it important to know that the video is happy or sad? In fact, it is silly to approach the content in this manner.

Stephen E Arnold, June 9, 2015

Free Book from OpenText on Business in the Digital Age

May 27, 2015

This is interesting. OpenText advertises their free, downloadable book in a post titled, “Transform Your Business for a Digital-First World.” Our question is whether OpenText can transform their own business; it seems their financial results have been flat and generally drifting down of late. I suppose this is a do-as-we-say-not-as-we-do situation.

The book may be worth looking into, though, especially since it passes along words of wisdom from leaders within multiple organizations. The description states:

“Digital technology is changing the rules of business with the promise of increased opportunity and innovation. The very nature of business is more fluid, social, global, accelerated, risky, and competitive. By 2020, profitable organizations will use digital channels to discover new customers, enter new markets and tap new streams of revenue. Those that don’t make the shift could fall to the wayside. In Digital: Disrupt or Die, a multi-year blueprint for success in 2020, OpenText CEO Mark Barrenechea and Chairman of the Board Tom Jenkins explore the relationship between products, services and Enterprise Information Management (EIM).”

Launched in 1991, OpenText offers tools for enterprise information management, business process management, and customer experience management. Based in Waterloo, Ontario, the company maintains offices around the world.

Cynthia Murrell, May 27, 2015

Lexalytics Offers Tunable Text Mining

May 13, 2015

Want to do text mining without some of the technical hassles? If so, you will want to read about Lexalytics, “the industry’s most tunable and configurable text mining technology.” Navigate to “Lexalytics Unveils Industry’s First Wizard for Text Mining and Sentiment Analysis.” I learned that text mining can be “fun, easy, and intuitive.” I highlighted this quote from the news story as an indication that one does not need to understand exactly what’s going on in the text mining process:

“Before, our customers had to understand the meaning of things like ‘alpha-numeric content threshold’ and ‘entities confidence threshold,’” Jeff continued. “Lexalytics provides the most knobs to turn to get the results exactly as you want them, and now our customers don’t even have to think about them.”

Text mining, the old-fashioned way, required understanding of what was required, what procedures were appropriate, and ability to edit or write scripts. There are other skills that used to be required as the entry fee to text mining. The modern world of interfaces allows anyone to text mine. Do users understand the outputs? Sure. Perfectly.

As I read the story, I recalled a statement in “A Review of Three Natural Language Processors, AlchemyAPI, OpenCalais, and Semantria.” Here is the quote I noted in that July 2014 write up by Marc Clifton:

I find the concept of Natural Language Processing intriguing and that it holds many possibilities for helping to filter and analyze the vast and growing amount of information out there on the web.  However, I’m not quite sure exactly how one uses the output of an NLP service in a productive way that goes beyond simple keyword matching.  Some people will of course be interested in whether the sentiment is positive or negative, and I think the idea of extracting concepts (AlchemyAPI) and topics (Semantria) are useful in extracting higher level abstractions regarding a document.  NLP is therefore an interesting field of study and I believe that the people who provide NLP services would benefit from the feedback of users to increase the value of their service.

Perhaps the feedback was, “Make this stuff easy to do.” Now the challenge is to impart understanding to what a text mining system outputs. That might be a bit more difficult.

Stephen E Arnold, May 13, 2015

Hoping to End Enterprise Search Inaccuracies

May 1, 2015

Enterprise search is limited by how well users tag their content and by the preloaded taxonomies. According to Tech Target’s Search Content Management blog, text analytics might be the key to turning around poor enterprise search performance: “How Analytics Engines Could Finally-Relieve Enterprise Pain.” Text analytics turns out to only be part of the solution. Someone had the brilliant idea to apply text analytics to classification issues in enterprise search, moving search from being reactive to user input to being proactive about search queries.

In general, analytics search engines work like this:

“The first is that analytics engines don’t create two buckets of content, where the goal is to identify documents that are deemed responsive. Instead, analytics engines identify documents that fall into each category and apply the respective metadata tags to the documents.  Second, people don’t use these engines to search for content. The engines apply metadata to documents to allow search engines to find the correct information when people search for it. Text analytics provides the correct metadata to finally make search work within the enterprise.”
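
The tagging idea in the quote can be illustrated with a minimal sketch. Everything here is hypothetical: the category names, the trigger terms, and the helper functions are invented for illustration and do not come from any vendor's product. The sketch assumes simple term matching, where a real analytics engine would use trained classifiers. Each document receives a tag for every category it matches, and a search layer can then look documents up by tag.

```python
# Hypothetical category rules: category name -> trigger terms.
# A real analytics engine would use trained models, not term lists.
CATEGORY_TERMS = {
    "contracts": {"agreement", "clause", "signature"},
    "finance": {"invoice", "payment", "budget"},
    "hr": {"hiring", "payroll", "benefits"},
}


def tag_document(text):
    """Return every category whose trigger terms appear in the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(cat for cat, terms in CATEGORY_TERMS.items() if words & terms)


def build_index(docs):
    """Map each metadata tag to the ids of the documents that carry it."""
    index = {}
    for doc_id, text in docs.items():
        for tag in tag_document(text):
            index.setdefault(tag, []).append(doc_id)
    return index


docs = {
    "d1": "The payment clause in this agreement needs a signature.",
    "d2": "Payroll and benefits budget for new hiring.",
    "d3": "Lunch menu for Friday.",
}
index = build_index(docs)
print(index)
```

Note that one document can land in several categories and another in none, which is the difference from a two-bucket responsive or not responsive split.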

Supposedly, they are fixing the tagging issue by removing the biggest cause for error: humans. Microsoft caught on to how much profit this could generate, so they purchased Equivio in 2014 and integrated the FAST Search platform into SharePoint. Since Microsoft is doing it, every other tech company will copy and paste their actions in time. Enterprise search is full of faults, but it has improved greatly. Big data trends have improved search quality, but tagging continues to be an issue. Text analytics search engines will probably be the newest big data field for development. Hint for developers: work on an analytics search product, launch it, and then it might be bought out.

Whitney Grace, May 1, 2015

Twitter Plays Hard Ball or DataSift Knows the End Is in Sight

April 11, 2015

I read “Twitter Ends its Partnership with DataSift – Firehose Access Expires on August 13, 2015.” DataSift supports a number of competitive and other intelligence services with its authorized Twitter stream. The write up says:

DataSift’s customers will be able to access Twitter’s firehose of data as normal until August 13th, 2015. After that date all the customers will need to transition to other providers to receive Twitter data. This is an extremely disappointing result to us and the ecosystem of companies we have helped to build solutions around Twitter data.

I found this interesting. Plan now or lose that authorized firehose. Perhaps Twitter wants more money? On the other hand, maybe DataSift realizes that for some intelligence tasks, Facebook is where the money is. Twitter is a noise machine. Facebook, despite its flaws, is anchored in humans, but the noise is increasing. Some content processes become more tricky with each business twist and turn.

Stephen E Arnold, April 11, 2015

Predicting Plot Holes Isn’t So Easy

April 10, 2015

According to The Paris Review’s blog post “Man In Hole II: Man In Deeper Hole,” Matthew Jockers created an analysis tool to predict archetypal book plots:

“A rough primer: Jockers uses a tool called “sentiment analysis” to gauge “the relationship between sentiment and plot shape in fiction”; algorithms assign every word in a novel a positive or negative emotional value, and in compiling these values he’s able to graph the shifts in a story’s narrative. A lot of negative words mean something bad is happening, a lot of positive words mean something good is happening. Ultimately, he derived six archetypal plot shapes.”
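
The word-scoring approach described in the quote can be sketched in a few lines. This is a toy illustration, not Jockers’s actual tool: the eight-word lexicon, the function names, and the choice of three equal segments are all assumptions made for the example.

```python
# Hypothetical toy lexicon; real sentiment lexicons hold thousands of words.
LEXICON = {"love": 1, "joy": 1, "hope": 1, "happy": 1,
           "death": -1, "fear": -1, "loss": -1, "sad": -1}


def word_scores(text):
    """Assign each word +1, -1, or 0 by looking it up in the toy lexicon."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [LEXICON.get(w, 0) for w in words if w]


def plot_shape(text, segments=3):
    """Sum the word scores within roughly equal-sized segments of the text."""
    scores = word_scores(text)
    size = max(1, -(-len(scores) // segments))  # ceiling division
    return [sum(scores[i:i + size]) for i in range(0, len(scores), size)]


story = ("Joy and hope filled the house. "
         "Then came fear, loss, and death. "
         "At last love made them happy.")
shape = plot_shape(story)
print(shape)  # [2, -3, 2]
```

The rise, fall, and rise in the output is the “plot shape.” Because each word is scored in isolation, a sarcastic line such as “Oh, happy day” counts as positive no matter what the narrator means.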

Academics, however, found some problems with Jockers’s tool. Is it possible to assign every word an emotional value, and can all plots really take basic forms? The problem is that words are as nuanced as human emotion, perspectives change in an instant, and sentiments are subjective. How would the tool rate sarcasm?

All stories have been broken down into seven basic plots, so why should it not be possible to do the same for book plots? Jockers already identified six basic book plots and there are some who are curiously optimistic about his analysis tool. It does raise the question of whether it will stanch authors’ creativity or make English professors derive even more subjective meaning from Ulysses.

Whitney Grace, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Attensity Adds Semantic Markup

April 3, 2015

You have been waiting for more markup. I know I have, and that is why I read “Attensity Semantic Annotation: NLP-Analyse für Unternehmensapplikationen.”

So your wait and mine—over.

Attensity, a leader in figuring out what human discourse means, has rolled out a software development kit so you can do a better job with customer engagement and business intelligence. Attensity offers Dynamic Data Discovery. Unlike traditional analysis tools, Attensity does not focus on keywords. You know, what humans actually use to communicate.

Attensity uses natural language processing in order to identify concepts and issues in plain language. I must admit that I have heard this claim from a number of vendors, including long forgotten systems like DR LINK, among others.

The idea is that the SDK makes it easier to filter data to evaluate textual information and identify issues. Furthermore the SDK performs fast content fusion. The result is, as other vendors have asserted, insight. There was a vendor called Inxight which asserted quite similar functions in 1997. At one time, Attensity had a senior manager from Inxight, but I assume the attribution of functions is one of Attensity’s innovations. (Forgive me for mentioning vendors which some 20 somethings know quite well.)

If you are dependent upon Java, Attensity is an easy fit. I assume that if you are one of the 150 million plus Microsoft SharePoint outfits, Attensity integration may require a small amount of integration work.

According to Attensity, the benefit of its markup approach is that the installation is on site and therefore secure. I am not sure about this because security is person dependent, so cloud or on site, security remains an “issue” different from the ones Attensity’s system identifies.

Attensity, like Oracle, provides a knowledge base for specific industries. Oracle made term lists available for a number of years. Maybe since its acquisition of Artificial Linguistics in the mid 1990s?

Attensity supports five languages. For these five languages, Attensity can determine the “tone” of the words used in a document. Presumably a company like Bitext can provide additional language support if Attensity does not have these ready to install.

Vendors continue to recycle jargon and buzzwords to describe certain core functions available from many different vendors. If your metatagging outfit is not delivering, you may want to check out Attensity’s solution.

Stephen E Arnold, April 3, 2015

SAS Text Miner Provides Valuable Predictive Analytics

March 25, 2015

If you are searching for predictive analytics software that provides in-depth text analysis with advanced linguistic capabilities, you may want to check out “SAS Text Miner.” Predictive Analytics Today runs down the features of SAS Text Miner and details how it works.

It is user-friendly software with data visualization, flexible entity options, document theme discovery, and more.

“The text analytics software provides supervised, unsupervised, and semi-supervised methods to discover previously unknown patterns in document collections.  It structures data in a numeric representation so that it can be included in advanced analytics, such as predictive analysis, data mining, and forecasting.  This version also includes insightful reports describing the results from the rule generator node, providing clarity to model training and validation results.”

SAS Text Miner includes other features that draw on automatic Boolean rule generation to categorize documents, and rules can be exported as Boolean rules. Data sets can be made from a directory or crawled from the Web. The visual analysis feature highlights the relationships between discovered patterns and displays them using a concept link diagram. SAS Text Miner has received high praise as predictive analytics software and it might be the solution your company is looking for.
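
For readers unfamiliar with Boolean rule categorization, here is a minimal sketch of the general technique. It is not SAS Text Miner’s rule format or API: the tuple-based rule syntax, the category names, and the functions are invented for illustration, and the rules are written by hand rather than generated automatically.

```python
def matches(rule, words):
    """Evaluate a nested Boolean rule against a set of words."""
    op, *args = rule
    if op == "TERM":
        return args[0] in words
    if op == "NOT":
        return not matches(args[0], words)
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical hand-written rules; a rule generator would learn these from data.
RULES = {
    "billing": ("AND", ("TERM", "invoice"), ("NOT", ("TERM", "draft"))),
    "support": ("OR", ("TERM", "outage"), ("TERM", "refund")),
}


def categorize(text):
    """Return the sorted names of all categories whose rule matches the text."""
    words = set(text.lower().split())
    return sorted(name for name, rule in RULES.items() if matches(rule, words))


print(categorize("final invoice after the outage"))  # ['billing', 'support']
print(categorize("draft invoice"))                   # []
```

Rules in this form are easy to read and audit, which is presumably why a tool would offer to export them.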

Whitney Grace, March 25, 2015
