July 10, 2015
Big data tools help organizations analyze more than their old, legacy data. While legacy data does help an organization study how its processes have changed, the data is old and does not reflect immediate, real-time trends. SAS offers a product that bridges old data with the new, as well as unstructured and structured data.
The SAS Text Miner is built from Teragram technology. It features document theme discovery, a function that finds relations between document collections; automatic Boolean rule generation; high performance text mining that quickly evaluates large document collections; term profiling and trending, which evaluates term relevance in a collection and how the terms are used; multiple language support; visual interrogation of results; easy text import; flexible entity options; and a user friendly interface.
The SAS Text Miner is specifically programmed to discover data relationships, automate activities, and determine keywords and phrases. The software uses predictive models to analyze data and discover new insights:
“Predictive models use situational knowledge to describe future scenarios. Yet important circumstances and events described in comment fields, notes, reports, inquiries, web commentaries, etc., aren’t captured in structured fields that can be analyzed easily. Now you can add insights gleaned from text-based sources to your predictive models for more powerful predictions.”
Text mining software reveals insights between old and new data, making it one of the basic components of big data.
Whitney Grace, July 10, 2015
July 1, 2015
CACI International, Leidos Holdings, and Booz Allen Hamilton have expressed interest in Computer Sciences Corp’s public sector division. There are not a lot of details about the possible transaction as it is still in the early stages, so everything is still hush-hush.
The possible acquisition came after the news that CSC will split into two divisions: one that serves US public sector clients and the other dedicated to global commercial and non-government clients. CSC has an estimated $4.1 billion in revenues and is worth $9.6 billion, but CACI International, Leidos Holdings, and Booz Allen Hamilton might reconsider the purchase or try to get the price lowered after hearing this news: “Computer Sciences (CSC) To Pay $190M Penalty; SEC Charges Company And Former Executives With Accounting Fraud” from Street Insider. The Securities and Exchange Commission is charging CSC and former executives with accounting fraud and levying a $190 million penalty for hiding financial information and problems resulting from the contract the company had with its biggest client. CSC and the executives, of course, are contesting the charges.
“The SEC alleges that CSC’s accounting and disclosure fraud began after the company learned it would lose money on the NHS contract because it was unable to meet certain deadlines. To avoid the large hit to its earnings that CSC was required to record, Sutcliffe allegedly added items to CSC’s accounting models that artificially increased its profits but had no basis in reality. CSC, with Laphen’s approval, then continued to avoid the financial impact of its delays by basing its models on contract amendments it was proposing to the NHS rather than the actual contract. In reality, NHS officials repeatedly rejected CSC’s requests that the NHS pay the company higher prices for less work. By basing its models on the flailing proposals, CSC artificially avoided recording significant reductions in its earnings in 2010 and 2011.”
Oh boy! Is it a wise decision to buy a company that has a history of stealing money and hiding information? If the company’s root products and services are decent, the buyers might get it for a cheap price and recondition the company. Or it could lead to another disaster like HP and Autonomy.
Whitney Grace, July 1, 2015
June 12, 2015
The article on Venture Beat whimsically titled “Where Are the Text Analytics Unicorns” provides yet another cheerleader for search. The article uses Aileen Lee’s “unicorn” concept of a company founded since 2003 and valued at over a billion dollars. (“Super unicorns” are companies valued at over a hundred billion dollars like Facebook.) The article asks why no text analytics companies have joined this exclusive club. Candidates include Clarabridge, NetBase and Medallia.
“In the end, the answer is a very basic one. Contrast the text analytics sector with unicorns that include Uber — Travis Kalanick’s company — and Airbnb, Evernote, Flipkart, Square, Pinterest, and their ilk. They play to mass markets — they’re a magic mix of revenue, data, platform, and pizazz — in ways that text analytics doesn’t. The tech companies on the unicorn list — Cloudera, MongoDB, Pivotal — provide or support essential infrastructure that covers a broad set of needs.”
Before coming to this conclusion, the article posits other possible reasons as well, such as the sheer number of companies competing in the field, or even competition from massive companies like IBM and Google. But these are dismissed for the more optimistic end note that essentially suggests we give the text analytics unicorns a year. Caution advised.
Chelsea Kerwin, June 12, 2015
June 12, 2015
What is one way to improve a user’s software navigational experience? One of the best ways is to add a graphical user interface (GUI). Software Development @ IT Business Net shares a press release about “Lexalytics Unveils Industry’s First Wizard For Text Mining And Sentiment Analysis.” Lexalytics is one of the leading companies that provide sentiment and analytics solutions and, as the article’s title explains, it has achieved an industry first by releasing a GUI and wizard for the Semantria SaaS platform and Excel plug-in. The wizard and GUI (SWIZ) are part of the Semantria Online Configurator, SWEB 1.3, which also includes functionality updates and layout changes.
“‘In order to get the most value out of text and sentiment analysis technologies, customers need to be able to tune the service to match their content and business needs,’ said Jeff Catlin, CEO, Lexalytics. ‘Just like Apple changed the game for consumers with its first Macintosh in 1984, making personal computing easy and fun through an innovative GUI, we want to improve the job of data analysts by making it just as fun, easy and intuitive with SWIZ.’”
Lexalytics is dedicated to helping its clients enjoy an easier experience when it comes to data analytics. The company wants its clients to get the answers they need by providing the tools to find them without having to overthink the retrieval process. While Lexalytics already provides robust and flexible solutions, the SWIZ release continues to prove it has the most tunable and configurable text mining technology.
Whitney Grace, June 12, 2015
June 9, 2015
I read “Text Analytics: The Next Generation of Big Data.” The article provides a straightforward explanation of Big Data, embraces unstructured information like blog posts in various languages, email, and similar types of content, and then leaps to the notion of text analytics. The conclusion to the article is that we are experiencing “The Coming of Age of Text Analytics—The Next Generation of Big Data.”
The idea is good news for the vendors of text analytics aimed squarely at commercial enterprises, advertisers, and marketers. I am not sure the future will match up to the needs of the folks at the law enforcement and intelligence conference I had just left.
There are three reasons:
First, text analytics are not new, and the various systems and methods have been in use for decades. One notable example is BAE Systems’ use of its home brew tools and Autonomy’s technology in the 1990s and i2 (pre IBM) and its efforts even earlier.
Second, the challenges of figuring out what structured and unstructured data mean require more than determining if a statement is positive or negative. Text analytics is, based on my experience, blind to such useful data as real time geospatial inputs and video streamed from mobile devices and surveillance devices. Text analytics, like key word search, makes a contribution, but it is in a supporting role, not the Beyoncé of content processing.
Third, the future points to the use of technologies like predictive analytics. Text analytics are components in these more robust systems, which are designed to provide probability-based outputs from a range of input sources.
There was considerable consternation a year or so ago. I spoke with a team involved with text analytics at a major telecommunications company. The grousing was that the outputs of the system did not make sense and it was difficult for those reviewing the outputs to figure out what the data meant.
At the LE/intel conference, the focus was on systems which provide actionable information in real time. My point is that vendors have a tendency to see the solutions in terms of what is often a limited or supporting technology.
Sentiment analysis is a good example. Blog posts urging readers to join ISIS are positive to some and negative to others. The point is that the point of view of the reader determines whether a message is positive or negative.
The only way to move beyond this type of superficial and often misleading analysis is to deal with context, audio, video, intercept data, geolocation data, and other types of content. Text analytics is one component in a larger system, not the solution to the types of problems explored at the LE/intel conference in early June 2015. Marketing often clouds reality. In some businesses, no one knows that the outputs are not helpful. In other endeavors, the outputs have far higher import. Scoring the sentiment of a recruiting video with a moving nasheed underscoring the good guys dispatching the bad guys is off kilter. Is it important to know that the video is happy or sad? In fact, it is silly to approach the content in this manner.
Stephen E Arnold, June 9, 2015
May 27, 2015
This is interesting. OpenText advertises their free, downloadable book in a post titled, “Transform Your Business for a Digital-First World.” Our question is whether OpenText can transform their own business; it seems their financial results have been flat and generally drifting down of late. I suppose this is a do-as-we-say-not-as-we-do situation.
The book may be worth looking into, though, especially since it passes along words of wisdom from leaders within multiple organizations. The description states:
“Digital technology is changing the rules of business with the promise of increased opportunity and innovation. The very nature of business is more fluid, social, global, accelerated, risky, and competitive. By 2020, profitable organizations will use digital channels to discover new customers, enter new markets and tap new streams of revenue. Those that don’t make the shift could fall to the wayside. In Digital: Disrupt or Die, a multi-year blueprint for success in 2020, OpenText CEO Mark Barrenechea and Chairman of the Board Tom Jenkins explore the relationship between products, services and Enterprise Information Management (EIM).”
Launched in 1991, OpenText offers tools for enterprise information management, business process management, and customer experience management. Based in Waterloo, Ontario, the company maintains offices around the world.
Cynthia Murrell, May 27, 2015
May 13, 2015
Want to do text mining without some of the technical hassles? If so, you will want to read about Lexalytics, “the industry’s most tunable and configurable text mining technology.” Navigate to “Lexalytics Unveils Industry’s First Wizard for Text Mining and Sentiment Analysis.” I learned that text mining can be “fun, easy, and intuitive.” I highlighted this quote from the news story as an indication that one does not need to understand exactly what’s going on in the text mining process:
“Before, our customers had to understand the meaning of things like ‘alpha-numeric content threshold’ and ‘entities confidence threshold,’” Jeff continued. “Lexalytics provides the most knobs to turn to get the results exactly as you want them, and now our customers don’t even have to think about them.”
Text mining, the old-fashioned way, required an understanding of what was required, what procedures were appropriate, and the ability to edit or write scripts. There are other skills that used to be required as the entry fee to text mining. The modern world of interfaces allows anyone to text mine. Do users understand the outputs? Sure. Perfectly.
As I read the story, I recalled a statement in “A Review of Three Natural Language Processors, AlchemyAPI, OpenCalais, and Semantria.” Here is the quote I noted in that July 2014 write up by Marc Clifton:
I find the concept of Natural Language Processing intriguing and that it holds many possibilities for helping to filter and analyze the vast and growing amount of information out there on the web. However, I’m not quite sure exactly how one uses the output of an NLP service in a productive way that goes beyond simple keyword matching. Some people will of course be interested in whether the sentiment is positive or negative, and I think the idea of extracting concepts (AlchemyAPI) and topics (Semantria) are useful in extracting higher level abstractions regarding a document. NLP is therefore an interesting field of study and I believe that the people who provide NLP services would benefit from the feedback of users to increase the value of their service.
Perhaps the feedback was, “Make this stuff easy to do.” Now the challenge is to impart an understanding of what a text mining system outputs. That might be a bit more difficult.
Stephen E Arnold, May 13, 2015
May 1, 2015
Enterprise search is limited by how well users tag their content and by the preloaded taxonomies. According to Tech Target’s Search Content Management blog, text analytics might be the key to turning around poor enterprise search performance: “How Analytics Engines Could Finally Relieve Enterprise Pain.” Text analytics turns out to be only part of the solution. Someone had the brilliant idea to apply text analytics to classification issues in enterprise search, shifting search from reacting to user input to proactively preparing content for search queries.
In general, analytics search engines work like this:
“The first is that analytics engines don’t create two buckets of content, where the goal is to identify documents that are deemed responsive. Instead, analytics engines identify documents that fall into each category and apply the respective metadata tags to the documents. Second, people don’t use these engines to search for content. The engines apply metadata to documents to allow search engines to find the correct information when people search for it. Text analytics provides the correct metadata to finally make search work within the enterprise.”
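The tagging step the quote describes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the category names, the trigger terms in `CATEGORY_TERMS`, and the helper names `tag_document` and `search_by_tag` are hypothetical, and a production analytics engine would use trained classifiers rather than keyword rules. The sketch only shows the division of labor: the engine writes metadata tags onto documents, and the search step reads those tags.

```python
from dataclasses import dataclass, field

# Hypothetical category rules: each category maps to a set of trigger terms.
CATEGORY_TERMS = {
    "contracts": {"agreement", "clause", "signature"},
    "finance": {"invoice", "revenue", "budget"},
    "hr": {"hiring", "benefits", "payroll"},
}


@dataclass
class Document:
    doc_id: str
    text: str
    tags: set = field(default_factory=set)


def tag_document(doc, rules=CATEGORY_TERMS):
    """Attach every category whose trigger terms appear in the document text."""
    words = {w.strip(".,;:!?\"'").lower() for w in doc.text.split()}
    for category, terms in rules.items():
        if words & terms:
            doc.tags.add(category)
    return doc


def search_by_tag(docs, category):
    """Stand-in for the search engine: look up documents by metadata tag."""
    return [d.doc_id for d in docs if category in d.tags]


docs = [
    Document("d1", "The invoice covers the revised budget."),
    Document("d2", "Payroll and benefits changes follow the new agreement."),
    Document("d3", "Lunch menu for Friday."),
]
for d in docs:
    tag_document(d)

print(search_by_tag(docs, "hr"))
```

Note that a document can land in several categories at once, which matches the quote's point that the engine does not sort content into just two buckets.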
Supposedly, they are fixing the tagging issue by removing the biggest cause for error: humans. Microsoft caught on to how much profit this could generate, so they purchased Equivio in 2014 and integrated the FAST Search platform into SharePoint. Since Microsoft is doing it, every other tech company will copy and paste its actions in time. Enterprise search is full of faults, but it has improved greatly. Big data trends have improved search quality, but tagging continues to be an issue. Text analytics search engines will probably be the newest big data field for development. Hint for developers: work on an analytics search product, launch it, and then it might be bought out.
April 11, 2015
I read “Twitter Ends its Partnership with DataSift – Firehose Access Expires on August 13, 2015.” DataSift supports a number of competitive and other intelligence services with its authorized Twitter stream. The write up says:
DataSift’s customers will be able to access Twitter’s firehose of data as normal until August 13th, 2015. After that date all the customers will need to transition to other providers to receive Twitter data. This is an extremely disappointing result to us and the ecosystem of companies we have helped to build solutions around Twitter data.
I found this interesting. Plan now or lose that authorized firehose. Perhaps Twitter wants more money? On the other hand, maybe DataSift realizes that for some intelligence tasks, Facebook is where the money is. Twitter is a noise machine. Facebook, despite its flaws, is anchored in humans, but the noise is increasing. Some content processes become more tricky with each business twist and turn.
Stephen E Arnold, April 11, 2015
April 10, 2015
“A rough primer: Jockers uses a tool called “sentiment analysis” to gauge “the relationship between sentiment and plot shape in fiction”; algorithms assign every word in a novel a positive or negative emotional value, and in compiling these values he’s able to graph the shifts in a story’s narrative. A lot of negative words mean something bad is happening, a lot of positive words mean something good is happening. Ultimately, he derived six archetypal plot shapes.”
Academics, however, found some problems with Jockers’s tool, such as whether it is possible to assign every word an emotional value and whether all plots can really take basic forms. The problem is that words are as nuanced as human emotion, perspectives change in an instant, and sentiments are subjective. How would the tool rate sarcasm?
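To make the word-level scoring concrete, here is a minimal Python sketch of the general idea. It is not Jockers’s actual tool: the eight-word `LEXICON`, the function names, and the fixed chunk size are all assumptions chosen for illustration. Each word gets a score of +1, -1, or 0, and summing the scores over consecutive chunks of text gives a crude “plot shape.”

```python
# Hypothetical toy lexicon; real sentiment lexicons score thousands of words.
LEXICON = {
    "love": 1, "joy": 1, "happy": 1, "great": 1,
    "death": -1, "fear": -1, "sad": -1, "terrible": -1,
}


def word_scores(text):
    """Score each word +1, -1, or 0 by looking it up in the lexicon."""
    tokens = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [LEXICON.get(t, 0) for t in tokens if t]


def plot_shape(text, window=4):
    """Sum word scores over consecutive chunks to trace the narrative's ups and downs."""
    scores = word_scores(text)
    return [sum(scores[i:i + window]) for i in range(0, len(scores), window)]


story = "Joy and love filled the happy house. Then fear and death made everyone sad."
print(plot_shape(story))  # [2, 1, -2, -1]

sarcasm = "Oh great, another terrible delay. I just love waiting."
print(sum(word_scores(sarcasm)))  # 1
```

The story’s values fall from positive to negative as the mood darkens, which is the graphing idea in the primer. The sarcastic sentence comes out net positive even though the speaker is plainly annoyed, which is the kind of failure the critics have in mind.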
All stories have been broken down into seven basic plots, so why should it not be possible to do the same for book plots? Jockers already identified six basic book plots and there are some who are curiously optimistic about his analysis tool. It does raise the question of whether it will stanch authors’ creativity or make English professors derive even more subjective meaning from Ulysses.
Whitney Grace, April 10, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com