November 11, 2014
Through the News section of their website, eDigitalResearch announces a new partnership in, “eDigitalResearch Partner with Lexalytics on Real-Time Text Analytics Solution.” The two companies are integrating Lexalytics’ Salience analysis engine into eDigital’s HUB analysis and reporting interface. The write-up tells us:
“By utilising and integrating Lexalytics Salience text analysis engine into eDigitalResearch’s own HUB system, the partnership will provide clients with a real-time, secure solution for understanding what customers are saying across the globe. Able to analyse comments from survey responses to social media – in fact any form of free text – eDigitalResearch’s HUB Text Analytics will provide the power and platform to really delve deep into customer comments, monitor what is being said and alert brands and businesses of any emerging trends to help stay ahead of the competition.”
Based in Hampshire, U.K., eDigitalResearch likes to work closely with their clients to produce the best solution for each. The company began in 1999 with the launch of the eMysteryShopper, a novel concept at the time. As of this writing, eDigitalResearch is looking to hire a developer and a senior developer (in case anyone here is interested).
Founded in 2003, Lexalytics is proud to have brought the first sentiment analysis engine to market. Designed to integrate with third-party applications, their text analysis software is chugging along in the background at many data-related companies. Lexalytics is headquartered in Amherst, Massachusetts.
Cynthia Murrell, November 11, 2014
November 6, 2014
Here’s an interesting development from the world of text-processing technology. GeekWire reports, “Microsoft and Amazon Vets Form Textio, a New Startup Looking to Discover Patterns in Documents.” The new company expects to release its first product next spring. Writer John Cook tells us:
“Kieran Snyder, a linguistics expert who previously worked at Amazon and Microsoft’s Bing unit, and Jensen Harris, who spent 16 years at Microsoft, including stints running the user experience team for Windows 8, have a formed a new data visualization startup by the name of Textio.
“The Seattle company’s tagline: ‘Turn business text into insights.’ The emergence of the startup was first reported by Re/code, which noted that the Textio tool could be used by companies to scour job descriptions, performance reviews and other corporate HR documents to uncover unintended discrimination. In fact, Textio was formed after Snyder conducted research on gender bias in performance reviews in the tech industry.”
That is an interesting origin, especially amid the discussions about gender that currently suffuse the tech community. Textio sees much room for improvement in text analytics, and hopes to help clients reach insights beyond those competing platforms can divine. CEO Snyder’s doctorate and experience in linguistics and cognitive science should give the young company an edge in the competitive field.
Cynthia Murrell, November 06, 2014
July 21, 2014
The article titled Text Analytics Company Linguamatics Boosts Enterprise Search with Semantic Enrichment on MarketWatch discusses the launch of I2E Semantic Enrichment from Linguamatics. The new release allows for the mining of a variety of texts, from scientific literature to patents to social media. It promises faster, more relevant search for users. The article states,
“Enterprise search engines consume this enriched metadata to provide a faster, more effective search for users. I2E uses natural language processing (NLP) technology to find concepts in the right context, combined with a range of other strategies including application of ontologies, taxonomies, thesauri, rule-based pattern matching and disambiguation based on context. This allows enterprise search engines to gain a better understanding of documents in order to provide a richer search experience and increase findability, which enables users to spend less time on search.”
Whether it is semantics spun for search or search spun for semantics, Linguamatics has made its technology available to tens of thousands of users of enterprise search. Representative John M. Brimacombe was straightforward in his comments about the disappointment surrounding enterprise search, but optimistic about I2E. It is currently being used by many top organizations, as well as the Food and Drug Administration.
Chelsea Kerwin, July 21, 2014
May 5, 2014
SAS is a well-recognized player in the IT game as a purveyor of data, security, and analytics software. In modern terms, it is a big player in big data, and we caught word that SAS has updated its Text Miner to beef up its offerings. SAS Text Miner is advertised as a way for users to harness information not only in legacy data, but also in Web sites, databases, and other text sources. The process can be used to discover new ideas and improve decision-making.
SAS Text Miner offers a variety of benefits that make it different from the standard open source download. Not only do users receive the license and tech support, but Text Miner offers the ability to process and analyze knowledge in minutes, an interactive user interface, and predictive and data mining modeling techniques. The GUI is what will draw in developers:
“Interactive GUIs make it easy to identify relevance, modify algorithms, document assignments and group materials into meaningful aggregations. So you can guide machine-learning results with human insights. Extend text mining efforts beyond basic start-and-stop lists using custom entities and term trend discovery to refine automatically generated rules.”
The ability to modify proprietary software can make or break a deal these days. With multiple options for text mining software available, the ability to customize a product is what will sell it.
April 27, 2014
I read “Algorithm Distinguishes Memes from Ordinary Information.” The article reports that algorithms can pick out memes. A “meme”, according to Google, is “an element of a culture or system of behavior that may be considered to be passed from one individual to another by nongenetic means, especially imitation.” The passage that caught my attention is:
Having found the most important memes, Kuhn and co studied how they have evolved in the last hundred years or so. They say most seem to rise and fall in popularity very quickly. “As new scientific paradigms emerge, the old ones seem to quickly lose their appeal, and only a few memes manage to top the rankings over extended periods of time,” they say.
The factoid that reminded me how far smart software has yet to travel is:
To test whether these phrases are indeed interesting topics in physics, Kuhn and co asked a number of experts to pick out those that were interesting. The only ones they did not choose were: 12. Rashba, 14. ‘strange nonchaotic’ and 15. ‘in NbSe3’. Kuhn and co also checked Wikipedia, finding that about 40 per cent of these words and phrases have their own corresponding entries. Together this provides compelling evidence that the new method is indeed finding interesting and important ideas.
Systems produce outputs that are not yet spot on. I concluded that scientists, like marketers, like whizzy new phrases and ideas. Jargon, it seems, is an important part of specialist life.
Stephen E Arnold, April 27, 2014
April 23, 2014
Small time analytics isn’t really as startup-y as people may think anymore. These companies are in high demand and are pulling in some serious cash. We discovered just how much and how serious from a recent Cambridge Science Park article, “Cambridge Text Analytics Linguamatics Hits $10m in Sales.”
According to the story:
Linguamatics’ sales showed strong growth and exceeded ten million dollars in 2013, it was announced today – outperforming the company’s targeted growth and expected sales figures. The increased sales came from a boost in new customers and increased software licenses to existing customers in the pharmaceutical and healthcare sectors. This included 130 per cent growth in healthcare sales plus increased sales in professional services.
This earning potential has clearly grabbed the attention of investors. This is feeding a cycle of growth, which is why the Linguamaticses of the world can rake in impressive numbers. Just the other day, for example, Tech Circle reported on a microscopic Mumbai big data company that landed $3m in investments. They say it takes money to make money and right now, the world of big data analytics has that cycle down pat. It won’t last forever, but it’s fun to watch while it does.
Patrick Roland, April 23, 2014
March 11, 2014
Butler Analytics collected a list of “20+ Text Analytics Platforms” that surveys the variety of text analytics platforms available and what their capabilities are. According to the list, text analytics has not reached its full maturity yet. There are three main divisions in the area: natural language processing, text mining, and machine learning. Each is distinct and each company has its own approach to using these processes:
“Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism to discover patterns that might be used in operational activities. Text is used to generate extra features which might be added to structured data for more accurate pattern discovery. There is of course overlap and most suppliers provide a mixture of capabilities. Finally we should not forget information retrieval, more often branded as enterprise search technology, where the aim is simply to provide a means of discovering and accessing data that are relevant to a particular query. This is a separate topic to a large extent, although again there is overlap.”
Reading through the list shows the variety of options users have when it comes to text analytics. There does not appear to be a right or wrong way, but will the diverse offerings eventually funnel down to a few fully capable platforms?
February 12, 2014
I read “Gödel, Escher, Bach: An Eternal Golden Braid” in 1999 or 2000. My reaction was, “I am glad I did not have Dr. Douglas R. Hofstadter critiquing my lame work for the PhD program at my university.” Dr. Hofstadter’s intellect intimidated me. I had to look up “Bach” because I knew zero about the procreative composer of organ music. (Heh, heh)
Imagine my surprise when I read “Why Watson and Siri Are Not Real AI” in Popular Mechanics magazine. Popular Mechanics is not my first choice as an information source for analysis of artificial intelligence and related disciplines. Popular Mechanics explains saws, automobiles, and gadgets.
But there was the story, illustrated with one of those bluish Jeopardy Watson photographs. The write up is meaty because Popular Mechanics asked Dr. Hofstadter questions and presented his answers. No equations. No arcane references. No intimidating the fat, ugly grad student.
The point of the write up is probably not one that IBM and Apple will like. Dr. Hofstadter does not see the “artificial intelligence” in Watson and Siri as “thinking machines.” (I share this view along with DARPA, I believe.)
Here’s a snippet of the Watson analysis:
Watson is basically a text search algorithm connected to a database just like Google search. It doesn’t understand what it’s reading. In fact, read is the wrong word. It’s not reading anything because it’s not comprehending anything. Watson is finding text without having a clue as to what the text means. In that sense, there’s no intelligence there. It’s clever, it’s impressive, but it’s absolutely vacuous.
I had to look up vacuous. It means, according to the Google “define” function: “having or showing a lack of thought or intelligence; mindless.” Okay, mindless. Isn’t IBM going to build a multi-billion dollar a year business on Watson’s technology? Isn’t IBM delivering a landslide business to the snack shops adjacent its new Watson offices in Manhattan? Isn’t Watson saving lives in Africa?
The interview uses a number of other interesting words; for example:
Yet my favorite is the aforementioned—vacuous.
Please, read the interview in its entirety. I am not sure it will blunt the IBM and Apple PR machines, but kudos to Popular Mechanics. Now if the azure chip consultants, the failed Webmasters turned search experts, and the MBA pitch people would shift from hyperbole to reality, some clarity would return to the discussion of information retrieval.
Stephen E Arnold, February 11, 2014
February 9, 2014
Search and content processing vendors are innovating for 2014. The shift from a back office function like scanning to searching and then “solutions” is a familiar path for companies engaged in information retrieval.
I read a 38 page white paper explaining a new angle—fraud triangle analytics. You can get a copy of the explanation by navigating to http://bit.ly/1o6YpnXi and going through the registration process.
The ZyLab concept is that three factors usually surface when fraud exists. These are a payoff, an opportunity, and “the mindset of the fraudster that justifies them to commit fraud.”
ZyLab’s system uses content analytics, discovery, sentiment analysis, metatagging, faceted search, and visualization to help the analyst chase down the likelihood of fraud. ZyLab weaves in the go-to functions for attorneys from its system. Four case examples are provided, including the Enron matter.
Unlike some search vendors, ZyLab is focusing on a niche. Law enforcement is a market that a number of companies are pursuing. A number of firms offer similar tools, and the competition in this sector is increasing. IBM, for example, has products that perform or can be configured to perform in a somewhat similar manner.
IBM has the i2 product and may be in the process of acquiring a company that adds dramatic fraud detection functionality to the i2 product. This rumored acquisition adds content acquisition different from traditional credit card statements and open source content (little data or big data forms).
As some commercial markets for traditional search and content processing, some vendors are embracing the infrastructure or framework approach. This is a good idea, and it is one that has been evident since the days of Fulcrum Technologies’ launch and TeraText’s infrastructure system. Both date from the 1980s. (My free analysis of the important TeraText system will be available on the Xenky.com Web site at the end of this month.)
At ZyLab, search is still important, but it is now one part of a blended set of software packages built around the fraud triangle analytics (FTA) notion. As the world shifts to apps and predictive methods, it is interesting to watch the re-emergence of approaches popular with vendors little known by some today.
Stephen E Arnold, February 9, 2014
February 7, 2014
What do you make of this headline from All Analytics: “Text And The City: Municipalities Discover Text Analytics”? Businesses have been using text mining software for a while and understand the insights it can deliver to business decisions. The same goes for law firms that must wade through piles of litigation. Are governments really only catching on to text mining software now?
The article reports on several examples where municipal governments have employed text mining and analytics. Law enforcement agencies are using it to identify key concepts to deliver quick information to officials. The 311 system, known as the source of local information and immediate contact with services, is another that can benefit from text analytics, because the software can organize and process the information faster and more consistently.
There are many ways text analytics can be helpful to local governments:
“Identifying root causes is a unique value proposition for text analytics in government. It’s one thing to know something happened — a crime, a missed garbage collection, a school expulsion — and another to understand where the problem started. Conventional data often lacks clues about causes, but text reveals a lot.”
The bigger question is will local governments spend the money on these systems? Perhaps, but analytic software is expensive and governments are pressured to find low-cost solutions. Expertise and money are in short supply on this issue.
Whitney Grace, February 07, 2014