March 6, 2014
If you are conducting research on natural language processing software but have run short of resources, take a look at Connexor’s “NLP Library.” Connexor is a company that develops text analysis software components, solutions, and services. They are experts in their line of work and keen to help people use their data to its full extent. Connexor explains:
“Connexor components have turned out to be necessary in many types of software products and solutions that need linguistic intelligence in text analytics tasks. We work with software houses, service providers, system integrators, resellers and research labs, in the fields of education, health, security, business and administration. We have customers and partners in over 30 countries.”
The company’s NLP Library includes bibliographic citations for articles, which we can assume were written by Connexor employees. They cover a variety of subjects in natural language processing and text evaluation, and even touch on extracting emotion from text. These articles are a handy resource, especially if you need up-to-date research. There is only one article for 2014 so far, but the year is still young and more are probably on the way.
February 12, 2014
The SlideShare presentation titled “Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis” discusses work by Digital Reasoning and Paragon Science. Digital Reasoning asserts that it is an oracle for human language data. Color-coded sentences illustrate the abilities of natural language processing, from recognizing people and location words to entities related to a single concept and associated entities. The presentation consists of many equations, but the overview explains:
“In this presentation, O’Reilly author and Digital Reasoning CTO Matthew Russell along with Dr. Steve Kramer, founder and chief scientist at Paragon Science, discuss how Digital Reasoning processed the Enron corpus with its advanced Natural Language Processing (NLP) technology – effectively transforming it into building blocks that are viable for data science. Then, Paragon Science used dynamic graph analysis inspired from particle physics to tease out insights from the data.”
Ultimately, the point of the entire process was to gain a better understanding of how the Enron catastrophe could be avoided at other enterprises. It is difficult to say whether Digital Reasoning is imitating IBM Watson or IBM Watson is imitating Digital Reasoning. At any rate, it sounds familiar; didn’t Autonomy, TeraText, and other firms push into this sector decades ago?
Chelsea Kerwin, February 12, 2014
February 3, 2014
Machine translation can be a wonderful thing, but one key language has garnered less consideration than other widely used languages. Though both Google and Babylon have made good progress [pdf] on Arabic translation, folks at the Stanford Natural Language Processing Group know there is plenty of room for improvement. These scientists are working to close that gap with their Arabic Natural Language Processing project.
The page’s overview tells us:
“Arabic is the largest member of the Semitic language family and is spoken by nearly 500 million people worldwide. It is one of the six official UN languages. Despite its cultural, religious, and political significance, Arabic has received comparatively little attention by modern computational linguistics. We are remedying this oversight by developing tools and techniques that deliver state-of-the-art performance in a variety of language processing tasks. Machine translation is our most active area of research, but we have also worked on statistical parsing and part-of-speech tagging. This page provides links to our freely available software along with a list of relevant publications.”
The page holds a collection of useful links. There are software links, beginning with their statistical Stanford Arabic Parser, as well as links to eight papers, in PDF form, that either directly discuss Arabic or use it as an experimental subject. Anyone interested in machine translation may want to bookmark this helpful resource.
Cynthia Murrell, February 03, 2014
December 20, 2013
Partnerships develop when companies each possess a strength and then combine forces to build a beneficial relationship. The CogBlog, Cognition’s Semantic NLP Blog, announced a new relationship in the post, “Cognition To Power Grabbit’s Online Recommendation Engine.” Cognition is a leading name in semantic analysis and language processing, and Grabbit is the developer of a cloud-hosted suite of Web services. Together they have formed a strategic partnership that will combine Cognition’s natural language processing technology with Grabbit’s patent-pending system for making online recommendations of products, content, and people. The idea behind pairing the two technologies is that the semantic software will analyze social media content and Grabbit’s software will then make product recommendations based on that data.
The article states:
“Cognition provides a powerful set of semantic tools to power Grabbit’s new web services. The scope of Cognition’s Semantic Map is more than double the size of any other computational linguistic dictionary for English, and includes more than ten million semantic connections that are comprised of semantic contexts, meaning representations, taxonomy and word meaning distinctions. The Map encompasses over 540,000 word senses (word and phrase meanings); 75,000 concept classes (or synonym classes of word meanings); 8,000 nodes in the technology’s ontology or classification scheme; and 510,000 word stems (roots of words) for the English language. Cognition’s lexical resources encode a wealth of semantic, morphological and syntactic information about the words contained within documents and their relationships to each other. These resources were created, codified and reviewed by lexicographers and linguists over a span of more than 25 years.”
Why do I get the feeling that online shopping is going to get even more complicated? Personal qualms aside, Cognition and Grabbit are not the first companies that come to mind when it comes to social media analytics and e-commerce. This partnership is not the first endeavor to cash in on Internet sales.
Whitney Grace, December 20, 2013
December 10, 2013
Natural language processing software is a boon to physicians who are required to keep immaculate documentation. Hispanic Business reports that the “Huntsman Cancer Institute uses Linguamatics I2E To Automatically Extract Insights From Clinical Pathology Documents.” The Huntsman Cancer Institute (HCI) is located at the University of Utah. By using the Linguamatics I2E natural language processing software, HCI will turn its unstructured data in EMRs into actionable information to conduct better research and seek new insights in cancer treatments and outcomes.
The article states:
“HCI is using Linguamatics I2E with its in-house clinical informatics infrastructure to extract discrete data from the unstructured text contained in surgical, pathology, radiology, and clinical notes related to hematology oncology disease areas such as Leukemia and Lymphoma. The resulting data is loaded into an integrated biobanking, clinical research, and genomic annotation platform. This enables HCI’s clinicians and principal investigators to harness the richest possible set of data for research into patient outcomes, comparative effectiveness, and genetic drivers of disease. Analysis at this scale can find information that would often be missed when reading documents one at a time. In addition HCI has a better range and quality of data to support clinical trial matching and increase numbers of patients on trials.”
There is a wealth of medical information locked in unstructured data, and healthcare is one of the biggest markets for big data. Medical professionals spend hours studying patient records. I2E gives medical professionals analytics that free up their time, streamline research processes, and improve patient outcomes.
Whitney Grace, December 10, 2013
October 18, 2013
For those who know the open-source programming language Ruby, NLP is a script away. Sitepoint shares some basic techniques in, “Natural Language Processing with Ruby: N-Grams.” This first piece in a series begins at the beginning; developer Nathan Kleyn writes:
“Natural Language Processing (NLP for short) is the process of processing written dialect with a computer. The processing could be for anything – language modeling, sentiment analysis, question answering, relationship extraction, and much more. In this series, we’re going to look at methods for performing some basic and some more advanced NLP techniques on various forms of input data. One of the most basic techniques in NLP is n-gram analysis, which is what we’ll start with in this article!”
Kleyn explains his subject clearly, with plenty of code examples so we can see what’s going on. He goes into the following: what it means to split strings of characters into n-gram chunks; selecting a good data source (he sends readers to the comprehensive Brown Corpus); writing an n-gram class; extracting sentences from the Corpus; and, finally, n-gram analysis. The post includes links to the source code he uses in the article.
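To give a flavor of the technique, here is a minimal Ruby sketch of word-level n-gram extraction; it is my own illustration under simple assumptions (a crude regex tokenizer), not Kleyn’s actual code from the article:

```ruby
# Split a string into word tokens, then slide an overlapping window of
# size n across them. Each n-gram comes back as an array of n words.
def ngrams(text, n)
  tokens = text.downcase.scan(/[a-z']+/)  # crude tokenizer; real corpora need more care
  return [] if tokens.length < n
  tokens.each_cons(n).to_a                # overlapping windows of size n
end

bigrams = ngrams("The quick brown fox jumps over the lazy dog", 2)
# each element is a two-word array, e.g. ["the", "quick"]
```

Counting how often each n-gram occurs in a corpus like the Brown Corpus is then just a matter of tallying these arrays in a hash.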
In the next installment, Kleyn intends to explore Markov chaining, which uses probability to approximate language and generate “pseudo-random” text. This series may be just the thing for folks getting into, or considering, the natural language processing field.
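As a preview of that idea, a toy order-1 Markov chain can be built from the same word lists: map each word to the words observed to follow it, then walk the table choosing random successors. This is my own illustrative sketch, not Kleyn’s forthcoming code:

```ruby
# Build a table mapping each word to every word that follows it in the text.
def build_chain(text)
  chain = Hash.new { |h, k| h[k] = [] }
  text.split.each_cons(2) { |a, b| chain[a] << b }
  chain
end

# Generate up to `length` words by repeatedly sampling a random successor.
def generate(chain, start, length, rng = Random.new)
  out = [start]
  (length - 1).times do
    successors = chain[out.last]
    break if successors.empty?          # dead end: no observed successor
    out << successors.sample(random: rng)
  end
  out.join(" ")
end

chain = build_chain("a b a c a b")
generate(chain, "a", 5)   # e.g. "a b a c a"; output varies with the RNG
```

Because duplicates are kept in the successor lists, frequent continuations are sampled proportionally more often, which is exactly the probabilistic approximation of language the article describes.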
Cynthia Murrell, October 18, 2013
September 27, 2013
The article titled “Multimodal Natural Language Interface for Faceted Search” in Patent Application Approval Process, on Hispanic Business, reveals that inventors in California have applied for a patent on their natural language interface. The inventors are quoted in the article as claiming that the problem of users formulating a “successful query” revolves around a lack of transparency in the search criteria being applied. The inventors, Farzad Ehsani and Silke Maren Witt-Ehsani, filed their patent application in February 2013, and the application was made available online in early September 2013. The article states:
“Solving this problem requires an interface that is natural for the user while producing validly formatted search queries that are sensitive to the structure of the data, and that gives the user an easy and natural method for identifying and modifying search criteria. Ideally, such a system should select an appropriate search engine and tailor its queries based upon the indexing system used by the search engine. Possessing this ability would allow more efficient, accurate and seamless retrieval of appropriate information.”
The quote from the inventors continues by addressing current methods, which fail to meet users’ expectations both in selecting the best search engine and data repository and in formulating the search query appropriately.
Chelsea Kerwin, September 27, 2013
May 29, 2013
“Why Are We Still Waiting for Natural Language Processing,” an article in The Chronicle of Higher Education, explores the 21st century’s failure, so far, to produce natural language processing, or NLP: the ability of computers to process natural human language. The steps required are explained in the article:
“In the 1980s I was convinced that computers would soon be able to simulate the basics of what (I hope) you are doing right now: processing sentences and determining their meanings.
To do this, computers would have to master three things. First, enough syntax to uniquely identify the sentence; second, enough semantics to extract its literal meaning; and third, enough pragmatics to infer the intent behind the utterance, and thus discerning what should be done or assumed given that it was uttered.”
Currently, typing a question into Google can yield exactly the opposite of the information you are seeking. This is because Google is unable to infer, and natural conversation is full of gaps and assumptions that we are all trained to leap over without fail. According to the article, the one company that seemed to be coming close to this technology was Powerset, in 2008. After its deal with Microsoft, however, the Powerset site now simply redirects to Bing, a Google clone. Maybe NLP, like Big Data, business intelligence, and predictive analytics, is just a buzzword with marketing value.
Chelsea Kerwin, May 29, 2013
November 21, 2012
I continue to learn about companies with high-value content processing technologies. The challenge in real-time translation, if one believes the Google marketing, is now in “game over” mode. The winner, of course, is Google. Other firms can head to the showers and maybe think about competing in another business sector.
But some of that Google confidence may be based on assumptions about Google’s language processing expertise, not more recent systems and methods. I know. This is “burn at the stake” information to a Googler.
However, I saw a demonstration which made clear to me that Google’s “kitchen sink” approach to handling speech input and near real-time translation may not be in step with other firms’ approaches. One company with quite interesting translation technology and a commitment to easy integration is IMT Holdings. The privately held company’s product is Rosoka.
IMT Holdings, Corp. was founded in 2007; its background is in US government contracting. In the course of the firm’s work, co-founder Mike Sorah saw that existing Natural Language Processing (NLP) tools were not able to handle the volumes and complexities of the data his teams needed to process. In December 2011, IMT began actively marketing its NLP technology.
After some telephone tag and email, I was able to interview Mike Sorah, one of the wizards behind the Rosoka technology.
Mr. Sorah told me:
Many of the existing NLP tools claim to be multilingual, but what they mean is that they have linguistic knowledge bases, usually acquired from vendors who provide dictionaries and libraries, that make NLP an issue for many licensees. But most NLP systems don’t process documents that contain English and Chinese or English and Spanish. In the world of our clients, mixed-language documents are important. These have to be processed as part of the normal stream, not put in an exception folder and maybe never processed, or processed after a delay of hours or days.
The Rosoka system is different from other NLP and translation systems on the market at this time. He asserted:
In most multilingual NLP systems, the customer needs to know what language a document is in before processing it, so they can load the appropriate language-specific knowledge base. What we did via our proprietary Rosoka algorithms was take a multilingual look at the world. Our system automatically understands that a document may be in English or Chinese, or even English and Spanish mixed. The language angle is huge. We randomly sample the Twitter stream and have been tweeting what the top 10 languages of the week are. English varies between 35 and 45 percent of the tweets. Every language that Rosoka can process is included. Our multilingual support is not sold as separate, add-on functionality.
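Rosoka’s actual methods are proprietary, but a toy Ruby sketch of my own shows why per-token handling matters for mixed-language text: tagging each token with the Unicode script it contains lets a single document carry both English and Chinese spans, each routed to the right language resources instead of an exception folder.

```ruby
# Classify a token by the Unicode script of its characters. This is a
# deliberately naive heuristic for illustration, not Rosoka's algorithm.
def script_of(token)
  return :han   if token =~ /\p{Han}/    # Chinese characters
  return :latin if token =~ /\p{Latin}/  # English and most European text
  :other                                 # digits, punctuation, etc.
end

# Pair each whitespace-delimited token with its detected script.
def script_spans(text)
  text.split.map { |t| [t, script_of(t)] }
end

script_spans("the report 报告 arrived today")
# pairs each token with :latin, :han, or :other
```

A production system would of course go far beyond script ranges, using statistical models and per-language knowledge bases, but even this sketch makes the point that language identity lives at the span level, not the document level.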
You can read the full text of the interview with Mike Sorah in the ArnoldIT.com Search Wizards Speak series at this link. More information about IMT and Rosoka is available from the firm’s Web site, http://www.imtholdings.com.
Stephen E. Arnold, November 21, 2012
August 14, 2012
The New York Times has published an extensive account of a natural-language tragedy: “Goldman Sachs and the $580 Million Black Hole.” The five-page article is a very interesting read. The gist, though, is simple enough: Goldman Sachs failed to look out for its clients’ best interests. What a surprise.
You have probably heard of the speech recognition software NaturallySpeaking, developed by Dragon Systems. Dragon Systems is, at heart, the enterprising Jim and Janet Baker, who spent almost twenty years building their innovative software and their company. In fact, their work is considered to have advanced speech technology much faster than anyone expected. Some of it might even have made its way into Apple’s Siri.
When it came time to reap their rewards, the pair turned to Goldman Sachs for advice on the over-half-billion-dollar deal. Back in 1999, it still seemed like a good idea to trust the prominent investment firm. It wasn’t. Reporter Loren Feldman summarizes the trouble:
“With Goldman Sachs on the job, the corporate takeover of Dragon Systems in an all-stock deal went terribly wrong. Goldman collected millions of dollars in fees — and the Bakers lost everything when Lernout & Hauspie was revealed to be a spectacular fraud. . . . Only later did the Bakers learn that Goldman Sachs itself had at one point considered investing in L.& H. but had walked away after some digging into the company.
“This being Wall Street, a lot of money is now at stake. In federal court in Boston, the Bakers are demanding damages, including interest and legal fees, that could top $1 billion.”
Not only did Goldman direct its own dollars away from L.& H., the suit alleges; it also failed to scrutinize L.& H. for its client when Dragon’s CFO pointed out troubling signs. It turns out that the person in charge of such investigations had left Goldman and had not been replaced. Oops. That didn’t keep Goldman from keeping the $5 million consultation fee. Naturally.
Meanwhile, companies who picked up pieces of the Bakers’ technology at auction after L.& H. fell have gone on to develop them into lucrative commodities. The couple was left with neither their invention nor any fraction of the money it was worth.
The case is expected to be decided sometime this November. Feldman burrowed into the wealth of legal filings surrounding the case to craft this article. He has found eye-opening insights into Goldman Sachs’ culture and practices. The piece is worth reading for that reason alone.
It is also a moving tale about a tech- and language-savvy couple who put in the time, effort, passion, and smarts to build their business, and who are now fighting to regain what is rightfully theirs. I wish them luck.
Cynthia Murrell, August 14, 2012