NLP Download
November 15, 2010
Short honk. Want to explore NLP, or natural language processing? The search engine and content processing marketing wizards chatter about NLP as often as telemarketers call me. Frequently, gentle reader. You can snag the LJParser software at Sofotex here. The Sofotex description says:
LJParser Nature Language Processing is a middleware by LING-JOIN Software is for natural language understanding and web search. LJParser provides powerful modules including precise search text, new words detection, Chinese word segmentation, language modeling and term translation, text clustering, text categorization, text summarization, keywords extraction and duplication detection, which can download and install on Windows, Linux.
We will fire up this puppy next week, but I am urging the goslings to get in gear.
Stephen E Arnold, November 15, 2010
Freebie
InQuira Inks Deal with Zebra
October 11, 2010
I learned that Zebra Technologies has selected InQuira’s natural language processing technology to enhance Zebra’s “knowledge solutions.” InQuira bills itself as a leading provider of self service and contact center support systems. Zebra’s business involves the design, manufacture, sale, and support of a range of direct thermal and thermal transfer label printers, radio frequency identification (RFID) printer/encoders, dye sublimation card printers, and software.
According to CRM Marketplace,
InQuira for Web Self Service will allow Zebra to deliver customized and accurate results to partner queries initiated on its website any time of the day. Additionally, Zebra will utilize InQuira for Contact Centers to help increase agent productivity, lower training costs and improve the accuracy and satisfaction of every partner interaction.
InQuira, based in San Bruno, Calif., is an established vendor of natural language processing technology. A visitor to an InQuira-based support system can type a question in normal colloquial form. The system will parse the query, understand the user’s meaning, and display relevant information from the processed content.
The company was founded in 2002. My recollection is that two firms merged to create InQuira. I think one company was Answerfriend and the other was Electric Knowledge. In the last eight years, the engineers have supplemented search with work flow, authoring, analytics, and a feedback function.
The company was of interest to me because it was one of the first to take two search and content centric vendors, merge them, and create what appears to me to be a successful business. For more information about InQuira, navigate to www.inquira.com.
Stephen E Arnold, October 11, 2010
Freebie
Language Computer: Why Now for Swingly and Extractiv
September 2, 2010
I did some fooling around on the Language Computer Corp. Web site. The PR blitz is on for Swingly, the question-answering service that was featured in blogs and on the quite remarkable podcast hosted by Jason Calacanis. I listened to the Swingly segment but exited once that interview concluded. Instead of wallowing in the “ask a question, get an answer” just like Ask.com, Yahoo Answers, Mahalo, Quora, Aardvark, and others, I thought I would navigate to the Overflight archive and check out the Web site. The first thing I noted was that a click on the WebFerret button now renamed “Ferret” returned a 404 error. Okay. So much for that. I then punched the entity recognition demo which I had also examined a while ago. More luck there, but I had to dismiss an “invalid security certificate,” which I supposed would have been a deal breaker for the Steve Gibson types visiting the Language Computer Web site.
I uploaded one of my for-fee columns to CiceroLite ML. The system accepted the file, stripped out the Word craziness, and invited me to process the file. I punched the “process” button. The system highlighted the different entities. What’s important is that Language Computer has for at least eight or nine years performed at or near the top of the heap on various US government tests of content processing systems. Here’s what the marked up text looked like. Each color represents a different type of entity. For example, red is an organization, blue a person, etc.
In operational use, the tagged entities are written to a file, not embedded in a document. For demo purposes, though, the in-line highlighting makes it easy to see that Language Computer did a pretty good job. Entity extraction is a big deal for some types of content activities. I find a tally of how many times an entity appears in a document quite useful. The big chunk of work, in my opinion, is mapping entities to synonyms and then to people and places. It’s great to know the entities in a document, but it is even better to have these items hooked together. I quite like the ability to click and see the entities in the source document.
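For the goslings who want to see the tally idea in concrete form, here is a minimal Python sketch. It is not Language Computer's code or output format. The tagged entity list, the entity type labels, and the ALIASES synonym table are all made up for illustration, and a real system would need far more than a hand-built lookup table to map variants to one name.

```python
from collections import Counter

# Hypothetical tagger output: (surface text, entity type) pairs.
# These values are invented for illustration only.
tagged_entities = [
    ("Language Computer Corp.", "ORGANIZATION"),
    ("Language Computer", "ORGANIZATION"),
    ("LCC", "ORGANIZATION"),
    ("Jason Calacanis", "PERSON"),
    ("Calacanis", "PERSON"),
    ("Swingly", "ORGANIZATION"),
]

# Hypothetical hand-built synonym table: lowercase variant -> canonical name.
ALIASES = {
    "language computer corp.": "Language Computer Corporation",
    "language computer": "Language Computer Corporation",
    "lcc": "Language Computer Corporation",
    "calacanis": "Jason Calacanis",
}


def canonical(name):
    """Return the canonical form of an entity name, or the name itself."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def tally_entities(entities):
    """Count mentions per (canonical name, entity type) pair."""
    counts = Counter()
    for text, entity_type in entities:
        counts[(canonical(text), entity_type)] += 1
    return counts


counts = tally_entities(tagged_entities)
for (name, entity_type), n in counts.most_common():
    print(f"{entity_type:<14}{name:<32}{n}")
```

The tally is the easy part. The synonym table is where the real work hides, which is the point made above about hooking the items together.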
Language Computer Corporation has been around since 1995. It has an excellent reputation, and, like other next generation content processing systems, has been used by specialists in quite specific niche markets. I won’t name these, but you can figure out what outfits are interested in:
- Entity recognition
- Event time stamping
- Sentiment tracking
- Document summarization.
The plumbing for these industrial-strength applications is what makes Swingly.com work. Swingly.com is a demo of the Language Computer question answering function. In my opinion, I am not likely to do much typing or speaking of questions into a search box or device. I type queries and I shout into a phone, often with considerable enthusiasm. (I hate phones.)
If you want to explore the Language Computer function to turn Web content (heterogeneous and semi-structured content) into structured data, navigate to www.extractiv.com. You will need to register. In order to use the service you have to create a content job, perform some steps, and then know what the heck you are looking at. The system works.
The larger issue to consider is, “Why are companies like Language Computer, Fetch Technologies, JackBe, and others from the niche government markets suddenly bursting into the broader enterprise and consumer sector?”
The pundits have not tackled this question. Most of the Swingly.com write ups are content to beat on the Q&A drum. I don’t think question answering is a mass market service except on devices that allow me to talk. In short, the Web angle is silly. So I am at odds with the azurini. I don’t care too much about English majors and journalists who are experts in search and content processing. Feel free to fall in love. Just brush up on your Shakespeare because the plumbing in systems like Language Computer’s will mean zero to this crowd.
NLP-Based Service Swingly Now at Bat
August 30, 2010
The new service Swingly strives to offer web enthusiasts one of the first web answer engines. The service works by taking text from a variety of sources on the web, which could include social media or news articles, and compiling it in web databases that can be used for answering questions. Semantic Web gives information about the new service in “NLP-Based Service Swingly’s Up At Bat.” The service will rely on NLP to properly extract as well as index all of the information gathered from documents. In addition, the service will rely on semantic inference techniques in order to “recognize links between questions and answers building a page rank style graph to quickly identify authoritative content.” What is impressive about Swingly is its ability to understand the semantics, or meaning, of a large amount of the information available on the Net. Users receive better results in less time. The importance of semantics in web searches has never been clearer.
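The “page rank style graph” in the quote can be pictured with a generic power-iteration sketch in Python. This is an illustration of the general PageRank idea only, not Swingly's method, which the write-up does not describe. The link graph, the node names, and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Generic PageRank by power iteration over a dict of node -> outlinks."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outlinks spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: an edge means "this item points at that content".
links = {
    "question": ["news_article", "blog_post"],
    "blog_post": ["news_article"],
    "tweet": ["news_article"],
    "news_article": [],
}

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

In this toy graph the news article collects the most inbound links, so it floats to the top as the "authoritative" item.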
April Holmes, August 30, 2010
Freebie
Java with NLP?
August 14, 2010
Jeff’s Search Engine Caffe: Java Open Source NLP and Text Mining Tools is a mother lode of Java open-source natural language processing and text mining tools. Jeff is a PhD student at UMass Amherst’s prestigious Center for Intelligent Information Retrieval and maintains a blog so well researched that it can serve as a reference point. Jeff’s site features a link to an interesting Apache Lucene Mahout project, which is designed to create highly scalable machine learning libraries. Currently, Mahout specializes in recommendation mining, clustering, classification, and item set mining. The Mahout site welcomes contributors and looks to facilitate discussions on the project and realize potential use cases. One of the most popular text classification frameworks is Weka, a collection of machine learning algorithms.
This site contains many useful links to incubator and implemented projects, and is worth a bookmark here in Harrod’s Creek.
Bret Quinn, August 14, 2010
Is Q-Go a Yugo?
August 9, 2010
Last week, I received a call from a fancy pants MBA about NLP or natural language processing. NLP seems to be a new opportunity. NLP has been around a while, and like the formerly hot notions “taxonomy” and “semantics”, NLP is in vogue. The question concerned a company I knew about, Q-Go. I dipped into my Overflight service and realized that the company had gone quiet. In some cases, “going quiet” is a prelude to either a massive investment like Palantir’s $90 million or closing up shop like Convera did earlier this year.
Q-Go provides an application aimed at redefining customers’ web searching experiences. Research indicates that a growing number of customers are sick of turning to search engines for answers because they get responses with millions of unrelated websites.
According to the company’s website, “31 percent of users are unhappy with their online interaction with web sites and 70% are unable to find what they are looking for.” And Q-Go asserts that it has the answer for the airline, financial services, and telecommunication industries. Q-Go reduces customer service issues by providing a search application that can more successfully interpret the meaning behind user questions—in all major Western languages. The approach sounds like InQuira’s.
The result? Fewer customer service calls, lowered costs, and higher conversion rates. It almost sounds too good to be true. With a guaranteed six-month return on investment, the only downside I see is that there are still some languages Q-Go can’t work with. But I’m guessing that will eventually change if the company avoids the “quiet” state.
Stephen E Arnold, August 12, 2010