January 4, 2012
As if to continue trying to prove that it can do anything, “IBM’s Watson to Help Doctors Diagnose, Treat Cancer,” reports eWeek. The AI supercomputer will be working with the Cedars-Sinai cancer center and insurance company WellPoint to evaluate cancer treatment options. Writer Brian T. Horowitz explains:
Using its data analytics and NLP [Natural Language Processing] capabilities, Watson would integrate data such as medical literature, patient histories, clinical trials, side effects and outcomes data to help doctors decide on courses of treatment. . . . Watson would also look at the characteristics of a patient’s cancer and make recommendations on cost-effective treatment that would lead to the best outcome.
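To make the idea concrete, here is a toy sketch of how a system might rank treatment options by weighing efficacy, side-effect risk, and cost. Every field name, weight, and number below is our own invention for illustration; IBM has not published Watson's actual scoring model.

```python
# Toy ranking of treatment options by efficacy, side-effect risk, and cost.
# All names and figures are hypothetical, not IBM Watson's real pipeline.
from dataclasses import dataclass

@dataclass
class TreatmentOption:
    name: str
    efficacy: float          # 0..1, aggregated from trials and outcomes data
    side_effect_risk: float  # 0..1, from side-effect reports
    relative_cost: float     # normalized cost of the course of treatment

def score(option: TreatmentOption) -> float:
    """Favor high efficacy and low risk, penalized by cost."""
    benefit = option.efficacy * (1.0 - option.side_effect_risk)
    return benefit / option.relative_cost

options = [
    TreatmentOption("regimen_a", efficacy=0.70, side_effect_risk=0.20, relative_cost=3.0),
    TreatmentOption("regimen_b", efficacy=0.60, side_effect_risk=0.10, relative_cost=1.5),
]
for opt in sorted(options, key=score, reverse=True):
    print(f"{opt.name}: {score(opt):.3f}")
```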
Of course, this advice would not replace that of a doctor, but it could become a valuable tool. Other health care organizations have been turning to technology for solutions. For example, Dell just donated an entire cloud infrastructure to the Translational Genomics Research Institute for storing medical trial data on pediatric cancer.
Good to see technology being used for the good of humanity, right? We would like to see IBM put Watson up on a test corpus for the public to use. Wishful thinking, I suppose.
Cynthia Murrell, January 4, 2012
Sponsored by Pandia.com
November 30, 2011
I learned from one of my two or three readers that Barcelona was home to a natural language processing company. Several years ago, I spoke with a person familiar with the company Artificial Solutions. After a bit of fumbling around, I located a trade show at which the company was exhibiting. The company’s NLP system is called “Teneo.” The application which I recalled was the use of the NLP system for customer support. The company has expanded since I first learned about the firm. The technology has been applied to mobile devices, for example.
The company told me:
Teneo Mobile is a platform independent technology designed to enable companies, organizations, manufacturers and developers to create their own virtual assistant as a mobile app, regardless of platform, mobile device and even language. The Natural Language Interaction (NLI) engine is covered by patents. The system can currently be built in up to 21 different languages, including Mandarin and Russian.
The company, founded in 2001, is owned by its founders, the private equity fund Scope Growth II and some private investors. The company has tallied more than 200 projects in the public and private sector in 30 countries and 21 languages. In the telecommunications sector, the firm’s customers include:
The firm’s technology is based on the Teneo Interaction Engine. According to the firm, its system will:
reason like a human, using advanced linguistic and business rules to decide how best to respond to your customer’s request. Context comes into play here, such as time, date and place, as well as information picked up from previous conversations, customer data retrieved from your CRM system and transaction data from your ERP system. At this point, Teneo will also eliminate any ambiguities from its initial analysis. Even one word can alter the meaning of a customer’s request. Teneo will instantly and dynamically re-assess content as the interaction develops, to understand what has changed and give the right answers. Natural language is full of subtle nuances, which Teneo is able to pick up and interpret. It understands idiom and slang, even dialect and SMS shorthand, and it’s also sympathetic to grammar, syntax or spelling mistakes.
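Stripped of the marketing language, the description amounts to rule-driven intent matching that carries context from turn to turn. Here is a minimal sketch of that idea; the rule table, the resolve() function, and the sample utterances are all invented for illustration and are not Artificial Solutions’ actual Teneo API.

```python
# Toy rule-based intent resolution with conversational context, in the
# spirit of the Teneo description above. Everything here is a sketch.
import re

RULES = [
    # (pattern, intent); more specific rules come first
    (re.compile(r"\b(cancel|stop)\b.*\bsubscription\b", re.I), "cancel_subscription"),
    (re.compile(r"\b(bill|invoice|charge)\b", re.I), "billing_question"),
    (re.compile(r"\bit\b", re.I), "anaphora"),  # "it" needs context to resolve
]

def resolve(utterance: str, context: dict) -> str:
    for pattern, intent in RULES:
        if pattern.search(utterance):
            if intent == "anaphora":
                # Re-assess using what earlier turns established.
                return context.get("last_intent", "unknown")
            context["last_intent"] = intent
            return intent
    return "unknown"

ctx: dict = {}
print(resolve("Why is my bill so high?", ctx))  # billing_question
print(resolve("Can you cancel it?", ctx))       # resolved via stored context
```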
For more information about the technology and its vertical applications, navigate to www.artificial-solutions.com.
Stephen E Arnold, November 30, 2011
Sponsored by Pandia.com
September 29, 2011
Editor’s Note: The Beyond Search team invited Craig Bassin, president of EasyAsk, a natural language processing specialist and search solution provider, to provide his view of the market for next-generation search systems. For more information about EasyAsk, navigate to www.easyask.com.
This past February I watched, along with millions of others, IBM’s spectacular launch of Watson on Jeopardy! Watson was IBM’s crowning achievement in developing a natural language-based solution finely tuned to compete, and win, on Jeopardy.
By IBM’s own estimates, it invested between $1 billion and $2 billion to develop Watson. IBM ranks Watson as one of the three most difficult challenges in its long and successful history, alongside spectacular accomplishments such as the Deep Blue chess program and the Blue Gene human genome mapping effort. Rarified air, indeed.
While many were watching to see if a computer could defeat human players, my interests were different. Watson was about to introduce natural language solutions to the broader public and show the world that such solutions are truly the wave of the future.
The results were historic. Watson soundly defeated the human competitors. On the marketing side, IBM continues to spend hundreds of millions of dollars to tell the world that the time for natural language is now.
IBM is not the only firm to bring natural language processing (NLP) into the application mainstream:
- Microsoft acquired Powerset, a small company with strong NLP technology, to create Bing and compete head-on with Google,
- Yahoo, one of the original Internet search companies, found Bing compelling enough to strike an OEM agreement with Microsoft and make Bing Yahoo’s search solution,
- Apple acquired a linguistic natural language interface tool called Siri, which is now being incorporated into the Mac and iPhone operating systems,
- Oracle Corporation bought Inquira for its NLP-based customer support solution,
- RightNow Technologies similarly acquired Q-Go, a Dutch company also providing NLP-based customer support solutions.
Many companies are now positioning their products as natural language tools and have expanded the once-tight definition of NLP to include things such as analyzing text to understand intent or sentiment. This is the impact of Watson: it has put natural language into the mainstream, and many organizations want to ride the marketing current driven by Watson regardless of how closely aligned their technology is with Watson.
But let’s also look at Watson for what it really is: one of the most expensive custom solutions ever built. Watson required an extremely large (and expensive) cluster of computers to run: 90 IBM Power 750 servers, totaling 2,880 processor cores. It also required a substantial R&D staff to build the analytics, content, and natural language processing software stack. In fact, IBM didn’t come to Jeopardy; Jeopardy came to IBM. The show replicated the Jeopardy set at IBM’s labs, placing a great deal of horsepower underneath that stage.
The first foray of Watson into the real world will be in healthcare and the possibilities are exciting. Clearly IBM intends to focus Watson on some of the largest, most difficult challenges. But how does that help you run your business? You’re not going to see Watson running in your IT environment or on your preferred SaaS cloud anytime soon.
If Watson is focused on big problems, how can you use natural language solutions to improve your business today? Perhaps you want to increase Web site customer conversion, improve the user experience, better manage sales processes, deliver superior customer support, or, in general, make it easier for your workers to find the right information to do their jobs. So where do you go?
That’s where EasyAsk comes in.
September 26, 2011
Editor’s Note: This is an article written by Tim Estes, founder of Digital Reasoning, one of the world’s leading providers of technology for entity-based analytics. You can learn more about Digital Reasoning at www.digitalreasoning.com.
Most university programming courses ignore entity extraction. Some professors talk about the challenges of identifying people, places, things, events, and Social Security numbers, and leave the rest to the students. Other professors may assign an exercise related to parsing text and detecting anomalies or bound phrases. But most of those emerging with a degree in computer science consign the challenge of entity extraction to the Miscellaneous file.
Entity extraction means processing text to identify, tag, and properly account for those elements that are the names of persons, numbers, organizations, locations, and expressions such as a telephone number, among other items. An entity can consist of a single word like Cher or a bound sequence of words like White House. The challenge of figuring out names is a tough one for several reasons. Many names exist in richly varied forms. You can find interesting naming conventions in street addresses in Madrid, Spain, and in the name of the owner of a falafel shop in Tripoli.
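A bare-bones illustration of the two classic ingredients, patterns for well-formed entities such as phone numbers and a gazetteer for names (including multi-word bound sequences like White House), might look like the following sketch. A real system layers statistics and context on top of these basics.

```python
# Minimal entity extraction: a regex for phone-number-shaped strings plus
# a lookup table for known names. Purely illustrative.
import re

PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
GAZETTEER = {"Cher": "PERSON", "White House": "ORGANIZATION", "Madrid": "LOCATION"}

def extract(text: str) -> list[tuple[str, str]]:
    entities = [(m.group(), "PHONE") for m in PHONE.finditer(text)]
    for name, label in GAZETTEER.items():
        if name in text:
            entities.append((name, label))
    return entities

print(extract("Cher called the White House from Madrid at 555-867-5309."))
```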
Entities, as information retrieval experts have learned since the first DARPA conference on the subject in 1987, are quite important to certain types of content analysis. Digital Reasoning has been working for more than 11 years on entity extraction and related content processing problems. Entity-oriented analytics have become a very important issue these days as companies deal with too much data, the need to understand the meaning and not just the statistics of the data, and, finally, the need to understand entities in context, which is critical to making sense of code terms and the like.
I want to walk through six weaknesses of traditional entity extraction and contrast them with Digital Reasoning’s patented, fully automated method. Let’s look at the weaknesses.
1 Prior Knowledge
Traditional entity extraction systems assume that the system will “know” about the entities. This information has been obtained via training or specialized knowledge bases. The idea is that a system processes content similar to that which it will encounter when fully operational. When the system locates an entity, or a human “helps” it locate one, the software will “remember” the entity. In effect, entity extraction assumes that the system either has a list of entities to identify and tag or that a human will interact with various parsing methods to “teach” the system about the entities. The obvious problem is that when a new entity appears and is mentioned only once, the system may not identify it.
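The failure mode is easy to demonstrate. In the toy tagger below, the gazetteer is invented for illustration; any entity absent from it simply vanishes from the output.

```python
# A lookup-based tagger only finds what it was taught: the prior-knowledge
# weakness in miniature. Names here are hypothetical.
GAZETTEER = {"Cher", "White House"}

def tag(text: str) -> list[str]:
    return [name for name in GAZETTEER if name in text]

print(tag("Cher toured the White House."))        # both entities found
print(tag("Novatek Partners opened an office."))  # unseen entity: [] is returned
```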
2 Human Inputs
I have already mentioned the need for a human to interact with the system. The approach is widely used, even in the sophisticated systems associated with firms such as Hewlett Packard Autonomy and Microsoft Fast Search. The problem with relying on humans is a time and cost equation. As the volume of data to be processed goes up, more human time is needed to make sure the system is identifying and tagging correctly. In our era of data doubling every four months, the cost of coping with massive data flows makes human-intermediated entity identification impractical.
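The arithmetic behind that cost equation is stark. Taking the data-doubling claim at face value, volume grows eightfold per year, and with it any human review effort that scales linearly with volume. The starting figure in this sketch is hypothetical.

```python
# Back-of-the-envelope growth of human annotation effort when data
# doubles every four months and review scales with volume.
hours_now = 100.0            # hypothetical annotator hours needed today
doublings_per_year = 12 / 4  # one doubling every four months
for year in range(1, 4):
    hours = hours_now * 2 ** (doublings_per_year * year)
    print(f"year {year}: {hours:,.0f} hours")
# year 1: 800 hours; year 2: 6,400 hours; year 3: 51,200 hours
```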
3 Slow Throughput
Most content processing systems talk about high performance, scalability, and massively parallel computing. The reality is that most of the subsystems required to manipulate content for the purpose of identifying, tagging, and performing other operations on entities are bottlenecks. What is the solution? Most vendors of entity extraction solutions push the problem back to the client. Most information technology managers solve performance problems by adding hardware to either an on-premises or cloud-based solution. The problem is that adding hardware is at best a temporary fix. In the present era of big data, content volume will increase. The appetite for adding hardware lessens in a business climate characterized by financial constraints. Not surprisingly, entity extraction systems are often “turned off” because the client cannot afford the infrastructure required to deal with the volume of data to be processed. A great system that is too expensive to run introduces flaws into the analytic process.
September 20, 2011
Digital Reasoning empowers decision makers with timely, actionable intelligence by creating software that automatically makes sense of complex data.
Our flagship product, Synthesys®, solves the problem of achieving actionable intelligence out of massive amounts of unstructured and structured text . . . A typical customer might be trying to completely understand how to locate an individual within massive amounts of reports . . . Sifting through all this data to accurately develop this profile even among misspellings, aliases, code names, etc. is typically something that can only be done by reading. Our ability to automate understanding is critical to customers with concerns about time, accuracy, completeness, or even the ability to leverage the massive amount of data they have generated.
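One way to picture the alias and misspelling problem is simple string-similarity clustering, as in the sketch below. The names and threshold are invented for illustration, and this is emphatically not Digital Reasoning’s Synthesys algorithm, which the company describes as far more sophisticated.

```python
# Grouping name variants (misspellings, aliases) by crude string
# similarity. All names and the 0.7 threshold are hypothetical.
from difflib import SequenceMatcher

mentions = ["Mohammed al-Farouk", "Muhammad al Farouk", "M. al-Farouk", "Jane Smith"]

def similar(a: str, b: str, threshold: float = 0.7) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

clusters: list[list[str]] = []
for mention in mentions:
    for cluster in clusters:
        if similar(mention, cluster[0]):  # compare against cluster exemplar
            cluster.append(mention)
            break
    else:
        clusters.append([mention])

print(clusters)  # the three al-Farouk variants land in one cluster
```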
August 30, 2011
Microsoft is making a concerted effort to tackle natural language processing with its Redmond-based Natural Language Processing Group. The Microsoft page devoted to the group highlights current and older projects, downloads, and researchers involved.
The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person. This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way.
Of particular interest are the recent publications authored by members of the group. Work includes everything from social media implementation to multilingual Wikipedia content to syntactic language modeling. The papers are well worth a read for anyone interested in the fast-moving field of natural language processing. Microsoft is definitely putting time and energy into the project, but it remains to be seen which of the tech giants will emerge the victor in the battle for natural language processing supremacy.
If you track NLP, including the newly minted azure chip consultants, you will want to monitor this aspect of Microsoft’s many, many search and text processing activities.
Emily Rae Aldridge, August 29, 2011
Sponsored by Pandia.com
August 29, 2011
Have you ever tried to find ink or toner for a not-so-new printer? The process can be confusing, and shoppers are unlikely to feel warm and fuzzy about any ink seller whose Web site only adds to the frustration.
One purveyor of ink and toner made a wise choice when it picked EasyAsk’s eCommerce Edition. EasyAsk asserts, “NetSuite Customer InkJet Superstore Jets Past Competitors Using EasyAsk Natural Language E-Commerce Search Software for SaaS.” The press release states,
Using EasyAsk eCommerce Edition, InkJet Superstore has dramatically simplified finding the right printer cartridges and accessories, providing the easiest online experience for customers and increasing online orders and revenue. InkjetSuperstore.com sells toner and ink cartridges for virtually every make and model of printer, copier, and fax machine, with over 6,000 items. InkJet Superstore’s vision is clearly articulated on the company Web site: ‘To be the best, the easiest, the cheapest and friendliest place to buy printer accessories.’ To back this up, InkJet Superstore offers a 100 percent satisfaction guarantee, which includes paying for return shipping costs.
EasyAsk is helping InkJet Superstore deliver on its promises. Since the business implemented the solution, the site has seen 80 percent fewer “no results” responses, increased order conversion rates by six percent, and received fewer phone calls and live chat requests, indicating that customers are more easily finding what they need.
The solution didn’t stop there. With its product catalog rapidly expanding, InkJet Superstore is taking advantage of EasyAsk’s auto-sync feature to assimilate new products into the Web site. Furthermore, rich analytics mine customer search terms for items that are in demand, suggesting potential new products.
August 27, 2011
Watson, the IBM supercomputer, caused quite a stir earlier this year when it swept through the Jeopardy playing field, winning round after round handily. On a more technical level, Watson may have greater implications for how unstructured data is tackled in search. Brian McKenna’s interview with Craig Rhinehart at IBM, “Watson’s natural language processing takes crack at unstructured data,” tells us more.
Craig Rhinehart said:
We think of it (Watson) as a breakthrough in computing. Unstructured information and communicating in natural language have not been well-adopted in IT terms. This technology will enable new ways to interact with computers, opening up new solutions. Natural language is very ambiguous, as opposed to data, where a five is always a five . . . In natural language, we speak in riddles, abbreviations, with pop culture references . . . But 80 percent plus of our information is unstructured, and we are expecting 44 times more growth in the next 10 years.
The problems encountered by natural language processing are numerous, and no one seems to have a perfect solution for how to tackle all of them at once. Watson itself made an embarrassing move once or twice on Jeopardy when it seemed to misunderstand a question. Still, considering the overall success Watson had in interpreting colloquial language, it is a major breakthrough. Are we to believe that Watson is also responsible for introducing the concept of natural language processing to everyday Americans?
Emily Rae Aldridge, August 27, 2011
Sponsored by Pandia.com
August 9, 2011
There’s a new cowboy in town, and he’s shaking up the search engine industry. The article “Real Language Q&A: The Next Generation of Search?” on Search Engine Journal explores the practicality of Oren Etzioni’s recommendations for search engines in his new paper, “Search Needs a Shake-Up,” published in Nature.
According to Etzioni, current search engines have not kept up with the times. Their reliance on old algorithms, with results displayed as a list that can run into the millions, is no longer practical. As the article explains,
“In Etzioni’s view, the next generation of search would abandon the “blue link” structure in favor of directly answering the questions of users. “Moving up the information food chain requires a search engine that can interpret a user’s question, extract facts from all the information on the web, and select an appropriate answer,” he states. The tricky part, though, is in finding the answer. With so many ambiguities, it’s difficult to see how most questions could be answered by a search site.”
Conveniently, Etzioni offers his own University of Washington ReVerb program as a step in the right direction. ReVerb relies on natural language processing (NLP), which is an interesting direction for search engines but depends entirely on the reliability of the user’s question.
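For the curious, the flavor of what ReVerb does, extracting (argument, relation, argument) triples from declarative sentences, can be suggested in a few lines. The real system uses part-of-speech patterns and confidence scoring; the crude regex below, with its invented verb list and sample sentences, merely shows the shape of the output.

```python
# Crude open-information-extraction sketch in the ReVerb vein: pull
# (arg1, relation, arg2) triples out of simple declarative sentences.
import re

TRIPLE = re.compile(
    r"^(?P<arg1>[A-Z][\w .]*?)\s+"                     # leading noun phrase
    r"(?P<rel>(?:was|is|are|invented|acquired|founded)"  # tiny verb inventory
    r"(?:\s+\w+ed\s+by)?)\s+"                          # passive forms like "was acquired by"
    r"(?P<arg2>.+?)\.?$"                               # trailing argument
)

def extract_triple(sentence: str):
    m = TRIPLE.match(sentence)
    return (m.group("arg1"), m.group("rel"), m.group("arg2")) if m else None

print(extract_triple("Oracle acquired InQuira."))
print(extract_triple("Thomas Edison invented the phonograph."))
```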
In a world of Etzioni’s search engine, the functionally illiterate would never receive an accurate search result because the search engine would never recognize, “who be prez bamas baby mama?”
While it would be a lovely world to live in if life were as well-spoken as Jeopardy and Watson could answer our every question quickly and precisely, that is not the case and never will be. NLP works well with voice searches and should stay there. Though Etzioni poses some interesting questions and points out the elephant in the search engine room, the answer is not as simple as NLP. At least not yet.
Catherine Lamsfuss, August 9, 2011
July 29, 2011
We learned from a source in Silicon Valley that Oracle has acquired InQuira. We noted “Oracle Buys InQuira to Boost Fusion CRM”. InQuira is an interesting search company. The firm was formed in 2002 from two semi-successful search companies, Answerfriend Inc. and Electric Knowledge Inc. The company hit its stride with its positioning of “natural language search” for customer support applications. InQuira hit my radar screen when it signed a deal with Yahoo to power the Yahoo customer support service. I wrote about the upside and downside of the Yahoo implementation and then looked at InQuira every few months. You can run a query in Beyond Search and get a list of the articles I wrote to track the company’s activities since 2008. InQuira has been able to move forward despite the lemmings of search rushing into the customer service market.
According to the IDG News write up:
The company has patented NLP (natural language processing) capabilities that enable it to determine the “true intent” of a customer question, according to its website. “We expect InQuira to be the centerpiece for Oracle Fusion CRM Service,” said Anthony Lye, senior vice president of Oracle CRM, in a statement.
Our view at Beyond Search is that buying InQuira is probably a reasonable move for Oracle. The company’s Secure Enterprise Search 11g is not suited to the Fusion type of application. Oracle purchased TripleHop, but I have lost track of that firm’s MatchPoint innovation within the vastness of Oracle.
Will InQuira propel Oracle forward in enterprise search in its various manifestations? My hunch is that Oracle will generate additional revenue and put pressure on the incumbents in the customer support market. Oracle may need to acquire additional search and content processing companies in order to meet the needs of the big and diverse Oracle customer base. InQuira’s approach often requires significant computational horsepower. Oracle is positioned to sell InQuira’s customers the hardware required to deliver zippy performance.
We think the notion of a giant company building a “knowledge management” solution is sort of interesting. Big companies have to buy other companies to move forward. That’s why we think Oracle may still be shopping for search and content management solutions.
Stephen E Arnold, July 29, 2011
Freebie unlike products from Oracle and InQuira.