IBM: Recycling Old Natural Language Assertions
April 6, 2017
I have ridden the natural language processing unicycle a couple of times in the last 40 years. In fact, for a company in Europe I unearthed from my archive NLP white papers from outfits like Autonomy Software and Siderean Software, among others. The message is the same: content processing from these outfits can figure out the meaning of a document. But accuracy was a challenge. I slap the word “aboutness” on these types of assertions.
Don’t get me wrong. Progress is being made. But the advances are often incremental and delivered at the subsystem level of larger systems. A good example is the remarkable breakthrough technology of Madrid, Spain-based Bitext. The company’s Deep Linguistic Analysis Platform solves a very difficult problem: when an outfit like a big online service has to figure out the who, what, when, and where in a flood of content in 10, 20, or 30 or more languages, the cost of using old-school systems is simply out of reach, even for companies with billions in the bank.
I read “Your Machine Used to Crunch Numbers. Now It Can Chew over What They Mean, Too.” The write up appeared in the normally factual online publication “The Register.” The story, in my opinion, sucks in IBM marketing speak and makes some interesting assertions about what Lucene, home brew scripts, and acquired technology can deliver. In my experience, “aboutness” requires serious proprietary systems and methods. Language remains hard, no matter what one believes when Google converts 400 words of Spanish into semi-okay English.
In the article I was told:
This makes sense, because the branches of AI gaining most traction today – machine learning and deep learning – typically have non-deterministic outputs. They’re “fuzzy”, producing confidence scores relating to their inputs and outputs. This makes AI-based analytics systems good at analyzing the kind of data that has sprung up since the early 2000s; particularly social media posts.
Well, sort of. There are systems which can identify, from unstructured text in many languages, the actor, the action, and the outcome. In addition, these systems can apply numerical recipes to identify items of potential interest to an analyst or to other software systems. The issue is error rate. Many current entity tagging systems stumble badly when it comes to accuracy.
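To make the actor-action-outcome idea concrete, here is a minimal sketch using the open source spaCy library. This is my own illustration, not the pipeline used by IBM, Bitext, or anyone else mentioned here; the sample sentence and the accuracy caveat in the comments are assumptions for demonstration purposes.

```python
# Minimal entity and action tagging sketch using spaCy (illustrative only;
# assumes the en_core_web_sm model is installed via
# `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Acme Bank moved 40 million euros to a shell company in Zurich on Friday."
doc = nlp(text)

# Named entities: the "who", "where", and "when" the write up talks about.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Acme Bank ORG, Zurich GPE, Friday DATE

# A rough stand-in for the "action": the main verb plus its subject and object.
for token in doc:
    if token.dep_ == "ROOT":
        subjects = [w.text for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
        objects = [w.text for w in token.rights if w.dep_ in ("dobj", "dative")]
        print(subjects, token.lemma_, objects)

# The catch is the error rate: on noisy social media text or less common
# languages, an out-of-the-box tagger like this misses or mislabels a
# meaningful share of entities, which is why accuracy remains the issue.
```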
But IBM has been nosing around NLP and smart software for a long time. Do you remember Data Fountain or Dr. Jon Kleinberg’s CLEVER system? These were important, but they, too, were suggestive rather than definitive approaches.
The write up tells me via Debbie Landers, IBM Canada’s vice president of Cognitive Solutions:
“People are constantly buying security products to fix a problem or get a patch to update something after it’s already happened, which you have to do, but that’s table stakes,” he says.

Machine learning is good at spotting things as they’re happening (or in the case of predictive analytics, beforehand). Their anomaly detection can surface the ‘unknown unknowns’ – problems that haven’t been seen before, but which could pose a material threat. In short, applying this branch of AI to security analytics could help you understand where attackers are going, rather than where they’ve been.

What does the future hold for analytics, as we get more adept at using them? Solutions are likely to become more predictive, because they’ll be finding patterns in empirical data that people can’t spot. They’ll also become more context-aware, using statistical modeling and neural networks to produce real-time data that correlates with specific situations.
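For readers who want to know what “anomaly detection surfacing unknown unknowns” looks like once the marketing wrapper is removed, here is a minimal sketch using scikit-learn’s IsolationForest. The synthetic “login event” features and the one percent contamination rate are my own assumptions; IBM’s Watson-branded security analytics stack is not public, so this is generic machine learning, not IBM’s method.

```python
# Minimal anomaly detection sketch with scikit-learn's IsolationForest.
# The feature matrix is synthetic; a real security analytics pipeline would
# engineer features from network traffic and log data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Columns: bytes transferred, failed logins, distinct hosts contacted.
normal_traffic = rng.normal(loc=[500, 1, 3], scale=[100, 1, 1], size=(1000, 3))
odd_traffic = np.array([[5000, 12, 40]])          # something never seen before
events = np.vstack([normal_traffic, odd_traffic])

model = IsolationForest(contamination=0.01, random_state=0)  # assumed 1% outliers
model.fit(events)

scores = model.decision_function(events)          # lower score = more anomalous
labels = model.predict(events)                    # -1 flags an outlier

print("flagged as anomalous:", np.where(labels == -1)[0])
print("score of the odd event:", scores[-1])
```

The point of the sketch is simply that the method flags statistical oddities; deciding whether an oddity is a “material threat” still requires a human analyst.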
My reaction to this write up is that IBM is “constantly” thrashing for a way to make Watson-type services a huge revenue producer. From recipes to cancer, from education to ever more spectacular assertions about what IBM technology can do, IBM is demonstrating that it cannot keep up with smart software embedded in money-making products and mobile services.
Is this a promotional piece? Yep, The Reg even labels it as such.
See? A promo, not fake news exactly. It is clear that IBM is working overtime with its PR firm and writing checks to get the Watson meme into many channels, including blogs.
Beyond Search wants to do its part. However, my angle is different. Look around for innovative companies engaged in smart software and closing substantive deals. Compare the performance of these systems with that of IBM’s solutions, if you can arrange an objective demonstration. Then you will know how much of IBM’s content marketing carpet bombing falls harmlessly wide of the mark and how many payloads actually hit a cash register and make it pay out. (A thought: a breakthrough company in Madrid may be a touchstone for those who are looking for more than marketing chatter.)
Stephen E Arnold, April 6, 2017