More and More about NLP

August 31, 2016

Natural language processing is not a new term in the IT market, but NLP technology has only become commonplace in the last year or so. By commonplace, I mean that most computers and mobile devices have some form of NLP tool, including digital assistants and voice to text. Business 2 Community explains the basics about NLP technology in the article, “Natural Language Processing: Turning Words Into Data.”

The article acts as a primer for understanding how NLP works and is redundant until you get into the text about how it is applied in the real world; that is, tied to machine learning. I found this paragraph helpful:

“This has changed with the advent of machine learning. Machine learning refers to the use of a combination of real-world and human-supplied characteristics (called “features”) to train computers to identify patterns and make predictions. In the case of NLP, using a real-world data set lets the computer and machine learning expert create algorithms that better capture how language is actually used in the real world, rather than on how the rules of syntax and grammar say it should be used. This allows computers to devise more sophisticated—and more accurate—models than would be possible solely using a static set of instructions from human developers.”

It then goes into further detail about how NLP is applied to big data technology and explains the practical applications. It makes some reference to open source NLP technologies, but only in passing.

The article sums up the NLP and big data information in plain English. The technology gets even more complicated when you delve into further research on the subject.

Whitney Grace, August 31, 2016

Computers Will Talk Pretty One Day Soon with NLP

August 25, 2016

The article titled Natural Language Processing: Turning Words Into Data on B2C takes an in-depth look at NLP and why it is such a difficult area to perfect. Anyone who has conversed with an automated customer service system knows that NLP technology is far from ideal. Why is this? The article suggests that while computers are great at learning the basic rules of language, things get far more complex when you throw in context-dependent or ambiguous language, not to mention human error. The article explains,

“This has changed with the advent of machine learning…In the case of NLP, using a real-world data set lets the computer and machine learning expert create algorithms that better capture how language is actually used in the real world, rather than on how the rules of syntax and grammar say it should be used. This allows computers to devise more sophisticated—and more accurate—models than would be possible solely using a static set of instructions from human developers.”

Throw in Big Data and we have a treasure trove of unstructured data to glean value from in the form of text messages, emails, and social media. The article lists several exciting applications such as automatic translation, automatic summarization, Natural Language Generation, and sentiment analysis.

Chelsea Kerwin, August 25, 2016

NLP and Smart Software: Everyone Becomes a Big Data Expert

December 28, 2015

A new company seeks to make everyone a big data expert. You can get the full scoop in “Detecting Consumer Decisions within Messy Data: Software Analyzes Online Chatter to Predict Health Care Consumers’ Behavior.” The company with the natural language technology and proprietary smart software is dMetrics.

Here’s the premise:

DecisionEngine, Nemirovsky [dMetrics wizard] says, better derives meaning from text because the software — which now consists of around 2 million lines of code — is consistently trained to recognize various words and synonyms, and to interpret syntax and semantics. “Online text is incredibly tough to analyze properly,” he says. “There’s slang, misspellings, run-on sentences, and crazy punctuation. Discussion is messy.”

Now, how does the system work?

Visualize the software as a three-tiered funnel, Nemirovsky suggests, with more refined analysis happening as the funnel gets narrower. At the top of the funnel, the software mines all mentions of a particular word or phrase associated with a certain health care product, while filtering out “noise” such as fake websites and users, or spam. The next level down involves separating out commenters’ personal experiences over, say, marketing materials and news. The bottom level determines people’s decisions and responses, such as starting to use a product — or even considering doing so, experiencing fear or confusion, or switching to a different medication.
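The funnel idea is easy to mock up. What follows is a minimal sketch of the three tiers, not dMetrics' DecisionEngine: the keyword lists, the `Post` class, the function names, and the product name "Examplex" are all hypothetical, and a real system would presumably use trained models rather than string matching.

```python
from dataclasses import dataclass

# Hypothetical cue lists for illustration only; none of these come from dMetrics.
SPAM_MARKERS = ("buy now", "click here")
FIRST_PERSON = ("i ", "my ", "i'm ", "i've ")
DECISION_CUES = {
    "started": ("started taking", "began using"),
    "considering": ("thinking about", "considering"),
    "switched": ("switched to", "changed to"),
}


@dataclass
class Post:
    author: str
    text: str


def mentions_product(post, product):
    """Tier 1: keep posts that mention the product and do not look like spam."""
    text = post.text.lower()
    return product.lower() in text and not any(m in text for m in SPAM_MARKERS)


def is_personal_experience(post):
    """Tier 2: keep first-person accounts; drop marketing copy and news."""
    text = " " + post.text.lower()
    return any(" " + cue in text for cue in FIRST_PERSON)


def detect_decision(post):
    """Tier 3: label the decision the commenter describes, if any."""
    text = post.text.lower()
    for label, cues in DECISION_CUES.items():
        if any(cue in text for cue in cues):
            return label
    return None


def run_funnel(posts, product):
    """Pass each post through the three tiers, narrowing at every step."""
    results = []
    for post in posts:
        if not mentions_product(post, product):
            continue
        if not is_personal_experience(post):
            continue
        decision = detect_decision(post)
        if decision is not None:
            results.append((post.author, decision))
    return results


posts = [
    Post("ann", "I started taking Examplex last week and feel fine."),
    Post("bot", "Buy now! Examplex at low prices, click here."),
    Post("news", "Examplex maker reports quarterly earnings."),
    Post("raj", "My doctor suggested it, so I switched to Examplex."),
    Post("lee", "I am thinking about Examplex but the side effects scare me."),
]
print(run_funnel(posts, "Examplex"))
```

The spam post falls out at the top tier, the news item at the second, and only the three personal accounts reach the bottom with a decision label attached.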

The company wants to expand beyond health care. Worth monitoring.

Stephen E Arnold, December 28, 2015

Palantir: Worth $20 Billion?

December 12, 2015

I read “U.S. Data Company Palantir Raises $679.8 Million.” The key points in the write up from my point of view were that Palantir is valued at $20 billion, which may be a record for a company providing search and content analysis. The other point is that the company has raised more than $670 million. The company keeps a low profile and reminds me of the teenage Autonomy from the early 2000s. Value may become an issue at some point.

Stephen E Arnold, December 12, 2015

More Huge Notions for Natural Language Processing

December 4, 2015

You talk to your mobile phone, right? I assume you don’t try to chat with Siri- and Cortana-type services in noisy places, in front of folks you don’t trust, or when you are in a wind storm.

I know that the idea of typing questions with subjects, verbs, adjectives, and other parts of speech is an exciting one to some people. In reality, crafting sentences is not the ideal way to interact with some search systems. If you are looking for snaps of Miley Cyrus, you don’t want to write a sentence. Just enter the word “Miley” and the Alphabet Google thing does the rest. Easy.

I read about another search related research study in “Natural Language Processing NLP Market Dynamics, Forecast, Analysis and Supply Demand 2015-2025.” I find the decade long view evidence that Excel trend functions may have helped the analysts formulate their “future insights.”

The write up about the study identifies some of the companies engaged in NLP. Here’s a sample:

IBM Corporation

3M Co.

Hewlett-Packard Co.

Apple Inc.

Oracle Corporation

Microsoft Corporation

Dolbey Systems Inc.

SAS Institute Inc.

Netbase Solutions Inc.

Verint Systems Inc.

What, no Alphabet Google? Perhaps the full study includes this outfit.

A report by MarketsAndMarkets pegged NLP as reaching $13.4 billion by 2020. I assume that the size of the market in 2025 will be larger. Since I don’t have the market size data from Future Market Insights, we will just have to wait and see what happens.

In today’s business world, projections for a decade in the future strike me as somewhat far reaching and maybe a little silly.

Who crafted this report? According to the write up:

Future Market Insights (FMI) is a premier provider of syndicated research reports, custom research reports, and consulting services. We deliver a complete packaged solution, which combines current market intelligence, statistical anecdotes, technology inputs, valuable growth insights, aerial view of the competitive framework, and future market trends.

I like the aerial view of the competitive framework thing. I wish I could do that type of work. I wonder how Verint perceives a 10 year projection when some of the firm’s intelligence work focuses on slightly shorter time horizons.

Stephen E Arnold, December 4, 2015

Stanford Offers an NLP Demo

October 8, 2015

Want to see what natural language processing does in the Stanford CoreNLP implementation? Navigate to Stanford CoreNLP. The service is free. Enter some text. The system will scan the input and display an output. NLP revealed:


What can one do with this output? Build an application around the outputs. NLP is easy. The artificial intelligence implementation is a bit of a challenge, of course, but parts of speech, named entities, and dependency parsing can be darned useful. Now mixed language inputs may be an issue. Names in different character sets could be a hurdle. I am thrilled that NLP has been visualized using the brat visualization and annotation software. Get excited, gentle reader.
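To make "build an application around the outputs" concrete, here is a minimal sketch that works on per-token annotations. The `sample_output` dictionary is hand-written for illustration; its shape (sentences holding tokens with `word`, `pos`, and `ner` fields) is an assumption about what an annotator like CoreNLP might return, not captured output. The helper names are hypothetical too.

```python
from itertools import groupby

# Hand-written sample annotations; the field names are an assumption.
sample_output = {
    "sentences": [
        {
            "tokens": [
                {"word": "Stanford", "pos": "NNP", "ner": "ORGANIZATION"},
                {"word": "University", "pos": "NNP", "ner": "ORGANIZATION"},
                {"word": "is", "pos": "VBZ", "ner": "O"},
                {"word": "in", "pos": "IN", "ner": "O"},
                {"word": "California", "pos": "NNP", "ner": "LOCATION"},
                {"word": ".", "pos": ".", "ner": "O"},
            ]
        }
    ]
}


def extract_entities(annotated):
    """Join runs of adjacent tokens that share a non-"O" entity tag."""
    entities = []
    for sentence in annotated["sentences"]:
        for tag, group in groupby(sentence["tokens"], key=lambda t: t["ner"]):
            if tag != "O":
                entities.append((" ".join(t["word"] for t in group), tag))
    return entities


def words_with_pos(annotated, pos_prefix):
    """Return the words whose part-of-speech tag starts with pos_prefix."""
    return [
        token["word"]
        for sentence in annotated["sentences"]
        for token in sentence["tokens"]
        if token["pos"].startswith(pos_prefix)
    ]


print(extract_entities(sample_output))
print(words_with_pos(sample_output, "VB"))
```

One caveat with this grouping: two different entities of the same type sitting side by side would be merged into one, so a real application would want span boundaries from the annotator rather than tag runs.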

Stephen E Arnold, October 8, 2015

Partridge Search for Scientific Papers and Recommendation

August 17, 2015

If you read academic papers, you may want to take a flight through Partridge. Additional details are at this link. According to the Web site: Partridge

is a web based tool used for information retrieval and filtering that makes use of Machine Learning techniques. Partridge indexes lots of academic papers and classifies them based on their content. It also indexes their content by scientific concept, providing the ability to search for phrases within specific key areas of the paper (for example, you could search for a specific outcome in the ‘conclusion’ or find out how the experiment was conducted in the ‘methodology’ section.)
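Searching inside a specific section of a paper can be sketched with a small inverted index keyed by section and word. This is not Partridge's code: the sample papers, the section names, and the functions `build_index` and `search` are hypothetical, and the sketch matches every query word, not exact phrases.

```python
from collections import defaultdict

# Hypothetical papers, already split into labeled sections.
papers = {
    "paper-1": {
        "methodology": "We trained a classifier on annotated sentences.",
        "conclusion": "The classifier improved recall on unseen papers.",
    },
    "paper-2": {
        "methodology": "Recall was measured with a held-out test set.",
        "conclusion": "Results were inconclusive for short abstracts.",
    },
}


def build_index(papers):
    """Map (section, word) to the set of paper ids containing that word."""
    index = defaultdict(set)
    for paper_id, sections in papers.items():
        for section, text in sections.items():
            for word in text.lower().split():
                index[(section, word.strip(".,;:()"))].add(paper_id)
    return index


def search(index, section, query):
    """Return ids of papers whose given section contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = [index.get((section, word), set()) for word in words]
    return sorted(set.intersection(*hits))


index = build_index(papers)
print(search(index, "conclusion", "recall"))
print(search(index, "methodology", "recall"))
```

The same word lands in different result sets depending on the section asked for, which is the point of indexing by section instead of by whole document.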

The About section of the Web site explains:

Partridge is a collection of tools and web-based scripts that use artificial intelligence to run semantic analysis on large bodies of text in order to provide reader recommendations to those who query the tool. The project is named after Errol Partridge, a character from the cult Science Fiction film ‘Equilibrium’ who imparts knowledge of a cache of fictional books (banned contraband in the film) upon the protagonist, John Preston, eventually leading to his defiance of the state and the de-criminalization of literature. Partridge is my dissertation project at Aberystwyth University, United Kingdom.

Check out the system at http://paprol.org.uk.

Stephen E Arnold, August 17, 2015

Watson: The PR Blitz Continues

July 28, 2015

I know that IBM is trying to reverse 13 quarters of revenue decline. I know that most of the firm’s business units are struggling to hit their numbers. I know that IBM’s loyal employees are doing their best to belt out the IBM song “Ever Onward” in perfect harmony.


If you are not familiar with the lyrics, you can read the words at this link on the IBM Web site, which unlike the dev ops pages are still online:

That’s the spirit that has brought us fame!
We’re big, but bigger we will be
We can’t fail for all can see
That to serve humanity has been our aim!
Our products now are known, in every zone,
Our reputation sparkles like a gem!
We’ve fought our way through — and new
Fields we’re sure to conquer too

Goodness, I am tapping my foot just reading the phrase “Our reputation sparkles like a gem!”

And I don’t count the grinches who complain at EndicottAlliance.org like this:

Comment 07/27/15:
Job Title: IT Specialist
Location: Rochester MN
CustAcct: Various
BusUnit: Cloud
Message: I was forced out/bullied out through bad PBC rating/threats of PIP. I left voluntarily a few months back, rather than waiting for the inevitable layoff (since my 2014 rating was a 3, I would have probably been let go with no package). Once I got my appraisal in January, I started looking around and found another job that pays about the same as my band 10 IBM salary – and I am evaluating several other offers as we speak. I truly feel for the victims of yet another round of layoffs. But I don’t quite understand why some find it “shocking” and “unexpected” that IBM gets rid of them. Your CEO has publicly declared that many of you – especially those in the services organizations – are nothing more than “empty calories.” She went on record with those words. What do you expect? Either you organize or you better start looking for something else.

I pay attention to the “3 Lessons IBM’s Watson Can Teach Us about Our Brains’ Biases.” The write up explains:

Cognitive computing is transforming the way we work.


The Future of Enterprise and Web Search: Worrying about a Very Frail Goose

May 28, 2015

For a moment, I thought search was undergoing a renascence. But I was wrong. I noted a chart which purports to illustrate that the future is not keyword search. You can find the illustration (for now) at this Twitter location. The idea is that keyword search is less and less effective as the volume of data goes up. I don’t want to be a spoil sport, but for certain queries key words and good old Boolean may be the only way to retrieve certain types of information. Don’t believe me? Log on to your organization’s network or to Google. Now look for the telephone number of a specific person whose name you know or a tire company located in a specific city with a specific name which you know. Would you prefer to browse a directory, a word cloud, a list of suggestions? I want to zing directly to the specific fact. Yep, key word search. The old reliable.

But the chart points out that the future is composed of three “webs”: The Social Web, the Semantic Web, and the Intelligent Web. The date for the Intelligent Web appears to be 2018 (the diagram at which I am looking is fuzzy). We are now perched half way through 2015. In 30 months, the Intelligent Web will arrive with these characteristics:

  • Web scale reasoning (Don’t we have Watson? Oh, right. I forgot.)
  • Intelligent agents (Why not tap Connotate? Agents ready to roll.)
  • Natural language search (Yep, talk to your phone. How is that working out on a noisy subway train?)
  • Semantics. (Embrace the OWL. Now.)

Now these benchmarks will arrive in the next 30 months, which implies a gradual emergence of Web 4.0.

The hitch in the git along, like most futuristic predictions about information access, is that reality behaves in some unpredictable ways. The assumption behind this graph is “Semantic technology help to regain productivity in the face of overwhelming information growth.”


Kelsen Enters Legal Search Field

February 23, 2015

A new natural-language search platform out of Berlin, Kelsen, delivers software-as-a-service to law firms. Basic Thinking discusses “The Wolfram Alpha of the Legal Industry.” Writer Jürgen Kroder interviewed Kelsen co-founder Veronica Pratzka. She explains what makes her company’s search service different (quote auto-translated from the original German):

“Kelsen is generated based on pre-existing legal cases not a search engine, but a self-learning algorithm that automatically answers. 70-80 percent of the global online data are very unstructured. Search engines look for keywords and only. Google has many answers, but you have to look for them yourself thousands of search results together and hope that you just entered the correct keywords. Kelsen, however, is rather a free online lawyer who understands natural language practitioner trained in all areas of law, works 24/7 and is always up-to-date….

“First Kelsen understands natural language compared to Google! That is, even with the entry of long sentences and questions, not just keywords, Kelsen is suitable answers. Moreover, Kelsen searches ‘only’ relevant legal data sources and provides the user with a choice of right answers ready, he can also evaluate.

“One could easily Kelsen effusive as ‘the Wolfram Alpha the legal industry,’ respectively. We focus on Kelsen with legal data structure and analyze them in order to eventually make available. From this structuring and visualization of legal data not only seeking advice and lawyers can benefit, but also legislators, courts and research institutions.”

Pratzka notes that her company received boosts from both the Microsoft Accelerator and the IBM Entrepreneur startup support programs. Kelsen expects to turn a profit on the business-to-consumer side through premium memberships. In business-to-business, though, the company plans to excel by simply outperforming the competition. Pratzka seems very confident. Will the service garner the attention she and her team expect?

Cynthia Murrell, February 23, 2015

Sponsored by ArnoldIT.com, developer of Augmentext
