Deloitte and NLP: Is the Analysis On Time and Off Target?

January 18, 2019

I read “Using AI to Unleash the Power of Unstructured Government Data.” I was surprised because I thought that US government agencies were using smart software (NLP, smart ETL, digital notebooks, etc.). My recollection is that use of these types of tools began in the mid 1990s, maybe a few years earlier. i2 Ltd., a firm for which I did a few minor projects, rolled out its Analyst’s Notebook in the mid 1990s, and it gained traction in a number of government agencies a couple of years after British government units began using the software.

The write up states:

DoD’s Defense Advanced Research Projects Agency (DARPA) recently created the Deep Exploration and Filtering of Text (DEFT) program, which uses natural language processing (NLP), a form of artificial intelligence, to automatically extract relevant information and help analysts derive actionable insights from it.

My recollection is that DEFT fired up in 2010 or 2011. Once funding became available, activity picked up in 2012. That was six years ago.

However, DEFT is essentially a follow-on from other initiatives which reach back to Purple Yogi (Stratify) and DR-LINK, among others.

The capabilities of NLP are presented as closely linked technical activities; for example:

Name entity resolution
Relationship extraction
Sentiment analysis
Topic modeling
Text categorization
Text clustering
Information extraction

The collection of buzzwords is interesting. I would annotate each of these items to place them in the context of my research into content processing, intelware, and related topics:

Name entity resolution—The issue is identifying an entity which is not “named”. Look up lists work as well as parsing. But identifying the unknown is where the action is.
Relationship extraction—Social graphs and mapping relationships have been part of the Analyst’s Notebook toolkit since 1996. The task now is to manipulate fingerprints of certain social actions and examine them in terms of pattern matching. There’s some nifty math at Technion for this, but extraction is small potatoes.
Sentiment analysis—Sentiment analysis works when looking for signal words. But what about sentiment in spoken text, videos, and the real time behaviors of individuals interacting with devices?
Topic modeling—Now we are talking. The idea of figuring out a known or unknown subject and creating a model which can be used to develop another model is a hot topic. The US government has funded companies to work on this project, but the effort is from a different program as I recall.
Text categorization—This is old stuff. There are many ways to approach this. Each method invokes religious wars. The problem of determining what a human-generated output is “about” is a sticky wicket. Bad actors use words like “white china plates” and “collectible teddy bear” to signal a particular subject like controlled substances. Software is not yet up to the task of dealing with categorizing text with these tokenized words and phrases. Emojis, videos, and snapshots of messages behind a picture of a hamburger joint’s logo remain areas requiring significant work.
Text clustering—Once one has extracted, counted, semanticized, and added relevant metatags, it is helpful to cluster items. One can cluster corpuses, single documents, chapters, paragraphs, sentences, and words. The job is to make this clustering useful. Most clustering outputs are good for PowerPoint images or inclusion in free “insight” reports. But often the image delivers less value than 1,000 words.
Information extraction—This is indeed an important concept. However, it is not NLP. Think of natural language generation as a new discipline. True, NLG uses metatagging and analytics, but another layer of technology is needed to generate a useful output. There are companies working in this particular field of endeavor.
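
To make the look up list point concrete, here is a minimal Python sketch of list-based matching. It is an illustration under stated assumptions, not a description of how DEFT, Analyst’s Notebook, or any vendor system works. The two term lists, the function name find_terms, and the sample sentences are all hypothetical. The sketch shows why a list finds what it already knows and misses the coded phrases mentioned above.

```python
import re

# Hypothetical look up lists, invented for this illustration only.
KNOWN_ENTITIES = {"darpa", "deloitte", "i2 ltd"}
SIGNAL_WORDS = {"controlled substances", "narcotics"}


def find_terms(text, terms):
    """Return, in sorted order, the listed terms that occur in text.

    Matching is case-insensitive and requires that the term is not
    embedded inside a longer word.
    """
    lowered = text.lower()
    hits = []
    for term in terms:
        # Lookarounds keep "darpa" from matching inside a longer token.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            hits.append(term)
    return sorted(hits)


plain = "DARPA funded research on narcotics trafficking."
coded = "Two white china plates and a collectible teddy bear, shipped Friday."

print(find_terms(plain, KNOWN_ENTITIES))  # ['darpa']
print(find_terms(plain, SIGNAL_WORDS))    # ['narcotics']
print(find_terms(coded, SIGNAL_WORDS))    # []
```

The coded sentence returns nothing because neither phrase is on the list. Adding “white china plates” to the list only helps until the vocabulary shifts, which is the identify-the-unknown problem in miniature.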

The Deloitte analysis is designed to make the reader sign on for advisory services. I used to work at Booz, Allen and I have done project work for a number of blue chip consulting firms.

Write ups which boil the ocean have to deliver billings that can easily hit seven figures.

The problem is that the spirit of the write up is true to the blue chip method of describing a problem, instilling a bit of urgency, and offering a solution.

The reality, however, may differ from what’s presented as the problem.

But for those who are looking for projects, the history, the interplay of technologies, and the real action areas are often of little interest.

I can hear this remark now, “Really, this NLP stuff more than 35 years ago?”

Yep. Old news and a failure to recognize the actual news.

Stephen E Arnold, January 17, 2019

Written by Stephen E. Arnold · Filed Under Feature, Natural language processing

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.