Natural Language Processing: Will It Make Government Lingo Understandable?

April 11, 2019

I noted an FBO posting for a request for information (RFI) for natural language processing services. You can find the RFI at this link on the FBO Web site. Here’s a passage I found interesting:

OPM seeks information on how to use artificial intelligence, particularly natural language processing (NLP), to gain insights into statutory and regulatory text to support policy analysis. The NLP capabilities should include topic modeling; text categorization; text clustering; information extraction; named entity resolution; relationship extraction; sentiment analysis; and summarization. The NLP project may include statistical techniques that can provide a general understanding of the statutory and regulatory text as a whole. In addition, OPM seeks to learn more about chatbots and transactional bots that are easy to implement and customize with the goal of extending bot-building capabilities to non-IT employees. (Maximum 4 pages requested.)

The goal is to obtain information about a system that performs the functions associated with an investigative software system; for example, Palantir Technologies, IBM i2, or one of the numerous companies operating from Herzliya, north of Tel Aviv.

I am curious about the service provider who assisted in the preparation of this RFI. The time window for responding is measured in days. With advanced text analysis systems abounding in US government agencies, from the Department of Justice to the Department of Defense and beyond, I wonder why additional requests for information are required.

Ah, procurement. A process in love with buzzwords, so perhaps an NLP system can make things clearer. Sounds like a plan.

Stephen E Arnold, April 11, 2019

Short Honk: NLP Tools

March 26, 2019

Making sense of unstructured content is tricky. If you are looking for open source natural language processing tools, “12 Open Source Tools for Natural Language Processing” provides a list.

Stephen E Arnold, March 26, 2019

Text Analysis Toolkits

March 16, 2019

One of the DarkCyber team spotted a useful list, published by MonkeyLearn. Tucked into a narrative called “Text Analysis: The Only Guide You’ll Ever Need” was a list of natural language processing open source tools, programming languages, and software. Each description is accompanied by links and, in several cases, comments. See the original article for more information.

Caret

CoreNLP

Java

Keras

mlr

NLTK

OpenNLP

Python

spaCy

scikit-learn

TensorFlow

PyTorch

R

Weka

Stephen E Arnold, March 16, 2019

Deloitte and NLP: Is the Analysis On Time and Off Target?

January 18, 2019

I read “Using AI to Unleash the Power of Unstructured Government Data.” I was surprised because I thought that US government agencies were using smart software (NLP, smart ETL, digital notebooks, etc.). My recollection is that use of these types of tools began in the mid-1990s, maybe a few years earlier. i2 Ltd., a firm for which I did a few minor projects, rolled out its Analyst’s Notebook in the mid-1990s, and it gained traction in a number of government agencies a couple of years after British government units began using the software.

The write up states:

DoD’s Defense Advanced Research Projects Agency (DARPA) recently created the Deep Exploration and Filtering of Text (DEFT) program, which uses natural language processing (NLP), a form of artificial intelligence, to automatically extract relevant information and help analysts derive actionable insights from it.

My recollection is that DEFT fired up in 2010 or 2011. Once funding became available, activity picked up in 2012. That was six years ago.

However, DEFT is essentially a follow-on to other initiatives that reach back to Purple Yogi (Stratify) and DR-LINK, among others.

The capabilities of NLP are presented as closely linked technical activities; for example:

  • Named entity resolution
  • Relationship extraction
  • Sentiment analysis
  • Topic modeling
  • Text categorization
  • Text clustering
  • Information extraction

The collection of buzzwords is interesting. I would annotate each of these items to place them in the context of my research into content processing, intelware, and related topics:


Natural Language Processing: Brittle and Spurious

August 24, 2018

I read “NLP’s Generalization Problem, and How Researchers Are Tackling It.” From my vantage point in rural Kentucky, the write up seems to say, “NLP does not work particularly well.”

For certain types of content in which terminology is constrained, NLP systems work okay. But, as with clustering, the initial assignment of any object determines much about how the system behaves. Trouble spots include jargon, code words, and phrases that serve as aliases. NLP systems struggle with these even within a single language.
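The clustering comparison can be made concrete. What follows is a minimal sketch, assuming plain Lloyd's k-means on four made-up 2-D points; nothing here comes from the write up or from any NLP product. The same data and the same algorithm settle into very different groupings depending only on the starting centroids, and neither run "knows" it landed in the worse one.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means: returns (final centroids, cluster label per point)."""
    centroids = [tuple(map(float, c)) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid where it is
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )

# Hypothetical data: four points at the corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting positions for the same k = 2 problem.
good_c, good_labels = kmeans(points, [(0, 0.5), (4, 0.5)])
bad_c, bad_labels = kmeans(points, [(2, 0), (2, 1)])

print(good_labels, sse(points, good_c, good_labels))  # [0, 0, 1, 1] 1.0
print(bad_labels, sse(points, bad_c, bad_labels))     # [0, 1, 0, 1] 16.0
```

The second run converges immediately to a split that is 16 times worse by squared error, because the initial assignment already fixed the outcome. Production systems typically mitigate this with restarts or smarter seeding, which costs more compute.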

The write up provides interesting examples of NLP failure.

The fixes, alas, are not likely to deliver the bacon any time soon. Yep, “bacon” means a technical breakthrough. NLP systems struggle with this type of utterance. I refer to local restaurants as the nasty caballero, which is my way of saying “the local Mexican restaurant on the river.”

I like the suggestion that NLP systems should use common sense. Isn’t that the method that AskJeeves tried when it allegedly revolutionized NLP question answering? The problem, of course, was that humans had to craft rules, and that took money, time, and even more money.

The write up also suggests: “Evaluate unseen distributions and unseen tasks.” That’s interesting as well. The challenge is the one that systems like IBM Watson face: humans have to make decisions about dicey issues like clustering, then identify relevant training data, and index the text with metadata.

Same problem: Time and money.

For certain applications, NLP can be helpful. For other types of content comprehension, one ends up with the problem of getting Gertie (the NLP system) up and running. Then, after a period of time (often a day or two), one faces the job of hooking Gertie to the next Star Trek innovation from Sillycon Valley.

How do you think NLP systems handle my writing style? Let’s ask some NLP systems. DR-LINK? IBM Watson? Volunteers?

Stephen E Arnold, August 24, 2018

Alexa Is Still Taking Language Lessons

August 24, 2018

Though Amazon has been aware of the problem for a while, Alexa still responds better to people who sound like those she grew up with than she does to others. It is a problem many of us can relate to, but one the company really needs to solve as it continues to deploy its voice-activated digital assistant worldwide. TheNextWeb cites a recent Washington Post study as it reports, “Alexa Needs Better Training to Understand Non-American Accents.” It is worth noting that it is not just foreign accents the software cannot recognize: the device has trouble with many regional dialects within the US as well.

“The team had more than 100 people from nearly 20 US cities dictate thousands of voice commands to Alexa. From the exercise, it found that Amazon’s Alexa-based voice-activated speaker was 30 percent less likely to comprehend commands issued by people with non-American accents. The Washington Post also reported that people with Spanish as their first language were understood 6 percent less often than people who grew up around California or Washington and spoke English as a first language. Amazon officials also admitted to The Washington Post that grasping non-American accents poses a major challenge both in keeping current Amazon Echo users satisfied, and expanding sales of their devices worldwide. Rachael Tatman, a Kaggle data scientist with expertise in speech recognition, told The Washington Post that this was evidence of bias in the training provided to voice recognition systems. ‘These systems are going to work best for white, highly educated, upper-middle-class Americans, probably from the West Coast, because that’s the group that’s had access to the technology from the very beginning,’ she said.”

Yes, the bias we find here is the natural result of working with what you have where you are, and perhaps Amazon can be forgiven for not foreseeing the problem from the beginning. Perhaps. The article grants that the company has been working toward a resolution, and references their efforts to prepare for the Indian market as an example. It seems to be slow going.

Cynthia Murrell, August 24, 2018

IBM and Distancing: New Collar Jobs in France

May 23, 2018

I have zero idea if the information in the article “Exclusive: IBM bringing 1,800 AI jobs to France” is accurate. The story caught my attention because I had read “Macron Vowed to Make France a ‘Start-Up Nation.’ Is It Getting There?” You can find the story online at this link, although I read a version of the story in my dead tree edition of the real “news” paper at breakfast this morning (May 23, 2018).

Perhaps IBM recognizes that the “culture” of France makes it difficult for startups to get funding without the French management flair. Consequently a bold and surgical move to use IBM management expertise could make blockchain, AI, and Watson sing Johnny Hallyday’s Johnny, reviens ! Les Rocks les plus terribles and shoot to the top of YouTube views.

On the other hand, the play may be a long shot.

What I did find interesting in the write up was this statement:

IBM continues to make moves aimed at distancing itself from peers.

That is fascinating. IBM has faced a bit of pushback as it made some personnel decisions which annoyed some IBMers. One former IBM senior manager just shook his head and grimaced when I mentioned the floundering of the Watson billion-dollar bet. I dared not bring up riffing workers over 55. That’s a sore subject for some Big Blue bleeders.

I also liked the “New Collar” buzzword.

To sum up, I assume that IBM will bring the New Collar fashion wave to the stylish world of French technology.

Let’s ask Watson. No, bad idea. Let’s not. I don’t have the time to train Watson to make sense of questions about French finance, technology, wine, cheese, schools, family history, and knowledge of Molière.

Stephen E Arnold, May 23, 2018

Real Time Translation: Chatbots Emulate Sci Fi

April 16, 2018

The language barrier is still one of the world’s major problems. Translation software, such as Google Translate, is accurate, but it still makes mistakes that native speakers are needed to correct. Instantaneous translation is still a pipe dream, but the technology is improving with each new development. Mashable shares a current translation innovation, and it belongs to Google: “Google Pixel Buds Vs. Professional Interpreters: Which Is More Accurate?”

Apple angered many devout users when it removed the headphone jack from its phones and offered Bluetooth headphones called AirPods in its place. AirPods have the same minimalist, sleek design as other Apple products, but Google’s Pixel Buds are far superior to them because of real time translation, or so we are led to believe. Author Raymond Wong tested the Pixel Buds translation features at the United Nations to see how they fared against professional translators. He and his team tested French, Arabic, and Russian. The Pixel Buds did well with simple conversations, but certain words and phrases caused errors.

One hilarious example was when Google translated the Arabic for “I want to eat salad” to “I want to eat power” in English. When it comes to real time translation, the experts are still the best because they can understand the context and other intricacies, such as tone, that come with human language. The professional translators liked the technology, but it still needs work:

“Ayad and Ivanova both agreed that Pixel Buds and Google Translate are convenient technologies, but there’s still the friction of holding out a Pixel phone for the other person to talk into. And despite the Pixel Buds’ somewhat speedy translations, they both said it doesn’t compare to professional conference interpreters, who can translate at least five times faster [than] Google’s cloud.”

Keep working on those foreign language majors, kids. In my view, the marketing is running ahead of products that deliver.

Whitney Grace, April 17, 2018

Udpipe for R: An NLP Solution for R

March 19, 2018

Natural language processing is a major component not only of big data but also of machine learning applications that read and interpret language. NLP matters not only for English but for any language whose users want to take advantage of AI and machine learning. RBloggers takes a look at another new tool in the area of NLP and its updated features, “Natural Language Processing For Non-English Languages With Udpipe.”

We learned from the write up:

“BNOSAC is happy to announce the release of the udpipe R package (https://bnosac.github.io/udpipe/en) which is a Natural Language Processing toolkit that provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’, ‘morphological feature tagging’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format…”

The udpipe R package supports a wide range of languages from Latin-based to Asian, including Slavonic, Russian, Vietnamese, Finnish, Turkish, Serbian, Japanese, Basque, and Greek.

BNOSAC designed the udpipe R package for developers building NLP applications that integrate part-of-speech tags, tokens, morphological features, and dependency parsing output. BNOSAC wants developers working in languages other than English to take advantage of the package, because tools like this should not be restricted to English-only communities.
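The quote mentions treebanks in CoNLL-U format, the plain-text layout used by Universal Dependencies: one token per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with “#”, and a blank line between sentences. The sketch below is a minimal Python reader for that layout, written only to show what tokens, lemmas, part-of-speech tags, morphological features, and dependency heads look like side by side. It is not part of the udpipe package, it skips much of the validation the format specifies, and the sample sentence is a hand-written toy annotation, not actual udpipe output.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        # IDs stay strings, so multiword ranges ("1-2") and empty nodes ("1.1") survive.
        token = dict(zip(FIELDS, cols))
        # FEATS is "_" (no features) or "|"-separated Name=Value pairs.
        feats = token["feats"]
        token["feats"] = {} if feats == "_" else dict(kv.split("=", 1) for kv in feats.split("|"))
        current.append(token)
    if current:
        sentences.append(current)
    return sentences

# Hypothetical, hand-annotated toy sentence (HEAD 0 marks the root).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
    "",
])

for tok in parse_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["lemma"], tok["upos"], tok["head"], tok["deprel"])
# 1 Dogs dog NOUN 2 nsubj
# 2 bark bark VERB 0 root
# 3 . . PUNCT 2 punct
```

Keeping every column as a string and parsing only FEATS keeps the sketch short; a fuller reader would also convert HEAD to integers and split the DEPS and MISC columns.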

Whitney Grace, March 19, 2018

The New York Times Wants to Change Your Google Habit

March 1, 2018

Sunday is a slightly less crazy day. I took time to scan “The Case Against Google.” I had the dead tree edition of the New York Times Magazine for February 25, 2018. You may be able to access this remarkable hybridization of Harvard MBA think, DNA engineered to stick pins in Google, and good old establishment journalism toasted at Yale University.


The author is a wildly successful writer. Charles Duhigg loves his family, makes time for his children, writes advice books, and immerses himself in a single project at a time. When he comes up for air, he breathes deeply of Google outputs in order to obtain information. If the Google fails, he picks up the phone. I assume those whom he calls answer the ring tone. I find that most people do not answer their phones, but that’s another habit which may require analysis.

I worked through the write up. I noted three things straight away.

First, the timeline structure of the story is logical. However, leaving it up to me to figure out which date matched which egregious Google action was annoying. Fortunately, after writing The Google Legacy, Google Version 2.0, and Google: The Digital Gutenberg, I had the general timeline in mind. Other readers may not.

Second, a statement early in the write up reveals the drift of the essay’s argument. The best selling author of The Power of Habit writes:

Within computer science, this kind of algorithmic alchemy is sometimes known as vertical search, and it’s notoriously hard to master. Even Google, with its thousands of Ph.D.s, gets spooked by vertical-search problems.

I am not into arguments about horizontal and vertical search. I ran around that mulberry tree with a number of companies, including a couple of New York investment banks. Been there. Done that. There are differences in how the components of a findability solution operate, but the basic plumbing is similar. One must not confuse search with the specific technology employed to deliver a particular type of output. Want to argue? First, read The New Landscape of Search, published by Pandia before the outfit shut down. Then, send me an email with your argument.

Third, cherry picking from Google’s statements makes it possible to paint a somewhat negative picture of the great and much loved Google. With more than 60,000 employees, many blogs, many public presentations, oodles of YouTube videos, and a library full of technical papers and patents, the Google folks say a lot. The problem is that finding a quote to support almost any statement is not hard; it just takes persistence. Here’s an example:

We absolutely do not make changes to our search algorithm to disadvantage competitors.

