Natural Language Processing: Brittle and Spurious

August 24, 2018

I read “NLP’s Generalization Problem, and How Researchers Are Tackling It.” From my vantage point in rural Kentucky, the write up seems to say, “NLP does not work particularly well.”

For certain types of content in which terminology is constrained, NLP systems work okay. But, like clustering, the initial assignment of any object determines much about the system. Problem examples range from jargon to code words to phrases which are aliases. NLP systems struggle even within a single language.

The write up provides interesting examples of NLP failure.

The fixes, alas, are not likely to deliver the bacon any time soon. Yep, “bacon” means a technical breakthrough. NLP systems struggle with this type of utterance. I refer to one local restaurant as the nasty caballero, which is my way of saying “the local Mexican restaurant on the river.”

I like the suggestion that NLP systems should use common sense. Isn’t that the method that AskJeeves tried when it allegedly revolutionized NLP question answering? The problem, of course, was the humans had to craft rules and that took money, time, and even more money.

Then there is the suggestion to “Evaluate unseen distributions and unseen tasks.” That’s interesting as well. The challenge is the one that systems like IBM Watson face. Humans have to make decisions about dicey issues like clustering, then identify relevant training data, and index the text with metadata.

Same problem: Time and money.

For certain applications, NLP can be helpful. For other types of content comprehension, one ends up with the problem of getting Gertie (the NLP system) up and running. Then after a period of time (often a day or two), hooking Gertie to the next Star Trek innovation from Sillycon Valley.

How do you think NLP systems handle my writing style? Let’s ask some NLP systems? DR LINK? IBM Watson? Volunteers?

Stephen E Arnold, August 24, 2018

Alexa Is Still Taking Language Lessons

August 24, 2018

Though Amazon has been aware of the problem for a while, Alexa still responds better to people who sound like those she grew up with than she does to others. It is a problem many of us can relate to, but one the company really needs to solve as it continues to deploy its voice-activated digital assistant worldwide. TheNextWeb cites a recent Washington Post study as it reports, “Alexa Needs Better Training to Understand Non-American Accents.” It is worth noting it is not just foreign accents the software cannot recognize—the device has trouble with many regional dialects within the US, as well.

“The team had more than 100 people from nearly 20 US cities dictate thousands of voice commands to Alexa. From the exercise, it found that Amazon’s Alexa-based voice-activated speaker was 30 percent less likely to comprehend commands issued by people with non-American accents. The Washington Post also reported that people with Spanish as their first language were understood 6 percent less often than people who grew up around California or Washington and spoke English as a first language. Amazon officials also admitted to The Washington Post that grasping non-American accents poses a major challenge both in keeping current Amazon Echo users satisfied, and expanding sales of their devices worldwide. Rachael Tatman, a Kaggle data scientist with expertise in speech recognition, told The Washington Post that this was evidence of bias in the training provided to voice recognition systems. ‘These systems are going to work best for white, highly educated, upper-middle-class Americans, probably from the West Coast, because that’s the group that’s had access to the technology from the very beginning,’ she said.”

Yes, the bias we find here is the natural result of working with what you have where you are, and perhaps Amazon can be forgiven for not foreseeing the problem from the beginning. Perhaps. The article grants that the company has been working toward a resolution, and references their efforts to prepare for the Indian market as an example. It seems to be slow going.

Cynthia Murrell, August 24, 2018

IBM and Distancing: New Collar Jobs in France

May 23, 2018

I have zero idea if the information in the article “Exclusive: IBM bringing 1,800 AI jobs to France” is accurate. The story caught my attention because I had read “Macron Vowed to Make France a ‘Start-Up Nation.’ Is It Getting There?” You can find the story online at this link, although I read a version of the story in my dead tree edition of the real “news” paper at breakfast this morning (May 23, 2018).

Perhaps IBM recognizes that the “culture” of France makes it difficult for startups to get funding without the French management flair. Consequently a bold and surgical move to use IBM management expertise could make blockchain, AI, and Watson sing Johnny Hallyday’s Johnny, reviens ! Les Rocks les plus terribles and shoot to the top of YouTube views.

On the other hand, the play may be a long shot.

What I did find interesting in the write up was this statement:

IBM continues to make moves aimed at distancing itself from peers.

That is fascinating. IBM has faced a bit of pushback as it made some personnel decisions which annoyed some IBMers. One former IBM senior manager just shook his head and grimaced when I mentioned the floundering of the Watson billion dollar bet. I dared not bring up riffing workers over 55. That’s a sore subject for some Big Blue bleeders.

I also liked the “New Collar” buzzword.

To sum up, I assume that IBM will bring the New Collar fashion wave to the stylish world of French technology.

Let’s ask Watson. No, bad idea. Let’s not. I don’t have the time to train Watson to make sense of questions about French finance, technology, wine, cheese, schools, family history, and knowledge of Molière.

Stephen E Arnold, May 23, 2018

Real Time Translation: Chatbots Emulate Sci Fi

April 16, 2018

The language barrier is still one of the world’s major problems. Translation software, such as Google Translate, is accurate, but it still makes mistakes that native speakers are needed to correct. Instantaneous translation is still a pipe dream, but the technology is improving with each new development. Mashable shares a current translation innovation and it belongs to Google: “Google Pixel Buds Vs. Professional Interpreters: Which Is More Accurate?”

Apple angered many devout users when it deleted the headphone jack on phones, instead replacing it with Bluetooth headphones called AirPods. They have the same minimalist sleek design as other Apple products, but Google’s Pixel Buds are far superior to them because of real time translation, or so we are led to believe. Author Raymond Wong tested the Pixel Buds translation features at the United Nations to see how they fared against professional translators. He and his team tested French, Arabic, and Russian. The Pixel Buds did well with simple conversations, but certain words and phrases caused errors.

One hilarious example was when Google translated the Arabic for, “I want to eat salad” to “I want to eat power” in English. When it comes to real time translation, the experts are still the best because they can understand the context and other intricacies, such as tone, that come with human language. The professional translators liked the technology, but it still needs work:

“Ayad and Ivanova both agreed that Pixel Buds and Google Translate are convenient technologies, but there’s still the friction of holding out a Pixel phone for the other person to talk into. And despite the Pixel Buds’ somewhat speedy translations, they both said it doesn’t compare to professional conference interpreters, who can translate at least five times faster than Google’s cloud.”

Keep working on those foreign language majors, kids. Marketing noses in front of products that deliver, in my view.

Whitney Grace, April 17, 2018

Udpipe for R: An NLP Solution for R

March 19, 2018

Natural language processing is a huge component not only of big data but also of machine learning as it relates to reading and understanding languages. Natural language processing is important not only for English but for any language that needs to take advantage of AI and machine learning. RBloggers takes a look at another new tool in the area of NLP and its updated features, “Natural Language Processing For Non-English Languages With Udpipe.”

We learned from the write up:

“BNOSAC is happy to announce the release of the udpipe R package (https://bnosac.github.io/udpipe/en) which is a Natural Language Processing toolkit that provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’, ‘morphological feature tagging’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format…”

The udpipe R package supports a wide range of languages from Latin-based to Asian, including Slavonic, Russian, Vietnamese, Finnish, Turkish, Serbian, Japanese, Basque, and Greek.

BNOSAC designed the udpipe R package for designers to build NLP applications that can integrate part-of-speech tags, tokens, morphological features, and dependency parsing output. BNOSAC really wants non-English speaking designers to take advantage of the upgrade for their applications, because tools like this should not be restricted to English only communities.
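For readers who have not met the CoNLL-U format mentioned in the BNOSAC quote, here is a minimal sketch in Python (not R, and not the udpipe API) of reading annotated tokens from that format. The sample sentence and the function name `parse_conllu` are illustrative assumptions; the ten tab-separated columns are the ones the format defines.

```python
# The ten tab-separated columns of a CoNLL-U token line.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical hand-written sample, not actual udpipe output.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment or metadata line; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for tok in sentence:
            print(tok["form"], tok["lemma"], tok["upos"], tok["head"], tok["deprel"])
```

Each token carries its lemma, part-of-speech tag, morphological features, and the index of its syntactic head, which is the kind of output the package description lists.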

Whitney Grace, March 19, 2018

The New York Times Wants to Change Your Google Habit

March 1, 2018

Sunday is a slightly less crazy day. I took time to scan “The Case Against Google.” I had the dead tree edition of the New York Times Magazine for February 25, 2018. You may be able to access this remarkable hybridization of Harvard MBA think, DNA engineered to stick pins in Google, and good old establishment journalism toasted at Yale University.

The author is a wildly successful author. Charles Duhigg loves his family, makes time for his children, writes advice books, and immerses himself in a single project at a time. When he comes up for air, he breathes deeply of Google outputs in order to obtain information. If the Google fails, he picks up the phone. I assume those whom he calls answer the ring tone. I find that most people do not answer their phones, but that’s another habit which may require analysis.

I worked through the write up. I noted three things straight away.

First, the timeline structure of the story is logical. However, leaving it up to me to figure out which date matched which egregious Google action was annoying. Fortunately, after writing The Google Legacy, Google Version 2.0, and Google: The Digital Gutenberg, I had the general timeline in mind. Other readers may not.

Second, the statement early in the write up reveals the drift of the essay’s argument. The best selling author of The Power of Habit writes:

Within computer science, this kind of algorithmic alchemy is sometimes known as vertical search, and it’s notoriously hard to master. Even Google, with its thousands of Ph.D.s, gets spooked by vertical-search problems.

I am not into arguments about horizontal and vertical search. I ran around that mulberry tree with a number of companies, including a couple of New York investment banks. Been there. Done that. There are differences in how the components of a findability solution operate, but the basic plumbing is similar. One must not confuse search with the specific technology employed to deliver a particular type of output. Want to argue? First, read The New Landscape of Search, published by Pandia before the outfit shut down. Then, send me an email with your argument.

Third, cherry picking from Google’s statements makes it possible to paint a somewhat negative picture of the great and much loved Google. With more than 60,000 employees, many blogs, many public presentations, oodles of YouTube videos, and a library full of technical papers and patents, the Google folks say a lot. The problem is that finding a quote to support almost any statement is not hard; it just takes persistence. Here’s an example:

We absolutely do not make changes to our search algorithm to disadvantage competitors.

Progress: From Selling NLP to Providing NLP Services

December 11, 2017

Years ago, Progress Software owned an NLP system. I recall conversations with natural language processing wizards from EasyAsk. Larry Harris developed a natural language system in 1999 or 2000. Progress purchased EasyAsk in 2005 if memory serves. I interviewed Craig Bassin in 2010 as part of my Search Wizards Speak series.

The recollection I have was that Progress divested itself of EasyAsk in order to focus on enterprise applications other than NLP. No big deal. Software companies are bought and sold every day.

However, what makes this recollection interesting to me is the information in “Beyond NLP: 8 Challenges to Building a Chatbot.” Progress went from a software company that owned an NLP system to a company which is advising people like me how challenging a chatbot system can be to build and make work. (I noted that the Wikipedia entry for Progress does not mention the EasyAsk acquisition and subsequent de-acquisition. Either small potatoes or a milestone best jumped over, I assume.)

Presumably it is easier to advise and get paid to implement than to fund and refine an NLP system like EasyAsk. If you are not familiar with EasyAsk, the company positions itself in eCommerce site search with its “cognitive eCommerce” technology. EasyAsk’s capabilities include voice enabled natural language mobile search. This strikes me as a capability which is similar to that of a chatbot as I understand the concept.

History is history, one of my high school teachers once observed. Let’s move on.

What are the eight challenges to standing up a chatbot which sort of works? Here they are:

  1. The chat interface
  2. NLP
  3. The “context” of the bot
  4. Loops, splits, and recursions
  5. Integration with legacy systems
  6. Analytics
  7. Handoffs
  8. Character, tone, and persona.

As I review this list, I note that I have to decide whether to talk to a chatbot or type into a box so a “customer care representative” can assist me. The “representative,” one assumes, is a smart software robot.

I also notice that the bot has to have context. Think of a car dealer and the potential customer. The bot has to know that I want to buy a car. Seems obvious. But okay.

“Loops, splits, and recursions.” Frankly I have no idea what this means. I know that chatbot centric companies use jargon. I assume that this means “programming” so the NLP system returns a semi-on point answer.

Integration with legacy systems and handoffs seem to be similar to me. I would just call these two steps “integration” and be done with it.

The “character, tone, and persona” seems to apply to how the chatbot sounds; for example, the nasty, imperious tone of a Kroger automated check out system.

Net net: Progress is in the business of selling advisory and engineering services. The reason, in my opinion, was that Progress could not crack the code to make search and retrieval generate expected payoffs. Like some Convera executives, selling search related services was a more attractive path.

Stephen E Arnold, December 11, 2017

Google: Headphones and Voice Magic

November 23, 2017

I read two interesting articles. Each provides some insight into Google’s effort to put the NLP and chatbot doggies in an Alphabet corral.

The first article is “Google SLING: An Open Source Natural Language Parser.” To refresh your memory, “SLING is a combination of recurrent neural networks and frame based parsing.”

The second article is “Google Introduces Dialogflow Enterprise Edition, a Conversational Apps Building Platform.” The idea is to provide “a platform for building voice and text conversational applications.”

Both are interesting because each seems to be “free.” I won’t drag you, gentle reader, through the consequences of building a solution around a “free” Google service. One Xoogler watches me like a hawk to remind me that Google doesn’t treat people in a will-o’-the-wisp way. Okay. Let’s move on, shall we?

Both of these systems advance Google’s quest to become the Big Dog of where the world is heading for computer interaction. Both are germane to the wireless headphones Google introduced. These headphones, unlike other wireless alternatives, can translate. Hence, the largesse for free NLP and voice freebies.

“Trying Out Google’s Translating Headphones” informed me that:

The most important thing you should know about Pixel Buds is that their full features only work with Google’s newest smartphone, the Pixel 2.

Is this vendor lock in?

I learned from the write up:

To be honest, it’s not exactly real-time. You call up the feature by tapping on your right earbud and asking Google Assistant to “help me speak” one of 40 languages. The phone will then open the Google Translate app. From there, the phone will translate what it hears into the language of your choice, and you’ll hear it in your ear.

Not quite like Star Trek’s universal translator, suggests the article. I noted this statement:

it’s worth realizing that the Pixel Buds are more than just a pair of headphones. They’re an early illustration of what we can expect from Google, which will try to make products that stand out from the pack with unusual artificial intelligence services such as translation.

A demo. I suppose doing the lock in tactic with a demo is better than basing lock in on vaporware.

Then there are the free APIs. These, of course, will never go away or cost too much money. The headphones are $159. The phone adds another $649.

Almost free.

Stephen E Arnold, November 23, 2017

Natural Language Processing: Tomorrow and Yesterday

October 31, 2017

I read “Will Natural Language Processing Change Search as We Know It?” The write up is by a search specialist who, I believe, worked at Convera. The Search Technologies’ Web site asserts:

He was the architect and inventor of RetrievalWare, a ground-breaking natural-language based statistical text search engine which he started in 1989 and grew to $50 million in annual sales worldwide. RetrievalWare is now owned by Microsoft Corporation.

I think Fast Search acquired a portion of Convera. When Microsoft purchased Fast Search, the Convera technology was part of the deal. When Convera faded, one rumor I captured in 2007 was that some of the Convera technology was used by Ntent, formed as the result of a merger between Convera Corporation and Firstlight ERA. If accurate, the history of Convera is fascinating with Excalibur, ConQuest, and Allen & Co. in the mix.

In the “Will Natural Language Processing Change Search As We Know It” blog post, I noted these points:

  • Intranets incorporating NLP, semantic search and AI can fuel chatbots as well as end-to-end question-answering systems that live on top of search. It is a truly semantic extension to the search box with far-reaching implications for all types of search.
  • With NLP, enterprise knowledge contained in paper documentation can be encoded in a machine-readable format so the machine can read, process and understand it enough to formulate an intelligent response.
  • It’s good to know about established tool sets and methodologies for developing and creating effective solutions for use cases like technical support. But like all development projects, take care to create the tools based on mimicking the responses of actual human domain experts. Otherwise, you may run into the proverbial development problem of “garbage in, garbage out” which has plagued many such expert system initiatives.

Mr. Nelson is painting a reasonable picture about the narrow use of widely touted technologies. In fact, the promise of NLP has been part of enterprise search marketing for decades.

What I found interesting was the Convera document called “Accurate Search: What a Concept,” published by Convera in 2002. I noted this passage on page 4 of the document:

Concept Search capitalizes on the richness of language, with its multiple term meanings, and transforms it from a problem into an advantage. RetrievalWare performs natural language processing and search term expansion to paraphrase queries, enabling retrieval of documents that contain the specific concepts requested rather than just the words typed during the query while also taking advantage of its semantic richness to rank documents in results lists. RetrievalWare’s powerful pattern search abilities overcome common errors in both content and queries, resulting in greater recall and user satisfaction.
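The “search term expansion” idea in the Convera passage can be illustrated with a toy sketch in Python. To be clear, this is not RetrievalWare’s method: the synonym table, the three documents, the scoring weights, and the function names are all invented for illustration.

```python
import re

# Hypothetical synonym table; a real system would rely on a far larger
# semantic network than this hand-built dictionary.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

# Hypothetical document collection.
DOCS = {
    1: "How to purchase an automobile",
    2: "Buy a car today",
    3: "Cooking with bacon",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(query):
    """Map each query term to the set of itself plus its listed synonyms."""
    return {term: {term} | SYNONYMS.get(term, set()) for term in tokenize(query)}


def search(query, docs):
    """Rank documents by matched query concepts; exact terms outscore synonyms."""
    concepts = expand(query)
    scored = []
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        score = 0.0
        for term, variants in concepts.items():
            if term in words:
                score += 1.0      # the word the user actually typed
            elif variants & words:
                score += 0.5      # a synonym stands in for the typed word
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    print(search("buy car", DOCS))
```

With the query “buy car,” a plain keyword match would return only document 2; the expanded query also retrieves document 1, ranked lower because it matches synonyms rather than the typed words. Someone still has to build and maintain the synonym table, which is the time and money part.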

I find the shift from a broad solution to a more narrow solution interesting. In the span of 15 years, the technology of search seems to be struggling to deliver.

Perhaps consulting and engineering services are needed to make search “work”? Contrast search with mobile phone technology. Progress has been evident. For search, success narrows to improving “documentation” and “customer support.”

Has anyone tried to reach PayPal’s customer support or United Airlines’ customer support? Try it. United was at one time a “customer” of Convera’s. From my point of view, United Airlines’ customer service has remained about the same over the last decade or two.

Enterprise search, broad or narrow, remains a challenge for marketers and users in my opinion. NLP, I assume, has arrived after a long journey. For a free profile of Convera, check out this link.

Stephen E Arnold, October 31, 2017

Free Language Learning Resources That Are Not Duolingo

October 25, 2017

For those who wish to learn a foreign language, the fun and engaging Duolingo has become a go-to free resource, offering courses in more than 20 languages. However, it is not the only game in town; MakeUseOf gives us a rundown of “The Best (Completely Free) Language Learning Alternatives to Duolingo.” Writer Briallyn Smith tells us:

One of the reasons some people are looking to move away from Duolingo is the recent introduction of in-app purchases. While the core functions of Duolingo are still free, the purchase options can give learners a boost when playing games — much like the bonuses and extra lives you can purchase on Bejewelled or other addictive gaming apps. Learners may become frustrated when they are prevented from working on a specific language skill or accomplishment because they ran out of ‘hearts’ or need to purchase ‘gems’ to continue. Other in-app purchases allow users to remove ads from their learning experience and to download offline content.

While there’s nothing wrong with Duolingo charging fees for its services, it can be frustrating for those looking for a truly free resource. Other language learners simply do not enjoy learning through games. This is especially true for those who require industry-specific vocabulary or who already have a background in the language. Thankfully, there are many other online resources available for language learners. While you won’t get the same kind of program as Duolingo for free, you can easily use these resources to put together a language learning strategy that works well for you.

Before getting to her list, Smith takes a moment to advocate for paid language-learning services, like Babbel. Basically, if you are serious about your language studies and can afford it, they are worth the investment.

The resource list begins with a compound entry, Online Communities; included here are Fluent in 3 Months, /r/LanguageLearning, and The Polyglot Club. Then there are Rhino Spike, Mango Languages, the Yojik Website, and, of course, YouTube (with a list of 10 suggested channels). Furthermore, Smith supplies a link to OpenCulture for even more options. See the article for more about each of these entries.

Cynthia Murrell, October 25, 2017
