Like Life, Chatbots Are Semi-Perfect

September 22, 2020

Chatbots are notoriously dumb pieces of AI that parrot information coded into their programs. They are also annoying, because they never have the correct information. Developers, however, are working to make chatbots genuinely useful tools. Medium runs down the differences between chatbots in “Updated: A Comparison Of Eight Chatbot Environments.”

Most chatbot environments take the same approach to a conversational interface, but there are four distinct development groups: avant-garde, NLU/NLP tools, use-the-cloud-you’re-in, and leading commercial cloud offerings. There are trends that cut across these groups:

“• The merging of intents and entities

• Contextual entities. Hence entities sans a finite list and which is detected by their context within a user utterance.

• Deprecation of the State Machine. Or at least, towards a more conversational like interface.

• Complex entities; introducing entities with properties, groups, roles etc.”
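The “contextual entities” trend above can be sketched in a few lines. This is a minimal, invented illustration, not code from any of the eight environments: the entity has no finite value list; it is recognized purely from its position after a contextual cue in the utterance.

```python
import re

# Hedged sketch of a "contextual entity": there is no finite list of
# destinations; whatever capitalized token follows the cue "flight to"
# is treated as the destination entity. Real NLU engines learn such
# patterns statistically rather than from one hand-written rule.
def extract_destination(utterance: str):
    match = re.search(r"\bflight to ([A-Z][a-z]+)", utterance)
    return match.group(1) if match else None

print(extract_destination("Book me a flight to Oslo tomorrow"))  # Oslo
print(extract_destination("What is the weather like"))           # None
```

The point of the trend is that commercial engines replace the hand-written rule with a model that infers the pattern from annotated examples.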

Beyond the industry trends, chatbots are transitioning from stupid instant messaging programs to interactive, natural language driven digital employees that “think and act” like real humans. Companies want chatbots that can comprehend past and current conversations drawn from multiple sources, including CRM systems.

Chatbot frameworks differ so much that direct comparison is difficult, but five consideration points stand out: NLU features, ecosystem maturity, licensing and usage costs, graphic call-flow development and editing, and scalability and enterprise readiness.

Chatbots are becoming smarter and already handle many customer service jobs. If they can actually resolve the problems customers contact companies for, then science fiction truly has become reality.

Whitney Grace, September 22, 2020

Forget Structured Query Language Commands? Yeah, Not Yet

August 29, 2020

One of the DarkCyber team spotted a demonstration service. The idea is that the system will accept natural language queries of information stored in structured databases. According to the DarkCyber person, the queries launched into the natural language box were:

Sheva War with Whom

Sheva Frequency

The sparse interface sports a Content button which displays the information in the system.

How did this work?


Not well. NLP systems, it seems, still pose challenges.

An interesting idea, but some rough edges need a bit of touch-up.

Stephen E Arnold, August 29, 2020

NLP: A Time for Reflection or a Way to Shape Decades of Hyperbole and Handwaving?

August 2, 2020

The most unusual online information service published “The Field of Natural Language Processing Is Chasing the Wrong Goal.” The article comments about the Association for Computational Linguistics Conference held in July 2020.

The point of the write up is to express concern about the whither and why of NLP; for example:

My colleagues and I at Elemental Cognition, an AI research firm based in Connecticut and New York, see the angst as justified. In fact, we believe that the field needs a transformation, not just in system design, but in a less glamorous area: evaluation.


Yep, the discipline appears to be chasing benchmarks. DarkCyber believes this is a version of the intra-squad rivalries as players vie to start the next game.

The write up raises this question:

How did the NLP community end up with such a gap between on-paper evaluations and real-world ability? In an ACL position paper, my colleagues and I argue that in the quest to reach difficult benchmarks, evaluations have lost sight of the real targets: those sophisticated downstream applications. To borrow a line from the paper, the NLP researchers have been training to become professional sprinters by “glancing around the gym and adopting any exercises that look hard.”

The answer, in part, is for NLP developers to follow this path:

But our argument is more basic: however systems are implemented, if they need to have faithful world models, then evaluations should systematically test whether they have faithful world models.

DarkCyber’s view is that NLP, like other building blocks of content analysis and access systems, has characteristics which cause intra-squad similarities; that is, the players are more similar than even they understand:

  1. Reliance on methods widely taught in universities. Who wants to go in a new direction, fail, and, therefore, be perceived as a dead ender?
  2. Competing with one’s teammates, peers, and fellow travelers is comfortable. Who wants to try to explain why NLP from A is better than NLP from B when the results are more of the same?
  3. NLP, like other content functions, is positioned as the big solution to tough content challenges. The reality is that language is slippery, and less fancy methods often deliver good enough results. Who wants to admit that a particular approach is merely “good enough”? It is better to get out the pink wrapping paper and swathe the procedures in colorful garb.

NLP can be and is useful in many situations. The problem is that making sense of human utterances remains a difficult challenge. DarkCyber is suspicious of appeals emitted by the Epstein-funded MIT entity.

Jargon is jargon. NLP is one of those disciplines which works overtime to deliver on promises that have been made for many years. Does NLP pay off? This is like MIT asking, “Epstein who?”

Stephen E Arnold, August 2, 2020

Natural Language Processing: Useful Papers Selected by an Informed Human

July 28, 2020

Nope, no artificial intelligence involved in this curated list of papers from a recent natural language conference. Ten papers are available with a mouse click. Quick takeaway: Adversarial methods seem to be a hot ticket. Navigate to “The Ten Must Read NLP/NLU Papers from the ICLR 2020 Conference.” Useful editorial effort and a clear, adult presentation of the bibliographic information. Kudos to jakubczakon.

Stephen E Arnold, July 27, 2020

Jargon Alert: Direct from the Video Game Universe

July 22, 2020

I scanned a write up called “Who Will Win the Epic Battle for Online Meeting Hegemony?” The write up was a rah rah for Microsoft because, you know, it’s Microsoft.

Stepping away from the “epic battle,” the write up contained a word from the video game universe. (It’s a fine place: Courteous, diverse, and welcoming.)

The word is “upleveled” and it was used in this way:

Upleveled security and encryption. Remote work sites, especially home offices, have become a prime target for a surge in cybersecurity attacks due to their less hardened and secure nature.

A “level” in a game produced the phrase “level up” to communicate that one moved from loser level 2 to almost normal level 3. That “jump” is known as a “level up.”

Now the word has become an adjective, as in “upleveled security.”

DarkCyber believes that the phrase will be applied in this way:

That AI program upleveled its accuracy.

Oh, and the article: Go Microsoft Teams. It’s an elephant and one knows what elephants do. If you are near an elephant uplevel your rubber boots. Will natural language processing get the drift?

Stephen E Arnold, July 22, 2020

NLP with an SEO Spin

July 8, 2020

If you want to know how search engine optimization has kicked librarians and professional indexers in the knee and stomped on their writing hand, you will enjoy “Classifying 200,000 Articles in 7 Hours Using NLP.” The write up makes clear that human indexers are going to become the lamp lighters of the 21st century. Imagine. No libraries, no subject matter experts curating and indexing content, no human judgment. Nifty. Perfect for a post Quibi world.

The write up explains the indexing methods of one type of smart software. The passages below highlight the main features of the method:

Weak supervision: the human annotator explains their chosen label to the AI model by highlighting the key phrases in the example that helped them make the decision. These highlights are then used to automatically generate nuanced rules, which are combined and used to augment the training dataset and boost the model’s quality.

Uncertainty sampling: it finds those examples for which the model is most uncertain, and suggests them for human review.

Diversity sampling: it helps make sure that the dataset covers as diverse a set of data as possible. This ensures the model learns to handle all of the real-world cases.

Guided learning: it allows you to search through your dataset for key examples. This is particularly useful when the original dataset is very imbalanced (it contains very few examples of the category you care about).

These phrases may not be clear. Allow me to elucidate:

  • Weak supervision. Subject matter experts riding herd. No way. Inefficient and not optimizable.
  • Uncertainty sampling means a “fudge factor” or “fuzzifying.” A metaphor might be “close enough for horse shoes.”
  • Guided learning. Yep, manual assembly of training data, recalibration, and more training until the horse shoe thing scores a point.
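Of the techniques quoted above, uncertainty sampling is the most mechanical. A minimal sketch, with invented probability values: rank unlabeled examples by the entropy of the model’s predicted class probabilities and send the most uncertain ones to the human reviewer.

```python
import math

# Hedged sketch of uncertainty sampling. The per-document class
# probabilities below are invented for illustration; a real system
# would get them from the classifier being trained.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

unlabeled = {
    "doc_a": [0.98, 0.02],   # model is confident
    "doc_b": [0.51, 0.49],   # model is torn: prime candidate for review
    "doc_c": [0.80, 0.20],
}

# Most uncertain (highest entropy) first.
ranked = sorted(unlabeled, key=lambda d: entropy(unlabeled[d]), reverse=True)
print(ranked[0])  # doc_b
```

The “fudge factor” snark above notwithstanding, this is just triage: human labeling effort goes where the model is least sure.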

The write up undermines its good qualities with a reference to Google. Has anyone noticed that the first page of Google results for most of my queries consists of advertisements?

NLP and horse shoes. Perfect match. Why not use index and classification codes which an educated person would find understandable and at hand? Forget answering this question. Just remember: good enough and close enough for horse shoes. Clang and kha-ching as another ad sucks in a bidder.

Stephen E Arnold, July 8, 2020

Another Low Profile, Specialized Services Firm Goes for Mad Ave Marketing

April 25, 2020

Investigative software firm ShadowDragon looks beyond traditional cyber-attacks in its latest podcast, “Cyber Cyber Bang Bang—Attacks Exploiting Risks Within the Physical and Cyber Universe.” The four-and-a-half-minute podcast is the fourth in a series that was launched on April second. The description tells us:

“Truly Advanced Persistent attacks where physical exploitation and even death are rarely discussed. We cover some of this along with security within the Healthcare and Government space. Security Within Healthcare and government is always hard. Tensions between information security and the business make this harder. Hospitals hit in fall of 2019 had a taste of exploitation. Similarly, state governments have had issues with cartel related attackers. CISO’s that enable assessment, and security design around systems that cannot be fully hardened can kill two birds with one stone. Weighing authority versus influence, FDA approved equipment, 0day discovery within applications. Designing security around systems is a must when unpatchable vulnerabilities exist.”

Hosts Daniel Clemens and Brian Dykstra begin by answering some questions from the previous podcast, then catch up on industry developments. They get into security challenges for hospitals and government agencies not quite halfway through.

A company of fewer than 50 workers, ShadowDragon keeps a low profile. Created “by investigators for investigators,” its cyber security tools include AliasDB, MalNet, OIMonitor, SocialNet, and Spotter. The firm also supports their clients with training, integration, conversion, and customization. ShadowDragon was launched in 2015 and is based in Cheyenne, Wyoming.

Cynthia Murrell, April 13, 2020

Linguistic Insight: Move Over, Parrots

February 7, 2020

DarkCyber noted an item sure to be of interest to the linguists laboring in the world of chat bots, NLP, and inference. “Penguins Follow Same Linguistic Patterns As Humans, Study Finds” states:

Words more frequently used by the animals are briefer, and longer words are composed of extra but briefer syllables, researchers say.

The write up also reveals:

Information compression is a general principle of human language.

Yep. Penguins better than parrots? Well, messier for sure.

Stephen E Arnold, February 7, 2020

Lexalytics: The RPA Market

December 12, 2019

RPA is an acronym which was new to the DarkCyber team. A bit of investigation pointed us to “Adding New NLP Capabilities for RPA: Build or Buy” plus other stories published by Lexalytics. This firm provides a sentiment analysis system. The idea is that smart software can figure out the emotional tone of content objects. Some sentiment analysis systems just use word lists. An email arrives containing the word “sue”; the smart software flags the email and, in theory, a human looks at the message. Other systems use a range of numerical recipes to figure out if a message contains an emotional payload.
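The word-list approach is as simple as it sounds. A toy version, with an invented lexicon and invented messages (real products weight terms and combine them with statistical models):

```python
# Toy sketch of word-list sentiment flagging: route a message to human
# review if it contains any term from a small risk lexicon.
# Lexicon and messages are invented examples.
RISK_TERMS = {"sue", "lawsuit", "lawyer", "refund"}

def flag_for_review(message: str) -> bool:
    words = {w.strip(".,!?").lower() for w in message.split()}
    return not RISK_TERMS.isdisjoint(words)

print(flag_for_review("I will sue your company!"))   # True
print(flag_for_review("Thanks for the quick reply")) # False
```

The weakness is obvious: “Sue from accounting called” trips the flag too, which is why the fancier numerical recipes exist.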

Now RPA.

The idea is that robotic process automation is becoming more important. The vendors of RPA have to be aware that natural language processing related to text analytics is also increasing in importance. You can read about RPA on the Lexalytics blog at this link.

The jargon caught our attention. After a bit of discussion over lunch on December 5, 2019, we decided that RPA is a new term for workflows that are scripted and hopefully intelligent.

Now you know. RPA, workflow, not IPA.

Stephen E Arnold, December 12, 2019

Parsing Documents: A Shift to Small Data

November 14, 2019

DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to pay attention to figuring out specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”

The write up reports:

The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).

Interesting, but the approach seems similar to identifying several passages in a text and submitting them to a search engine. This used to be called “more like this.” But today? Small data.
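“More like this” is an old trick: score candidate passages against an example passage by word overlap. A bare-bones sketch with invented text (not Eigen’s method; real systems add TF-IDF weighting, stemming, and far larger corpora):

```python
import math
from collections import Counter

# Hedged "more like this" sketch: rank candidates by cosine similarity
# of raw word counts against an example passage.
def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

example = "the borrower shall repay the loan with interest"
candidates = [
    "penguin vocalizations follow linguistic patterns",
    "the loan shall be repaid by the borrower with interest",
]
# The loan-contract candidate wins on shared vocabulary.
best = max(candidates, key=lambda c: cosine(vectorize(example), vectorize(c)))
print(best)
```

Point two or three such examples at a pile of loan contracts and you have, in miniature, the template idea the write up describes.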

With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?

DarkCyber hopes so.

Stephen E Arnold, November 14, 2019
