Linguistic Insight: Move Over, Parrots

February 7, 2020

DarkCyber noted an item sure to be of interest to the linguists laboring in the world of chat bots, NLP, and inference. “Penguins Follow Same Linguistic Patterns As Humans, Study Finds” states:

Words more frequently used by the animals are briefer, and longer words are composed of extra but briefer syllables, researchers say.

The write up also reveals:

Information compression is a general principle of human language.

Yep. Penguins better than parrots? Well, messier for sure.

Stephen E Arnold, February 7, 2020

Lexalytics: The RPA Market

December 12, 2019

RPA is an acronym which was new to the DarkCyber team. A bit of investigation pointed us to “Adding New NLP Capabilities for RPA: Build or Buy” plus other stories published by Lexalytics. This firm provides a sentiment analysis system. The idea is that smart software can figure out what the emotion of content objects is. Some sentiment analysis systems just use word lists. When an email arrives containing the word “sue,” the smart software flags the email and, in theory, a human looks at the message. Other systems use a range of numerical recipes to figure out if a message contains an emotional payload.
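For readers who want to see how little is involved in the word list approach, here is a minimal sketch in Python. The trigger words and the `flag_message` function are hypothetical illustrations made up for this post, not anything taken from Lexalytics or another vendor's product.

```python
import re

# Hypothetical trigger list; a production lexicon would be far larger.
TRIGGER_WORDS = {"sue", "lawsuit", "refund", "furious"}

def flag_message(text, triggers=TRIGGER_WORDS):
    """Return the sorted trigger words found in text; an empty list means no flag."""
    # Tokenize into whole words so that "issue" does not match "sue".
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(tokens & triggers)

print(flag_message("If this is not fixed, I will sue."))       # ['sue']
print(flag_message("Thanks for the quick issue resolution."))  # []
```

Matching on whole tokens rather than substrings is the one design choice worth noting. Everything else, including negation, sarcasm, and context, is left for the human who reads the flagged message.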

Now RPA.

The idea is that robotic process automation is becoming more important. The vendors of RPA have to be aware that natural language processing related to text analytics is also increasing in importance. You can read about RPA on the Lexalytics blog at this link.

The jargon caught our attention. After a bit of discussion over lunch on December 5, 2019, we decided that RPA is a new term for workflows that are scripted and hopefully intelligent.

Now you know. RPA, workflow, not IPA.

Stephen E Arnold, December 12, 2019

Parsing Document: A Shift to Small Data

November 14, 2019

DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to pay attention to figuring out specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”

The write up reports:

The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).

Interesting, but the approach seems similar to identifying several passages in a text and submitting these to a search engine. This used to be called “more like this.” But today? Small data.
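A rough sketch of the old “more like this” idea, assuming plain bag-of-words term counts and cosine similarity. This is not Eigen's method, which the write up does not describe in detail; the function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Build a lowercased bag-of-words term count for a passage."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def more_like_this(examples, documents, top_n=3):
    """Rank documents by similarity to the combined example passages."""
    query = term_vector(" ".join(examples))
    scored = [(cosine(query, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]

examples = ["The borrower shall repay the loan principal"]
documents = [
    "The borrower shall repay the principal and interest on the loan.",
    "Quarterly marketing plan for penguin merchandise.",
]
for score, doc in more_like_this(examples, documents):
    print(round(score, 2), doc)
```

The loan clause ranks first and the marketing memo scores zero because it shares no terms with the example passage.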

With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?

DarkCyber hopes so.

Stephen E Arnold, November 14, 2019

Visual Data Exploration via Natural Language

November 4, 2019

New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.

The university reported:

A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.

You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”

DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?

Maybe not tomorrow but in the future. Absolutely. And what will those newly-unemployed people do for money?

Interesting question and one some may find difficult to consider at this time.

Stephen E Arnold, November 4, 2019


Sentiment Analysis: Still Ticking Despite Some Lickings

October 29, 2019

Sentiment analysis is a “function” that tries to identify the emotional payload of an object, typically text. Sentiment analysis of images, audio, and video is “just around the corner”, just like quantum computing and keeping Windows 10 updates from killing some computers.

“The Best Sentiment Analysis Tools of 2019” provides a list of go-to vendors. Like most lists, some options do not appear; for example, Algeion and Vader. The list was compiled by MonkeyLearn, which is number one on the list. There are some surprises; for example, IBM Watson.

Stephen E Arnold, October 29, 2019

Real Life Q and A for Information Access Allegedly Arrives

October 14, 2019

DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:

software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.

The company developing the system, Promethium, was founded in 2018 and may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved with $2.5 million in funding a function that many firms have pursued.

The article reports:

After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.

More information is available at the firm’s Web site.

Stephen E Arnold, October 14, 2019

Smart Software: About Those Methods?

July 23, 2019

An interesting paper germane to machine learning and smart software is available. The title? “Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches”.

The punch line for this academic document is, in the view of DarkCyber:

No way.

Your view may be different, but you will have to read the document, check out the diagrams, and scan the supporting information available on Github at this link.

The main idea is:

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.

So back to my summary, “No way.”
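To make “comparably simple heuristic methods” concrete, here is a minimal sketch of an item-based nearest-neighbor recommender. It is an illustration under simple assumptions (binary interactions, cosine similarity between items), not the baseline code from the paper; the function name and data are hypothetical.

```python
import math
from collections import defaultdict

def item_knn_recommend(interactions, user, top_n=2):
    """Recommend unseen items scored by cosine similarity to the user's items.

    interactions maps each user to the set of items that user touched.
    """
    # Invert the data: item -> set of users who touched it.
    item_users = defaultdict(set)
    for u, items in interactions.items():
        for item in items:
            item_users[item].add(u)

    seen = interactions.get(user, set())
    scores = defaultdict(float)
    for candidate, cand_users in item_users.items():
        if candidate in seen:
            continue
        for owned in seen:
            overlap = len(cand_users & item_users[owned])
            if overlap:
                # Cosine similarity between two binary item vectors.
                scores[candidate] += overlap / math.sqrt(
                    len(cand_users) * len(item_users[owned])
                )
    # Highest score first; ties broken by item name for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

interactions = {
    "ann": {"a", "b"},
    "bob": {"a", "b", "c"},
    "cat": {"b", "c"},
    "dan": {"d"},
}
print(item_knn_recommend(interactions, "ann"))  # ['c']
```

No training loop, no GPU, no hyperparameter search. That a neural method can struggle to beat something of this size is the point of the paper.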

Here’s an “oh, how interesting” chart. Note the spikes:


Several observations:

  1. In an effort to get something to work, those who think in terms of algorithms take shortcuts; that is, operate in a clever way to produce something that’s good enough. “Good enough” is pretty much a C grade or “passing.”
  2. Math whiz hand waving and MBA / lawyer ignorance of which human judgments operate within an algorithmic operation guarantee that “good enough” becomes “Let’s see if this makes money.” You can substitute “reduce costs” if you wish. No big difference.
  3. Users accept whatever outputs a smart system delivers. Most people believe that “computers are right.” There’s nothing DarkCyber can do to make people more aware.
  4. Algorithms can be fiddled in the following ways: [a] let these numerical recipes and the idiosyncrasies of calculation just do their thing; for example, drift off in a weird direction or produce the equivalent of white noise; [b] get skewed because of the data flowing into the system automagically (very risky) or via human subject matter experts (also very risky); [c] the programmers implementing the algorithm focus on the code, speed, and deadline, not how the outputs flow; for example, k-means can be really mean and Bayesian methods can bay at the moon.

Net net: Worth reading this analysis.

Stephen E Arnold, July 23, 2019

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is ‘Cognitive Linguistics’?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.

Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how to book from Wiley publishing called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics — its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

NLP: A Primer or Promotion?

April 19, 2019

The role of natural language processing has expanded greatly in recent years. For anyone who needs to get up to speed on this important technology, take note of this resource: IBM Developer shares “A Beginner’s Guide to Natural Language Processing.” Writer M. Tim Jones introduces his topic:

“In this article, we’ll examine natural language processing (NLP) and how it can help us to converse more naturally with computers.”

Now the promotional part:

NLP is one of the most important subfields of machine learning for a variety of reasons. Natural language is the most natural interface between a user and a machine. In the ideal case, this involves speech recognition and voice generation. Even Alan Turing recognized this in his “intelligence” article, in which he defined the “Turing test” as a way to test a machine’s ability to exhibit intelligent behavior through a natural language conversation. …

We noted this statement:

“One of the key benefits of NLP is the massive amount of unstructured text data that exists in the world and acts as a driver for natural language processing and understanding. For a machine to process, organize, and understand this text (that was generated primarily for human consumption), we could unlock a large number of useful applications for future machine learning applications and a vast amount of knowledge that could be put to work.”

NLP actually represents several areas of research: speech recognition, natural language understanding, querying, ontology, natural language generation, and speech generation. Jones covers the history of NLP from 1954 to the present, then delves into some current approaches: word encodings, recurrent neural networks, reinforcement learning, and deep learning. The article closes by noting that the use of NLP continues to grow. Jones even points to Watson’s Jeopardy championship as evidence the technology is here to stay. Gee, I wonder why a more recent Watson success story wasn’t cited? And how about those IBM financials? Watson, what’s up?

Cynthia Murrell, April 19, 2019

Natural Language Processing: Will It Make Government Lingo Understandable?

April 11, 2019

I noted an FBO posting for a request for information for natural language processing services. You can find the RFI at this link on the FBO Web site. Here’s a passage I found interesting:

OPM seeks information on how to use artificial intelligence, particularly natural language processing (NLP), to gain insights into statutory and regulatory text to support policy analysis. The NLP capabilities should include topic modeling; text categorization; text clustering; information extraction; named entity resolution; relationship extraction; sentiment analysis; and summarization. The NLP project may include statistical techniques that can provide a general understanding of the statutory and regulatory text as a whole. In addition, OPM seeks to learn more about chatbots and transactional bots that are easy to implement and customize with the goal of extending bot-building capabilities to non-IT employees. (Maximum 4 pages requested.)

The goal is to obtain information about a system that performs the functions associated with an investigative software system; for example, Palantir Technologies, IBM i2, or one of the numerous companies operating from Herzliya, north of Tel Aviv.

I am curious about the service provider who assisted in the preparation of this RFI. The time window for responding is measured in days. With advanced text analysis systems abounding in US government agencies from the Department of Justice to the Department of Defense and beyond, I wonder why additional requests for information are required.

Ah, procurement. A process in love with buzzwords so an NLP system can make things more clear. Sounds like a plan.

Stephen E Arnold, April 11, 2019
