Smart Software: About Those Methods?

July 23, 2019

An interesting paper germane to machine learning and smart software is available from Arxiv.org. The title? “Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches”.

The punch line for this academic document is, in the view of DarkCyber:

No way.

Your view may be different, but you will have to read the document, check out the diagrams, and scan the supporting information available on Github at this link.

The main idea is:

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.

So back to my summary, “No way.”

Here’s an “oh, how interesting” chart from the paper. Note the spikes:

[chart from the paper]

Several observations:

  1. In an effort to get something to work, those who think in terms of algorithms take shortcuts; that is, they operate in a clever way to produce something that’s good enough. “Good enough” is pretty much a C grade or “passing.”
  2. Math whiz hand waving and MBA / lawyer ignorance of the human judgments that operate within an algorithmic operation guarantee that “good enough” becomes “Let’s see if this makes money.” You can substitute “reduce costs” if you wish. No big difference.
  3. Users accept whatever outputs a smart system delivers. Most people believe that “computers are right.” There’s nothing DarkCyber can do to make people more aware.
  4. Algorithms can be fiddled in the following ways: [a] let these numerical recipes and the idiosyncrasies of calculation just do their thing; for example, drift off in a weird direction or produce the equivalent of white noise; [b] get skewed because of the data flowing into the system automagically (very risky) or via human subject matter experts (also very risky); [c] let the programmers implementing the algorithm focus on the code, speed, and the deadline, not how the outputs flow; for example, k-means can be really mean and Bayesian methods can bay at the moon. A small sketch after this list illustrates point [a].
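For the curious, here is a minimal Python sketch, mine and not the paper’s, of how a standard algorithm can “just do its thing” differently depending on nothing more than its random initialization. The data and parameters are illustrative.

```python
# A toy demonstration (not from the paper) that k-means results can
# depend on initialization alone: same data, same algorithm, different
# answers that are each "good enough."
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three loose, overlapping blobs of 2-D points.
X = np.vstack([rng.normal(center, 1.5, size=(100, 2))
               for center in ([0, 0], [3, 3], [0, 4])])

for seed in (1, 2, 3):
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    print(f"seed={seed} inertia={km.inertia_:.1f}")
# With n_init=1, different seeds can settle into different local optima.
```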

Net net: Worth reading this analysis.

Stephen E Arnold, July 23, 2019

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is Cognitive Linguistics?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in, what was it, 1972? But who knows.


Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner Magic Quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at a Starbucks south of the old Pan Am Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how-to book from Wiley called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics — its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

NLP: A Primer or Promotion?

April 19, 2019

The role of natural language processing has expanded greatly in recent years. For anyone who needs to get up to speed on this important technology, take note of this resource: IBM Developer shares “A Beginner’s Guide to Natural Language Processing.” Writer M. Tim Jones introduces his topic:

“In this article, we’ll examine natural language processing (NLP) and how it can help us to converse more naturally with computers.”

Now the promotional part:

NLP is one of the most important subfields of machine learning for a variety of reasons. Natural language is the most natural interface between a user and a machine. In the ideal case, this involves speech recognition and voice generation. Even Alan Turing recognized this in his “intelligence” article, in which he defined the “Turing test” as a way to test a machine’s ability to exhibit intelligent behavior through a natural language conversation. …

We noted this statement:

“One of the key benefits of NLP is the massive amount of unstructured text data that exists in the world and acts as a driver for natural language processing and understanding. For a machine to process, organize, and understand this text (that was generated primarily for human consumption), we could unlock a large number of useful applications for future machine learning applications and a vast amount of knowledge that could be put to work.”

NLP actually represents several areas of research: speech recognition, natural language understanding, querying, ontology, natural language generation, and speech generation. Jones covers the history of NLP from 1954 to the present, then delves into some current approaches: word encodings, recurrent neural networks, reinforcement learning, and deep learning. The article closes by noting that the use of NLP continues to grow. Jones even points to Watson’s Jeopardy championship as evidence the technology is here to stay. Gee, I wonder why a more recent Watson success story wasn’t cited? And how about those IBM financials? Watson, what’s up?
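As a concrete taste of the “word encodings” idea the article covers, here is a toy sketch, mine rather than Jones’s: tokens map to vocabulary indices, then to one-hot vectors, which learned embeddings later replace with dense ones.

```python
# A toy illustration of word encodings: vocabulary indices and one-hot
# vectors. Learned embeddings swap these sparse vectors for dense ones.
import numpy as np

corpus = "natural language is the most natural interface".split()
vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(vocab)
print(one_hot("natural"))
```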

Cynthia Murrell, April 19, 2019

Natural Language Processing: Will It Make Government Lingo Understandable?

April 11, 2019

I noted an FBO posting for a request for information (RFI) for natural language processing services. You can find the RFI at this link on the FBO Web site. Here’s a passage I found interesting:

OPM seeks information on how to use artificial intelligence, particularly natural language processing (NLP), to gain insights into statutory and regulatory text to support policy analysis. The NLP capabilities should include topic modeling; text categorization; text clustering; information extraction; named entity resolution; relationship extraction; sentiment analysis; and summarization. The NLP project may include statistical techniques that can provide a general understanding of the statutory and regulatory text as a whole. In addition, OPM seeks to learn more about chatbots and transactional bots that are easy to implement and customize with the goal of extending bot-building capabilities to non-IT employees. (Maximum 4 pages requested.)
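To make one listed capability concrete, here is a minimal named entity extraction sketch. The RFI names no tools; spaCy and its small English model are my assumptions.

```python
# A sketch of named entity extraction with spaCy (my tool choice; the
# RFI specifies none). Assumes: pip install spacy and
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("OPM seeks insights into statutory and regulatory text "
          "to support policy analysis.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g., "OPM" tagged as an organization
```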

The goal is to obtain information about a system that performs the functions associated with an investigative software system; for example, Palantir Technologies, IBM i2, or one of the numerous companies operating from Herzliya, north of Tel Aviv.

I am curious about the service provider who assisted in the preparation of this RFI. The time window for responding is measured in days. With advanced text analysis systems abounding in US government agencies from the Department of Justice to the Department of Defense and beyond, I wonder why additional requests for information are required.

Ah, procurement. A process so in love with buzzwords that an NLP system may be needed to make things clear. Sounds like a plan.

Stephen E Arnold, April 11, 2019

Short Honk: NLP Tools

March 26, 2019

Making sense of unstructured content is tricky. If you are looking for open source natural language processing tools, “12 Open Source Tools for Natural Language Processing” provides a list.

Stephen E Arnold, March 26, 2019

Text Analysis Toolkits

March 16, 2019

One of the DarkCyber team spotted a useful list, published by MonkeyLearn. Tucked into a narrative called “Text Analysis: The Only Guide You’ll Ever Need” was a list of natural language processing open source tools, programming languages, and software. Each description is accompanied by links and, in several cases, comments. See the original article for more information.

  • Caret
  • CoreNLP
  • Java
  • Keras
  • mlr
  • NLTK
  • OpenNLP
  • Python
  • SpaCy
  • Scikit-learn
  • TensorFlow
  • PyTorch
  • R
  • Weka
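For a quick taste of one tool on the list, here is a minimal NLTK sketch, assuming the standard tokenizer and tagger data packages are installed.

```python
# Tokenize and part-of-speech tag a sentence with NLTK.
import nltk
nltk.download("punkt", quiet=True)                       # tokenizer models
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

tokens = nltk.word_tokenize("Making sense of unstructured content is tricky.")
print(nltk.pos_tag(tokens))  # [('Making', 'VBG'), ('sense', 'NN'), ...]
```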

Stephen E Arnold, March 16, 2019

Deloitte and NLP: Is the Analysis On Time and Off Target?

January 18, 2019

I read “Using AI to Unleash the Power of Unstructured Government Data.” I was surprised because I thought that US government agencies were using smart software (NLP, smart ETL, digital notebooks, etc.). My recollection is that use of these types of tools began in the mid 1990s, maybe a few years earlier. i2 Ltd., a firm for which I did a few minor projects, rolled out its Analyst’s Notebook in the mid 1990s, and it gained traction in a number of government agencies a couple of years after British government units began using the software.

The write up states:

DoD’s Defense Advanced Research Projects Agency (DARPA) recently created the Deep Exploration and Filtering of Text (DEFT) program, which uses natural language processing (NLP), a form of artificial intelligence, to automatically extract relevant information and help analysts derive actionable insights from it.

My recollection is that DEFT fired up in 2010 or 2011. Once funding became available, activity picked up in 2012. That was six years ago.

However, DEFT is essentially a follow on from other initiatives which reach back to Purple Yogi (Stratify) and DR-LINK, among others.

The capabilities of NLP are presented as closely linked technical activities; for example:

  • Named entity resolution
  • Relationship extraction
  • Sentiment analysis
  • Topic modeling
  • Text categorization
  • Text clustering
  • Information extraction

The collection of buzzwords is interesting. I would annotate each of these items to place them in the context of my research into content processing, intelware, and related topics.
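As an illustration of one buzzword from the list above, text clustering, here is a minimal sketch using TF-IDF and k-means from scikit-learn. The sample documents are invented; neither Deloitte nor DARPA is the source.

```python
# Cluster short "regulatory" snippets by vocabulary overlap.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "tax statute amended for the fiscal year",
    "statute of limitations for tax claims",
    "agency extends the public comment deadline",
    "regulatory filing deadline extended by the agency",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # documents sharing vocabulary tend to share a cluster
```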


Natural Language Processing: Brittle and Spurious

August 24, 2018

I read “NLP’s Generalization Problem, and How Researchers Are Tackling It.” From my vantage point in rural Kentucky, the write up seems to say, “NLP does not work particularly well.”

For certain types of content in which terminology is constrained, NLP systems work okay. But, like clustering, the initial assignment of any object determines much about the system. Problem cases include jargon, code words, and phrases which are aliases. NLP systems struggle even within a single language.

The write up provides interesting examples of NLP failure.

The fixes, alas, are not likely to deliver the bacon any time soon. Yep, “bacon” means a technical breakthrough. NLP systems struggle with this type of utterance. I refer to a local restaurant as “the nasty caballero,” which is my way of saying “the local Mexican restaurant on the river.”

I like the suggestion that NLP systems should use common sense. Isn’t that the method that AskJeeves tried when it allegedly revolutionized NLP question answering? The problem, of course, was that humans had to craft rules, and that took money, time, and even more money.
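Here is a minimal sketch of what those hand-crafted rules look like in practice. The alias entries are hypothetical; the point is that each one costs human effort.

```python
# A hand-built alias table mapping idiosyncratic phrases to plain meanings.
# Every new code word or jargon term requires another human-written rule.
ALIASES = {
    "the nasty caballero": "the local Mexican restaurant on the river",
    "deliver the bacon": "produce a technical breakthrough",
}

def normalize(text: str) -> str:
    for alias, meaning in ALIASES.items():
        text = text.replace(alias, meaning)
    return text

print(normalize("Will the fixes deliver the bacon any time soon?"))
```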

The suggestion to “evaluate unseen distributions and unseen tasks” is interesting as well. The challenge is the one that systems like IBM Watson face. Humans have to make decisions about dicey issues like clustering, then identify relevant training data, and index the text with metadata.

Same problem: Time and money.

For certain applications, NLP can be helpful. For other types of content comprehension, one ends up with the problem of getting Gertie (the NLP system) up and running, then, after a period of time (often a day or two), hooking Gertie to the next Star Trek innovation from Sillycon Valley.

How do you think NLP systems handle my writing style? Let’s ask some NLP systems. DR-LINK? IBM Watson? Volunteers?

Stephen E Arnold, August 24, 2018

Alexa Is Still Taking Language Lessons

August 24, 2018

Though Amazon has been aware of the problem for a while, Alexa still responds better to people who sound like those she grew up with than she does to others. It is a problem many of us can relate to, but one the company really needs to solve as it continues to deploy its voice-activated digital assistant worldwide. TheNextWeb cites a recent Washington Post study as it reports, “Alexa Needs Better Training to Understand Non-American Accents.” It is worth noting it is not just foreign accents the software cannot recognize—the device has trouble with many regional dialects within the US, as well.

“The team had more than 100 people from nearly 20 US cities dictate thousands of voice commands to Alexa. From the exercise, it found that Amazon’s Alexa-based voice-activated speaker was 30 percent less likely to comprehend commands issued by people with non-American accents. The Washington Post also reported that people with Spanish as their first language were understood 6 percent less often than people who grew up around California or Washington and spoke English as a first language. Amazon officials also admitted to The Washington Post that grasping non-American accents poses a major challenge both in keeping current Amazon Echo users satisfied, and expanding sales of their devices worldwide. Rachael Tatman, a Kaggle data scientist with expertise in speech recognition, told The Washington Post that this was evidence of bias in the training provided to voice recognition systems. ‘These systems are going to work best for white, highly educated, upper-middle-class Americans, probably from the West Coast, because that’s the group that’s had access to the technology from the very beginning,’ she said.”

Yes, the bias we find here is the natural result of working with what you have where you are, and perhaps Amazon can be forgiven for not foreseeing the problem from the beginning. Perhaps. The article grants that the company has been working toward a resolution, and references its efforts to prepare for the Indian market as an example. It seems to be slow going.

Cynthia Murrell, August 24, 2018

IBM and Distancing: New Collar Jobs in France

May 23, 2018

I have zero idea if the information in the article “Exclusive: IBM bringing 1,800 AI jobs to France” is accurate. The story caught my attention because I had read “Macron Vowed to Make France a ‘Start-Up Nation.’ Is It Getting There?” You can find that story online at this link, although I read a version of it in my dead tree edition of the real “news” paper at breakfast this morning (May 23, 2018).

Perhaps IBM recognizes that the “culture” of France makes it difficult for startups to get funding without the French management flair. Consequently, a bold and surgical move to use IBM management expertise could make blockchain, AI, and Watson sing Johnny Hallyday’s “Johnny, reviens ! Les Rocks les plus terribles” and shoot to the top of YouTube views.

On the other hand, the play may be a long shot.

What I did find interesting in the write up was this statement:

IBM continues to make moves aimed at distancing itself from peers.

That is fascinating. IBM has faced a bit of pushback as it made some personnel decisions which annoyed some IBMers. One former IBM senior manager just shook his head and grimaced when I mentioned the floundering of the Watson billion dollar bet. I dared not bring up riffing workers over 55. That’s a sore subject for some Big Blue bleeders.

I also liked the “New Collar” buzzword.

To sum up, I assume that IBM will bring the New Collar fashion wave to the stylish world of French technology.

Let’s ask Watson. No, bad idea. Let’s not. I don’t have the time to train Watson to make sense of questions about French finance, technology, wine, cheese, schools, family history, and knowledge of Molière.

Stephen E Arnold, May 23, 2018
