NLP with an SEO Spin

July 8, 2020

If you want to know how search engine optimization has kicked librarians and professional indexers in the knee and stomped on their writing hand, you will enjoy “Classifying 200,000 Articles in 7 Hours Using NLP.” The write up makes clear that human indexers are going to become the lamp lighters of the 21st century. Imagine. No libraries, no subject matter experts curating and indexing content, no human judgment. Nifty. Perfect for a post-Quibi world.

The write up explains the indexing methods of one type of smart software. The passages below highlight the main features of the method:

Weak supervision: the human annotator explains their chosen label to the AI model by highlighting the key phrases in the example that helped them make the decision. These highlights are then used to automatically generate nuanced rules, which are combined and used to augment the training dataset and boost the model’s quality.

Uncertainty sampling: it finds those examples for which the model is most uncertain, and suggests them for human review.
Diversity sampling: it helps make sure that the dataset covers as diverse a set of data as possible. This ensures the model learns to handle all of the real-world cases.

Guided learning: it allows you to search through your dataset for key examples. This is particularly useful when the original dataset is very imbalanced (it contains very few examples of the category you care about).
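For readers who want to see what uncertainty sampling boils down to, here is a minimal Python sketch: rank unlabeled examples by the entropy of the model’s predicted class distribution and hand the most uncertain ones to a human. The toy “model” and documents are invented for illustration; a real system would use actual classifier probabilities.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_sample(examples, predict_proba, k=2):
    """Return the k examples whose predicted class distribution has the
    highest entropy, i.e. the ones the model is least sure about."""
    scored = [(entropy(predict_proba(x)), x) for x in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]

# Toy "model": confident about short texts, unsure about longer ones.
def toy_predict_proba(text):
    return (0.95, 0.05) if len(text) < 20 else (0.55, 0.45)

docs = ["cheap meds now", "quarterly earnings call transcript attached"]
print(uncertainty_sample(docs, toy_predict_proba, k=1))
# The longer, more ambiguous document is surfaced for human review.
```

Diversity sampling and guided learning work the other two levers: what the dataset covers, and which rare examples get pulled in by search.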

These phrases may not be clear. May I elucidate?

  • Weak supervision. Subject matter experts riding herd. No way. Inefficient and not optimizable.
  • Uncertainty sampling means a “fudge factor” or “fuzzifying.” A metaphor might be “close enough for horse shoes.”
  • Guided learning. Yep, manual assembly of training data, recalibration, and more training until the horse shoe thing scores a point.

The write up undermines its good qualities with a reference to Google. Has anyone noticed that Google’s first page of results for most queries consists of advertisements?

NLP and horse shoes. Perfect match. Why aren’t the index and classification codes ones an educated person would find understandable and at hand? Forget answering this question. Just remember: good enough and close enough for horse shoes. Clang and ka-ching as another ad sucks in a bidder.

Stephen E Arnold, July 8, 2020

Another Low Profile, Specialized Services Firm Goes for Mad Ave Marketing

April 25, 2020

Investigative software firm ShadowDragon looks beyond traditional cyber-attacks in its latest podcast, “Cyber Cyber Bang Bang—Attacks Exploiting Risks Within the Physical and Cyber Universe.” The four-and-a-half-minute podcast is the fourth in a series that was launched on April second. The description tells us:

“Truly Advanced Persistent attacks where physical exploitation and even death are rarely discussed. We cover some of this along with security within the Healthcare and Government space. Security Within Healthcare and government is always hard. Tensions between information security and the business make this harder. Hospitals hit in fall of 2019 had a taste of exploitation. Similarly, state governments have had issues with cartel related attackers. CISO’s that enable assessment, and security design around systems that cannot be fully hardened can kill two birds with one stone. Weighing authority versus influence, FDA approved equipment, 0day discovery within applications. Designing security around systems is a must when unpatchable vulnerabilities exist.”

Hosts Daniel Clemens and Brian Dykstra begin by answering some questions from the previous podcast, then catch up on industry developments. They get into security challenges for hospitals and government agencies not quite halfway through.

A company of fewer than 50 workers, ShadowDragon keeps a low profile. Created “by investigators for investigators,” its cyber security tools include AliasDB, MalNet, OIMonitor, SocialNet, and Spotter. The firm also supports their clients with training, integration, conversion, and customization. ShadowDragon was launched in 2015 and is based in Cheyenne, Wyoming.

Cynthia Murrell, April 25, 2020

Linguistic Insight: Move Over, Parrots

February 7, 2020

DarkCyber noted an item sure to be of interest to the linguists laboring in the world of chat bots, NLP, and inference. “Penguins Follow Same Linguistic Patterns As Humans, Study Finds” states:

Words more frequently used by the animals are briefer, and longer words are composed of extra but briefer syllables, researchers say.

The write up also reveals:

Information compression is a general principle of human language.

Yep. Penguins better than parrots? Well, messier for sure.

Stephen E Arnold, February 7, 2020

Lexalytics: The RPA Market

December 12, 2019

RPA is an acronym which was new to the DarkCyber team. A bit of investigation pointed us to “Adding New NLP Capabilities for RPA: Build or Buy” plus other stories published by Lexalytics. This firm provides a sentiment analysis system. The idea is that smart software can figure out the emotional payload of content objects. Some sentiment analysis systems just use word lists. An email arrives containing the word “sue”; the smart software flags the email and, in theory, a human looks at the message. Other systems use a range of numerical recipes to figure out if a message contains an emotional payload.
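The word-list style of sentiment analysis is about as simple as software gets. A minimal sketch (the trigger words are invented for illustration):

```python
# Word-list sentiment flagging: route a message to a human if it
# contains any trigger word. Trigger words here are illustrative only.
TRIGGER_WORDS = {"sue", "lawsuit", "refund", "furious"}

def flag_for_review(message: str) -> bool:
    """True if any trigger word appears in the message (case-insensitive)."""
    words = set(message.lower().split())
    return bool(words & TRIGGER_WORDS)

print(flag_for_review("I am going to sue your company"))   # True
print(flag_for_review("Thanks for the quick response"))    # False
```

The “numerical recipes” approaches replace the fixed list with learned weights, but the flag-then-review workflow is the same.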

Now RPA.

The idea is that robotic process automation is becoming more important. The vendors of RPA have to be aware that natural language processing related to text analytics is also increasing in importance. You can read about RPA on the Lexalytics blog at this link.

The jargon caught our attention. After a bit of discussion over lunch on December 5, 2019, we decided that RPA is a new term for workflows that are scripted and hopefully intelligent.

Now you know. RPA, workflow, not IPA.

Stephen E Arnold, December 12, 2019

Parsing Documents: A Shift to Small Data

November 14, 2019

DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to pay attention to figuring out specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”

The write up reports:

The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).

Interesting, but the approach seems similar to identifying several passages in a text and submitting these to a search engine. This used to be called “more like this.” But today? Small data.
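“More like this” can be sketched with nothing fancier than bag-of-words cosine similarity: treat the highlighted passage as a query and return the most similar document. The documents below are invented for illustration; production systems add TF-IDF weighting and real tokenization.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def more_like_this(query_passage: str, documents: list[str]) -> str:
    """Return the document most similar to the highlighted passage."""
    q = Counter(query_passage.lower().split())
    return max(documents, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "the borrower shall repay the loan in monthly installments",
    "the weather in london was unusually mild this spring",
]
print(more_like_this("loan repayment installments", docs))
```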

With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?

DarkCyber hopes so.

Stephen E Arnold, November 14, 2019

Visual Data Exploration via Natural Language

November 4, 2019

New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.

The university reported:

A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.

You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”

DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?

Maybe not tomorrow but in the future. Absolutely. And what will those newly-unemployed people do for money?

Interesting question and one some may find difficult to consider at this time.

Stephen E Arnold, November 4, 2019

 

Sentiment Analysis: Still Ticking Despite Some Lickings

October 29, 2019

Sentiment analysis is a “function” that tries to identify the emotional payload of an object, typically text. Sentiment analysis of images, audio, and video is “just around the corner,” just like quantum computing and keeping Windows 10 updates from killing some computers.

“The Best Sentiment Analysis Tools of 2019” provides a list of go-to vendors. Like most lists, some options do not appear; for example, Algeion and Vader. The list was compiled by MonkeyLearn, which is number one on the list. There are some surprises; for example, IBM Watson.

Stephen E Arnold, October 29, 2019

Real Life Q and A for Information Access Allegedly Arrives

October 14, 2019

DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:

software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.

The company developing the system, Promethium, founded in 2018, may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved with $2.5 million in funding a function that many firms have pursued.

The article reports:

After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.

More information is available at the firm’s Web site.

Stephen E Arnold, October 14, 2019

Smart Software: About Those Methods?

July 23, 2019

An interesting paper germane to machine learning and smart software is available from Arxiv.org. The title? “Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches”.

The punch line for this academic document is, in the view of DarkCyber:

No way.

Your view may be different, but you will have to read the document, check out the diagrams, and scan the supporting information available on Github at this link.

The main idea is:

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.

So back to my summary, “No way.”
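The “comparably simple heuristic methods” the authors mention can be surprisingly plain. Here is a toy user-based nearest-neighbor recommender; the ratings are invented, and the paper’s baselines are better tuned, so treat this as a sketch of the idea only.

```python
import math

# Toy user-item ratings, invented for illustration.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "heat": 4},
    "carol": {"titanic": 5, "notebook": 4},
}

def similarity(u: dict, v: dict) -> float:
    """Cosine similarity over the items two users have both rated."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(user: str, n: int = 1) -> list[str]:
    """Suggest the n unseen items rated highest by the most similar user."""
    others = [(similarity(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, neighbor = max(others)
    unseen = {i: r for i, r in ratings[neighbor].items()
              if i not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)[:n]

print(recommend("alice"))
```

No gradients, no GPUs, and, per the paper, often no loss in accuracy.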

Here’s an “oh, how interesting” chart. Note the spikes:

[chart not reproduced]

Several observations:

  1. In an effort to get something to work, those who think in terms of algorithms take shortcuts; that is, operate in a clever way to produce something that’s good enough. “Good enough” is pretty much a C grade or “passing.”
  2. Math whiz hand waving and MBA / lawyer ignorance of the human judgments that operate within an algorithmic operation guarantee that “good enough” becomes “Let’s see if this makes money.” You can substitute “reduce costs” if you wish. No big difference.
  3. Users accept whatever outputs a smart system delivers. Most people believe that “computers are right.” There’s nothing DarkCyber can do to make people more aware.
  4. Algorithms can be fiddled in the following ways: [a] let these numerical recipes and the idiosyncrasies of calculation just do their thing; for example, drift off in a weird direction or produce the equivalent of white noise; [b] get skewed because of the data flowing into the system automagically (very risky) or via human subject matter experts (also very risky); [c] the programmers implementing the algorithm focus on the code, speed, and deadline, not how the outputs flow; for example, k-means can be really mean and Bayesian methods can bay at the moon.

Net net: Worth reading this analysis.

Stephen E Arnold, July 23, 2019

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is Cognitive Linguistics?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.


Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at a Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market,” however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how-to book from Wiley called “Cognitive Linguistics for Dummies,” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics: its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019
