New Jargon: Consultants, Start Your Engines
July 13, 2019
I read “What Is ‘Cognitive Linguistics’?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.
Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:
Cognitive Linguistics
I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.
Here’s the definition:
The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.
If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.
Several observations:
- I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at the Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
- A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
- A how-to book from Wiley publishing called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
- A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.
So many opportunities. Be still, my heart.
Cognitive linguistics: its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?
DarkCyber is excited.
Stephen E Arnold, July 13, 2019
Sentiment Analysis: Can a Monkey Do It?
June 27, 2019
Sentiment analysis is a machine learning tool companies are employing to understand how their customers feel about their services and products. It is mainly deployed on social media platforms, including Facebook, Instagram, and Twitter. The MonkeyLearn blog details how sentiment analysis is specifically being used on Twitter in the post, “Sentiment Analysis Of Twitter.”
Using sentiment analysis is not a new phenomenon, but there are still individuals unaware of the possible power at their fingertips. MonkeyLearn specializes in customer machine learning solutions that include intent, keywords, and, of course, sentiment analysis. The post is a guide on the basics of sentiment analysis: what it is, how it works, and real life examples. MonkeyLearn defines sentiment analysis as:
“Sentiment analysis (a.k.a opinion mining) is the automated process of identifying and extracting the subjective information that underlies a text. This can be either an opinion, a judgment, or a feeling about a particular topic or subject. The most common type of sentiment analysis is called ‘polarity detection’ and consists in classifying a statement as ‘positive’, ‘negative’ or ‘neutral’.”
It also relies on natural language processing (NLP) to understand the information’s context.
MonkeyLearn explains that sentiment analysis is important because most of the world’s digital data is unstructured. Machine learning with NLP’s assistance can quickly sort large data sets and detect their polarity. MonkeyLearn promises that its sentiment analysis will bring customers scalability, consistent criteria, and real-time analysis. Many companies are using Twitter sentiment analysis for customer service, brand monitoring, market research, and political campaigns.
The article is basically a promotional piece for MonkeyLearn, but it does work as a starting guide for sentiment analysis.
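For readers who want to see what “polarity detection” boils down to, here is a minimal sketch in Python. It is not the vendor's method. It assumes a tiny hand-built word list (the `POSITIVE` and `NEGATIVE` sets and the sample tweets are made up for illustration), whereas a production system would learn from labeled data and lean on NLP for context.

```python
import re

# Hypothetical mini-lexicons for illustration only; real systems learn
# these signals from labeled training data.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "slow", "awful"}


def polarity(text: str) -> str:
    """Label text 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample tweets
tweets = [
    "I love the new release, great work",
    "Support was slow and the update is terrible",
    "The package arrived on Tuesday",
]
labels = [polarity(t) for t in tweets]
```

A word-count approach like this misses sarcasm, negation, and context, which is where the NLP component mentioned above is supposed to earn its keep.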
Whitney Grace, June 27, 2019
Into R? A List for You
May 12, 2019
Computerworld, which runs some pretty unusual stories, published “Great R Packages for Data Import, Wrangling and Visualization.” “Great” is an interesting word. In the lingo of Computerworld, a real journalist did some searching, talked to some people, and created a list. As it turns out, the effort is useful. Looking at the Computerworld table is quite a bit easier than trying to dig information out of assorted online sources. Plus, people are not too keen on the phone and email thing now.
The listing includes a mixture of different tools, software, and utilities. There are more than 80 listings. I wasn’t sure what to make of XML’s inclusion in the list, but, the source is Computerworld, and I assume that the “real” journalist knows much more than I.
Three observations:
- Earthworm lists without classification or alphabetization are less useful to me than listings which are sorted by tags and alphabetized within categories. Excel does perform this helpful trick.
- Some items in the earthworm list have links and others do not. Consistency, I suppose, is the hobgoblin of some types of intellectual work.
- An indication of which item is free or for fee would be useful too.
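The sort-by-tag, alphabetize-within-category trick is not limited to Excel. A rough sketch in Python follows. The package names and the category tags assigned to them are my own illustrative assumptions, not the classification used in the Computerworld table.

```python
from itertools import groupby

# (name, category) pairs; the tags are hypothetical, chosen for illustration
packages = [
    ("ggplot2", "visualization"),
    ("dplyr", "wrangling"),
    ("readr", "import"),
    ("tidyr", "wrangling"),
    ("plotly", "visualization"),
]

# Sort by category first, then case-insensitively by name within a category
ordered = sorted(packages, key=lambda p: (p[1], p[0].lower()))

# Group the sorted pairs so each category maps to its alphabetized names
grouped = {
    category: [name for name, _ in items]
    for category, items in groupby(ordered, key=lambda p: p[1])
}
```

Note that `groupby` only merges adjacent items, so the sort has to come first.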
Despite these shortcomings, you may want to download the list and tuck it into your “Things I love about R” folder.
Stephen E Arnold, May 12, 2019
Cognitive Engine: What Powers the USAF Platform?
May 1, 2019
Last week I met with a university professor who does cutting edge data and text mining and also shepherds PhD candidates. In the course of our 90 minute conversation, I noticed some reference books which had SPSS on the cover. The procedures implemented at this particular university worked well.
After the meeting, I was thinking about the newer approaches which are becoming publicly available. The USAF has started talking about its “cognitive engine.” I thought I heard at a conference that some technology developed by Nutonian, now part of a data and text mining roll up, had influenced the project.
The Nutonian system is predictive with a twist. The person using the system can rely on the smart software to perform the numerous intermediary steps required when using more traditional systems.
The article is “The US Air Force Will Showcase Its Many Technological Advances in the USAF Lab Day.” The original is in Chinese, but Freetranslate.com can help out if you don’t read Chinese or have a close by contact who does.
The USAF wants to deploy a cognitive platform into which vendors can “plug in” their systems. The Chinese write up reported:
AFRL’s Autonomy Capability Team 3 (ACT3) is developing artificial intelligence on a large scale through the development and application of the Air Force Cognitive Engine (ACE), an artificial intelligence software platform. Put into application. The software platform architecture reduces the barriers to entry for artificial intelligence applications and provides end-user applications with the ability to cover a range of artificial intelligence problem types. In the application, the software platform connects educated end users, developers, and algorithms implemented in software, task data, and computing hardware to the process of creating an artificial intelligence solution.
The article also provides some interesting details which were not included in some of the English language reports about this session; for example:
- Smart rockets
- An agile pod
- Pathogen identification.
A couple of observations:
First, obviously the Chinese writer had access to information about the Lab Day demonstrations.
Second, the cognitive platform does not mention foundation vendors, which I understand.
Third, it would be delightful to visit a university and see documentation and information about the next-generation predictive analytics systems available.
Stephen E Arnold, May 1, 2019
Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019
April 24, 2019
How does one teach a smart indexing system like Autonomy’s 1996 “neurodynamic” system?* Subject matter experts (SMEs) assembled a training collection of textual information. The articles and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong and the indexing system would assign metadata or “index terms” and “category names” which could cause a query to generate results the user could perceive as incorrect.
How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)
The method was to perform re-training. The approach was practical and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the “black box” included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) to dictionaries.
The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.
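To make the train, then re-train loop concrete, here is a toy naive Bayes categorizer in Python. This is a sketch of the general Bayesian idea only, not Autonomy’s implementation, which was never published as code. The class name `TinyBayes`, the categories, and the training snippets are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training docs seen

    def train(self, text, category):
        # Calling train() again later with fresh examples is the "re-training"
        self.word_counts[category].update(text.lower().split())
        self.doc_counts[category] += 1

    def classify(self, text):
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            score = math.log(self.doc_counts[category] / total_docs)
            n = sum(counts.values())
            for w in text.lower().split():
                if w in vocab:  # words never seen in training carry no evidence
                    score += math.log((counts[w] + 1) / (n + len(vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


model = TinyBayes()
model.train("crude oil pipeline drilling", "energy")
model.train("bank loan interest credit", "finance")

# New vocabulary shows up in the content stream
before = model.classify("bitcoin wallet exchange")

# Re-train with an example that contains the new terms
model.train("bitcoin wallet exchange credit", "finance")
after = model.classify("bitcoin wallet exchange")
```

Before re-training, none of the query words are known, so the model has only its priors to go on and the answer is arbitrary. After one fresh example, the same query lands in “finance.” Scale that editorial chore up to an enterprise corpus and the cost issue becomes obvious.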
The gap between optimal results and real world results usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the refreshed training sets and the editorial work required to keep dictionaries ship shape were too expensive, time consuming, and tedious.
Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.
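For anyone who has not run a precision and recall test, the arithmetic is simple. The Python sketch below uses made-up document IDs and a hypothetical helper named `precision_recall`; it illustrates the standard definitions, not Autonomy’s test procedure.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents the system returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: five documents returned, five judged relevant by an SME
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d2", "d3", "d4", "d6"]
p, r = precision_recall(retrieved, relevant)
```

Here four of the five returned documents are relevant, and four of the five relevant documents were returned, so both measures come out at 80 percent. The catch is that someone has to judge which documents are relevant, which is more SME time and more cost.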
The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, external knowledge base demand. I just took a look at Haystax Constellation. It’s Bayesian and Haystax makes it clear that the “model” has to be trained. So what’s changed between 1996 and 2019 with regard to Bayesian methods?
Nothing. Zip. Zero.
Text Analysis Toolkits
March 16, 2019
One of the DarkCyber team spotted a useful list, published by MonkeyLearn. Tucked into a narrative called “Text Analysis: The Only Guide You’ll Ever Need” was a list of natural language processing open source tools, programming languages, and software. Each description is accompanied by links and, in several cases, comments. See the original article for more information.
Caret
CoreNLP
Java
Keras
mlr
NLTK
OpenNLP
Python
SpaCy
Scikit-learn
TensorFlow
PyTorch
R
Weka
Stephen E Arnold, March 16, 2019
Ontotext Rank
December 5, 2018
Ontotext, a text processing vendor, has posted a demonstration of its ranking technology. You can find the demos at this link. The graphic below was generated by the system on December 3, 2018, at 9:00 am US Eastern time. I specified the industry as information technology and the sub industry as search. Here’s what the system displayed:
A few observations:
- I specified 25 companies. The system displayed 10. I assume someone from the company will send me an email that the filters I applied did not have sufficient data to generate the desired result. Perhaps those data should be displayed?
- No Google Search nor Microsoft Bing search appeared. Google, a search vendor, has been in the news in the countries I have visited recently.
- RightNow appeared. The company is (I thought) a unit of Oracle.
- Publishers Clearing House sells magazine subscriptions. PCH does not offer information retrieval in the sense that I understand the bound phrase.
Net net: I am not sure about the size of the data set or what the categories mean.
You need to decide for yourself whether to use this service or Google Trends or a similar “popularity” or “sentiment” analysis system.
Stephen E Arnold, December 5, 2018
Digital Reasoning: From Intelligence Centric Text Retrieval to Wealth Management
November 12, 2018
Vendors of text processing systems have had to find new ways to generate revenue. The early days of entity extraction and social graphs provided customers from the US government and specialized companies like Booz, Allen & Hamilton.
Today, different economic realities have forced change.
The capitalist tool published “Digital Reasoning Brings AI To Wealth Management.” The write up does little to put Digital Reasoning in context. The company was founded in 2000. The firm accepted outside financing which now amounts to about $100 million. The firm became cozy with IBM, labored in the vineyards of the star crossed Distributed Common Ground System, and then faced a fire storm of competition from companies big and small. The reason? Entity extraction and link analysis became commodities. The fancy math also migrated into a wide range of applications.
New buzzwords appeared and gained currency. These ranged from artificial intelligence (who knows what that phrase means?) to real time data analytics (Yeah, what is “real time”?).
Digital Reasoning’s response is interesting. The company, like Attivio and Coveo, has nosed into customer support. But the intriguing play is that the Digital Reasoning system, which was text centric, is now packaging its system to help wealth management firms.
Is this text based?
Sure is. I learned:
For advisors, Digital Reasoning helps them prioritize which customers to focus on, which can be useful when an adviser may have 200 or more clients. At the management level, Digital Reasoning can show if the firm has specific advisors getting a lot of complaints so it can respond with training and intervention. At a strategic level, it can sift through communications and identify if customers are looking for a specific offering or type of product.
Interesting approach.
The challenge, of course, will be to differentiate Digital Reasoning’s system from those available from dozens of vendors.
Digital Reasoning has investors who want a return on their $100 million. After 18 years, time may be compressing as solutions once perceived as sophisticated become more widely available and subject to price pressure.
Rumors of Amazon’s interest in this “wealth management” sector have reached us in Harrod’s Creek. That might be another reason why the low profile Digital Reasoning is stirring the PR waters using the capitalist’s tool, Forbes Magazine, once a source of “real” news.
Stephen E Arnold, November 12, 2018
Picking and Poking Palantir Technologies: A New Blood Sport?
April 25, 2018
My reaction to “Palantir Has Figured Out How to Make Money by Using Algorithms to Ascribe Guilt to People, Now They’re Looking for New Customers” is a sigh and a groan.
I don’t work for Palantir Technologies, although I have been a consultant to one of its major competitors. I do lecture about next generation information systems at law enforcement and intelligence centric conferences in the US and elsewhere. I also wrote a book called “CyberOSINT: Next Generation Information Access.” That study has spawned a number of “experts” who are recycling some of my views and research. A couple of government agencies have shortened my word “cyberosint” into “cyint.” In a manner of speaking, I have an information base which can be used to put the actions of companies which offer services similar to those available from Palantir in perspective.
The article in Boing Boing falls into the category of “yikes” analysis. Suddenly, it seems, the idea has surfaced that cook book mathematical procedures can be used to make sense of a wide range of data. Let me assure you that this is not a new development, and Palantir is definitely not the first of the companies developing applications for law enforcement and intelligence professionals to land customers in financial and law firms.
A Palantir bubble gum card shows details about a person of interest and links to underlying data from which the key facts have been selected. Note that this is from an older version of Palantir Gotham. Source: Google Images, 2015
Decades ago, a friend of mine (Ev Brenner, now deceased) was one of the pioneers using technology and cook book math to make sense of oil and gas exploration data. How long ago? Think 50 years.
The focus of “Palantir Has Figured Out…” is that:
Palantir seems to be the kind of company that is always willing to sell magic beans to anyone who puts out an RFP for them. They have promised that with enough surveillance and enough secret, unaccountable parsing of surveillance data, they can find “bad guys” and stop them before they even commit a bad action.
Okay, that sounds good in the context of the article, but Palantir is just one vendor responding to the need for next generation information access tools from many commercial sectors.
CyberOSINT: Next Generation Information Access Explains the Tech Behind the Facebook, GSR, Cambridge Analytica Matter
April 5, 2018
In 2015, I published CyberOSINT: Next Generation Information Access. This is a quick reminder that the profiles of the vendors who have created software systems and tools for law enforcement and intelligence professionals remain timely.
The 200 page book provides examples, screenshots, and explanations of the tools which are available to analyze social media information. The book is the most comprehensive run down of the open source, commercial, and cloud based systems which can make sense of social media data, lawful intercept data, and general text and imagery content.
Companies described in this collection of “tools” include:
- Cyveillance (now LookingGlass)
- Decisive Analytics
- IBM i2 (Analysts Notebook)
- Geofeedia
- Leidos
- Palantir Gotham
- and more than a dozen developers of commercial and open source, high impact cyberOSINT tools.
The book is available for $49. Additional information is available on my Xenky.com Web site. You can buy the PDF book online at this link: gum.co/cyberosint.
Get the CyberOSINT monograph. It’s the standard reference for practical and effective analysis, text analytics, and next generation solutions.
Stephen E Arnold, April 5, 2018