Trovicor: A Slogan as an Equation

August 2, 2019

We spotted this slogan on the Trovicor Web site:

The Trovicor formula: Actionable Intelligence = f (data generation; fusion; analysis; visualization)

The function consists of four buzzwords used by vendors of policeware and intelware:

  • Data generation (which suggests metadata assigned to intercepted, scraped, or provided content objects)
  • Fusion (which means in DarkCyber’s world a single index to disparate data)
  • Analysis (numerical recipes to identify patterns or other interesting data
  • Virtualization (use of technology to replace old school methods like 1950s’ style physical wire taps, software defined components, and software centric widgets).

The buzzwords make it easy to identify other companies providing somewhat similar services.

Trovicor maintains a low profile. But obtaining open source information about the company may be a helpful activity.

Stephen E Arnold, August 2, 2019

Need a Machine Learning Algorithm?

July 17, 2019

r entry

The R-Bloggers.com Web site published “101 Machine Learning Algorithms for Data Science with Cheat Sheets.” The write up recycles information from DataScienceDojo, and some of the information looks familiar. But lists of algorithms are not original. They are useful. What sets this list apart is the inclusion of “cheat sheets.”

What’s a cheat sheet?

In this particular collection, a cheat sheet looks like this:

r entry example

You can see the entry for the algorithm: Bernoulli Naive Bayes with a definition. The “cheat sheet” is a link to a python example. In this case, the example is a link to an explanation on the Chris Albon blog.

What’s interesting is that the 101 algorithms are grouped under 18 categories. Of these 18, Bayes and derivative methods total five.

No big deal, but in my lectures about widely used algorithms I highlight 10, mostly because it is a nice round number. The point is that most of the analytics vendors use the same basic algorithms. Variations among products built on these algorithms are significant.

As analytics systems become more modular — that  is, like Lego blocks — it seems that the trajectory of development will be to select, preconfigure thresholds, and streamline processes in a black box.

Is this good or bad?

It depends on whether one’s black box is a dominant solution or platform?

Will users know that this almost inevitable narrowing has upsides and downsides?

Nope.

Stephen E Arnold, July 17, 2019

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is “Cognitive Linguistics“? The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales lead, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.

excited fixed

Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Garnter magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at Starbuck’s south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how to book from Wiley publishing called “Cognitive Linguistics for Dummies” with online reference material which may or many not actually be available via the link in the printed book
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics — it’s time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

Sentiment Analysis: Can a Monkey Can Do It?

June 27, 2019

Sentiment analysis is a machine learning tool companies are employing to understand how their customers feel about their services and products. It is mainly deployed on social media platforms, including Facebook, Instagram, and Twitter. The Monkey Learn blog details how sentiment analysis is specifically being used on Twitter in the post, “Sentiment Analysis Of Twitter.”

Using sentiment analysis is not a new phenomenon, but there are still individuals unaware of the possible power at their fingertips. Monkey Learn specializes in customer machine learning solutions that include intent, keywords, and, of course, sentiment analysis. The post is a guide on the basics of sentiment analysis: what it is, how it works, and real life examples. Monkey Learn defines sentiment analysis as:

Sentiment analysis (a.k.a opinion mining) is the automated process of identifying and extracting the subjective information that underlies a text. This can be either an opinion, a judgment, or a feeling about a particular topic or subject. The most common type of sentiment analysis is called ‘polarity detection’ and consists in classifying a statement as ‘positive’, ‘negative’ or ‘neutral’.”

It also relies on natural language processing (NLP) to understand the information’s context.

Monkey Learn explains that sentiment analysis is important because most of the world’s digital data is unstructured. Machine learning with NLP’s assistance can quickly sort large data sets and detect their polarity. Monkey Learn promises with their sentiment analysis to bring their customers scalability, consistent criteria, and real-time analysis. Many companies are using Twitter sentiment analysis for customer service, brand monitoring, market research, and political campaigns.

The article is basically a promotional piece for Monkey Learn, but it does work as a starting guide for sentiment analysis.

Whitney Grace, June 27, 2019

Into R? A List for You

May 12, 2019

Computerworld, which runs some pretty unusual stories, published “Great R Packages for Data Import, Wrangling and Visualization.” “Great” is an interesting word. In the lingo of Computerworld, a real journalist did some searching, talked to some people, and created a list. As it turns out, the effort is useful. Looking at the Computerworld table is quite a bit easier than trying to dig information out of assorted online sources. Plus, people are not too keen on the phone and email thing now.

The listing includes a mixture of different tools, software, and utilities. There are more than 80 listings. I wasn’t sure what to make of XML’s inclusion in the list, but, the source is Computerworld, and I assume that the “real” journalist knows much more than I.

Two observations:

  • Earthworm lists without classification or alphabetization are less useful to me than listings which are sorted by tags and alphabetized within categories. Excel does perform this helpful trick.
  • Some items in the earthworm list have links and others do not. Consistency, I suppose, is the hobgoblin of some types of intellectual work
  • An indication of which item is free or for fee would be useful too.

Despite these shortcomings, you may want to download the list and tuck it into your “Things I love about R” folder.

Stephen E Arnold, May 12, 2019

Cognitive Engine: What Powers the USAF Platform?

May 1, 2019

Last week I met with a university professor who does cutting edge data and text mining and also shepherds PhD candidates. In the course of our 90 minute conversation, I noticed some reference books which had SPSS on the cover. The procedures implemented at this particular university worked well.

After the meeting, I was thinking about the newer approaches which are becoming publicly available. The USAF has started talking about its “cognitive engine.” I thought I heard at a conference that some technology developed developed by Nutonian, now part of a data and text mining roll up, had influenced the project.

The Nutonian system is predictive with a twist. The person using the system can rely on the smart software to perform the numerous intermediary steps required when using more traditional systems.

The article “The US Air Force Will Showcase Its Many Technological Advances in the USAF Lab Day.” The original is in Chinese but Freetranslate.com can help out if don’t read Chinese or have a close by contact who does.

The USAF wants to deploy a cognitive platform into which vendors can “plug in” their systems. The Chinese write up reported:

AFRL’s Autonomy Capability Team 3 (ACT3) is developing artificial intelligence on a large scale through the development and application of the Air Force Cognitive Engine (ACE), an artificial intelligence software platform. Put into application. The software platform architecture reduces the barriers to entry for artificial intelligence applications and provides end-user applications with the ability to cover a range of artificial intelligence problem types. In the application, the software platform connects educated end users, developers, and algorithms implemented in software, task data, and computing hardware to the process of creating an artificial intelligence solution.

The article also provides some interesting details which were not included in some of the English language reports about this session; for example:

  • Smart rockets
  • An agile pod
  • Pathogen identification.

A couple of observations:

First, obviously the Chinese writer had access to information about the Lab Day demonstrations.

Second, the cognitive platform does not mention foundation vendors, which I understand.

Third, it would be delightful to visit a university and see documentation and information about the next-generation predictive analytics systems available.

Stephen E Arnold, May 1, 2019

Here’s what the Chinese writer reported about the

Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019

April 24, 2019

How does one teach a smart indexing system like Autonomy’s 1996 “neurodynamic” system?* Subject matter experts (SMEs) assembled training collection of textual information. The article and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong and the indexing system would assign metadata or “index terms” and “category names” which could cause a query to generate results the user could perceive as incorrect.

image

How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)

The method was to perform re-training. The approach was practical and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the “black box” included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) and dictionaries.

The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.

Where the optimal results departed from the real world results usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the freshening training sets and the editorial work required to keep dictionaries ship shape was too expensive, time consuming, and tedious.

Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.

The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, external knowledge base demand. I just took a look at Haystax Constellation. It’s Bayesian and Haystax makes it clear that the “model” has to be training. So what’s changed between 1996 and 2019 with regards to Bayesian methods?

Nothing. Zip. Zero.

Read more

Text Analysis Toolkits

March 16, 2019

One of the DarkCyber team spotted a useful list, published by MonkeyLearn. Tucked into a narrative called “Text Analysis: The Only Guide You’ll Ever Need” was a list of natural language processing open source tools, programming languages, and software. Each description is accompanied with links and in several cases comments. See the original article for more information.

Caret

CoreNLP

Java

Keras

mlr

NLTK

OpenNLP

Python

SpaCy

Scikit-learn

TensorFlow

PyTorch

R

Weka

Stephen E Arnold, March 16, 2019

Ontotext Rank

December 5, 2018

Ontotext, a text processing vendor, has posted a demonstration of its ranking technology. You can find the demos at this link. The graphic below was generated by the system on December 3, 2018, at 0900 am US Eastern time. I specified the industry as information technology and the sub industry as search. Here’s what the system displayed:

image

A few observations:

  1. I specified 25 companies. The system displayed 10. I assume someone from the company will send me an email that the filters I applied did not have sufficient data to generate the desired result. Perhaps those data should be displayed?
  2. No Google Search nor Microsoft Bing search appeared. Google, a search vendor, has been in the news in the countries I have visited recently.
  3. RightNow appeared. The company is (I thought) a unit of Oracle.
  4. Publishers Clearing House sells magazine subscriptions. PCH does not offer information retrieval in the sense that I understand the bound phrase.

Net net: I am not sure about the size of the data set or what the categories mean.

You need to decide for yourself whether to use this service or Google Trends or a similar “popularity” or “sentiment” analysis system.

Stephen E Arnold, December 5, 2018

Digital Reasoning: From Intelligence Centric Text Retrieval to Wealth Management

November 12, 2018

Vendors of text processing systems have had to find new ways to generate revenue. The early days of entity extraction and social graphs provided customers from the US government and specialized companies like Booz, Allen & Hamilton.

Today, different economic realities have forced change.

The capitalist tool published “Digital Reasoning Brings AI To Wealth Management.” The write up does little to put Digital Reasoning in context. The company was founded in 2000. The firm accepted outside financing which now amounts to about $100 million. The firm became cozy with IBM, labored in the vineyards of the star crossed Distributed Common Ground System, and then faced a fire storm of competition from companies big and small. The reason? Entity extraction and link analysis became commodities. The fancy math also migrated into a wide range of applications.

New buzzwords appeared and gained currency. These ranged from artificial intelligence (who knows that that phrase means?) to real time data analytics (Yeah, what is “real time”?).

Digital Reasoning’s response is interesting. The company, like Attivio and Coveo, has nosed into customer support. But the intriguing play is that the Digital Reasoning system, which was text centric, is now packaging its system to help wealth management firms.

Is this text based?

Sure is. I learned:

For advisors, Digital Reasoning helps them prioritize which customers to focus on, which can be useful when an adviser may have 200 or more clients. At the management level, Digital Reasoning can show if the firm has specific advisors getting a lot of complaints so it can respond with training and intervention. At a strategic level, it can sift through communications and identify if customers are looking for a specific offering or type of product.

Interesting approach.

The challenge, of course, will be to differentiate Digital Reasoning’s system from those available from dozens of vendors.

Digital Reasoning has investors who want a return on their $100 million. After 18 years, time may be compressing as once solutions once perceived as sophisticated become more widely available and subject to price pressure.

Rumors of Amazon’s interest in this “wealth management” sector have reached us in Harrod’s Creek. That might be another reason why the low profile Digital Reasoning is stirring the PR waters using the capitalist’s tool, Forbes Magazine, once a source of “real” news.

Stephen E Arnold, November 12, 2018

Next Page »

  • Archives

  • Recent Posts

  • Meta