New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is ‘Cognitive Linguistics’?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.


Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at a Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how-to book from Wiley called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics: its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

Sentiment Analysis: Can a Monkey Do It?

June 27, 2019

Sentiment analysis is a machine learning tool companies are employing to understand how their customers feel about their services and products. It is mainly deployed on social media platforms, including Facebook, Instagram, and Twitter. The Monkey Learn blog details how sentiment analysis is specifically being used on Twitter in the post, “Sentiment Analysis Of Twitter.”

Using sentiment analysis is not a new phenomenon, but there are still individuals unaware of the possible power at their fingertips. Monkey Learn specializes in custom machine learning solutions that include intent, keywords, and, of course, sentiment analysis. The post is a guide on the basics of sentiment analysis: what it is, how it works, and real-life examples. Monkey Learn defines sentiment analysis as:

Sentiment analysis (a.k.a opinion mining) is the automated process of identifying and extracting the subjective information that underlies a text. This can be either an opinion, a judgment, or a feeling about a particular topic or subject. The most common type of sentiment analysis is called ‘polarity detection’ and consists in classifying a statement as ‘positive’, ‘negative’ or ‘neutral’.

It also relies on natural language processing (NLP) to understand the information’s context.
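
The simplest flavor of polarity detection can be sketched in a few lines. The following is a toy, lexicon-based illustration in Python; the word lists and scoring are our assumptions, not Monkey Learn’s method, which relies on trained models and NLP.

```python
# Toy lexicon-based polarity detector (illustrative only; real services
# such as Monkey Learn's use trained classifiers, not hand-made word lists).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "broken"}

def polarity(text: str) -> str:
    """Classify a statement as positive, negative, or neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this product and it is great"))     # positive
print(polarity("the app is broken and support is awful"))  # negative
```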

Monkey Learn explains that sentiment analysis is important because most of the world’s digital data is unstructured. Machine learning with NLP’s assistance can quickly sort large data sets and detect their polarity. Monkey Learn promises that its sentiment analysis brings customers scalability, consistent criteria, and real-time analysis. Many companies are using Twitter sentiment analysis for customer service, brand monitoring, market research, and political campaigns.

The article is basically a promotional piece for Monkey Learn, but it does work as a starting guide for sentiment analysis.

Whitney Grace, June 27, 2019

How Smart Software Goes Off the Rails

June 23, 2019

Navigate to “How Feature Extraction Can Be Improved With Denoising.” The write up seems like a straightforward analytics explanation. Lots of jargon, buzzwords, and hippy dippy references to length squared sampling in matrices. The concept is not defined in the article. And if you remember statistics 101, you know that there are five types of sampling: convenience, cluster, random, systematic, and stratified. Each has its strengths and weaknesses. How does one avoid the issues? Use length squared sampling, obviously: just sample rows with probability proportional to the square of their Euclidean norms. Got it?
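
For readers who want to see the sampling trick in action, here is a minimal numpy sketch (our illustration, not the article’s code): each row index is drawn with probability proportional to the row’s squared Euclidean norm.

```python
import numpy as np

def length_squared_sample(A: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Sample k row indices of A with probability proportional to
    each row's squared Euclidean norm."""
    rng = rng or np.random.default_rng()
    norms_sq = np.sum(A * A, axis=1)   # squared row norms
    probs = norms_sq / norms_sq.sum()  # normalize into a distribution
    return rng.choice(A.shape[0], size=k, replace=True, p=probs)

A = np.random.default_rng(0).normal(size=(1000, 50))
print(length_squared_sample(A, k=20))
```

Rows with large norms dominate the sample, which is the point: they carry most of the matrix’s “energy.”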

However, the math is not the problem. Math is a method. The glitch is in defining “noise.” Like love, “noise” can be defined in many ways. The write up points out:

Autoencoders with more hidden layers than inputs run the risk of learning the identity function – where the output simply equals the input – thereby becoming useless. In order to overcome this, Denoising Autoencoders (DAE) was developed. In this technique, the input is randomly induced by noise. This will force the autoencoder to reconstruct the input or denoise. Denoising is recommended as a training criterion for learning to extract useful features that will constitute a better higher level representation.
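
To make the quoted recipe concrete, here is a minimal denoising autoencoder sketch in PyTorch. This is our illustration, not the article’s code; the layer sizes and noise level are arbitrary assumptions. The key move is that noise corrupts the input while the loss compares the reconstruction against the clean original.

```python
import torch
import torch.nn as nn

# Minimal denoising autoencoder sketch (illustrative sizes and noise level).
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),    # encoder
    nn.Linear(128, 784), nn.Sigmoid()  # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(clean_batch: torch.Tensor, noise_std: float = 0.3) -> float:
    noisy = clean_batch + noise_std * torch.randn_like(clean_batch)
    recon = model(noisy)
    loss = loss_fn(recon, clean_batch)  # target is the *clean* input
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

batch = torch.rand(32, 784)  # stand-in for real data
print(train_step(batch))
```

Note that the target is the clean input. If the training set itself is skewed, that skew is baked into every “clean” target.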

Can you spot the flaw in this approach? Consider what happens if the training set is skewed for some reason. The system will learn based on inputs smoothed by statistical sanding. When the system encounters real world data, it will, by golly, interpret the “real” inputs in terms of the flawed denoising method. As one wit observed, “So s?c^2 p gives us a better estimation than the zero matrix.” Yep.

To sum up, the system just generates “drifting” outputs. The fix? Retraining. This is expensive and time-consuming. Not good when the method is applied to real-time flows of data.

In a more colloquial turn of phrase, the denoiser may not be denoising correctly.

As more complex numerical recipes are embedded in “smart” systems, there will be some interesting consequences. Does the phrase “chain of failure” come to mind? What about “good enough”?

Stephen E Arnold, June 23, 2019

Owlin Pivots, Attracts Funding

June 21, 2019

Financial-tech startup Owlin is bound to be celebrating—TechCrunch announces, “Owlin, the Text and News Analytics Platform for Financial Institutions, Raises $3.5M Series A.” This is especially good news, considering the company lost ground when its original backer went bankrupt; that twist cost the company two founders, we’re told. Now, though, Velocity Capital is leading this round of funding. Writer Steve O’Hear reports:

“The fundraise follows the fintech company’s pivot from a real-time news alert service to a more comprehensive ‘AI-based’ text and news analytics platform to help financial institutions assess risk. … This is seeing Owlin enable 15,000 counter-party risk managers worldwide to track risk events that are not captured by traditional credit risk metrics. ‘We are adding news and unstructured data to their risk monitoring. In the end, our clients don’t just gain insights, they also gain time,’ adds the Owlin CEO.”

Apparently, the platform is unusually successful at augmenting certain types of data, making for more accurate risk models. Regulators love that, we’re reminded. Founded in 2012, Owlin is based in Amsterdam. Some of the company’s global clients are Deutsche Bank, ING, Fitch Ratings, Adyen, and KPMG.

Cynthia Murrell, June 21, 2019

Grammar Rules Help Algorithms Grasp Language

June 20, 2019

Researchers at several universities have teamed up with IBM to teach algorithms some subtleties of language. VentureBeat reports, “IBM, MIT, and Harvard’s AI Uses Grammar Rules to Catch Linguistic Nuances of U.S. English.” Writer Kyle Wiggers links to the two resulting research papers, noting the research was to be presented at the recent North American Chapter of the Association for Computational Linguistics conference. We learn:

“The IBM team, along with scientists from MIT, Harvard, the University of California, Carnegie Mellon University, and Kyoto University, devised a tool set to suss out grammar-aware AI models’ linguistic prowess. As the coauthors explain, one model in question was trained on a sentence structure called recurrent neural network grammars, or RNNGs, that imbued it with basic grammar knowledge. The RNNG model and similar models with little-to-no grammar training were fed sentences with good, bad, or ambiguous syntax. The AI systems assigned probabilities to each word, such that in grammatically ‘off’ sentences, low-probability words appeared in the place of high-probability words. These were used to measure surprisal[sic]. The coauthors found that the RNNG system consistently performed better than systems trained on little-to-no grammar using a fraction of the data, and that it could comprehend ‘fairly sophisticated’ rules.”

See the write-up for a few details about those rules, or check out the research papers for more information (links above). This is but a start for their model, the team cautions, for the work must be validated on larger data sets. Still, they believe, this project represents a noteworthy milestone.
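
For the curious, the surprisal the researchers measured is simply the negative log probability a model assigns to a word in context; low-probability words in “off” sentences produce high surprisal. Here is a toy sketch with made-up probabilities, not an RNNG:

```python
import math

# Toy surprisal computation: surprisal(w | context) = -log2 P(w | context).
# The probabilities below are invented for illustration, not from an RNNG.
word_probs = {
    ("the", "dog"): 0.20,       # "the dog" is a plausible continuation
    ("the", "quickly"): 0.001,  # "the quickly" is grammatically off
}

def surprisal(context: str, word: str) -> float:
    p = word_probs.get((context, word), 1e-6)  # floor for unseen pairs
    return -math.log2(p)

print(surprisal("the", "dog"))      # low surprisal: expected word
print(surprisal("the", "quickly"))  # high surprisal: unexpected word
```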

Cynthia Murrell, June 20, 2019

Firefox Translation Add-In

May 17, 2019

The DarkCyber team encounters information in a number of languages. For years, we relied on Google Translate, but the limits on document size proved an annoyance. FreeTranslate.com has been more useful. We have an older installation of some Systran modules.

DarkCyber learned that Firefox has returned to the “translate now” territory with Translate Man. You can get an overview of the functionality of the add-in in “Translate anything instantly in Firefox with Translate Man.” Translate Man uses Google’s API.
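
For reference, here is a minimal sketch of calling the Google Cloud Translation REST API (v2) directly. The API key is a placeholder, and we are assuming the add-in wraps something similar rather than this exact endpoint.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; requires a Google Cloud API key
URL = "https://translation.googleapis.com/language/translate/v2"

def translate(text: str, target: str = "en") -> str:
    """Translate text into the target language via Google's v2 REST API."""
    resp = requests.post(URL, params={"key": API_KEY},
                         data={"q": text, "target": target})
    resp.raise_for_status()
    return resp.json()["data"]["translations"][0]["translatedText"]

print(translate("Bonjour le monde"))  # -> "Hello world"
```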

We have not tested the add-in extensively. It did translate words and short passages in a helpful way.

The write up identifies useful features the add-in delivers. Two are a translate-on-hover feature and a pronunciation function so you can “hear” the word or passage.

In our experience, some text requires a native speaker of the language to translate with accuracy.

Google has introduced its wonderfully named Translatotron. You can read about that innovation in “Google Unveils Translatotron, Its Speech-to-Speech Translation System.”

Now what about these systems’ ability to translate the argot of insiders involved in “interesting” work in North Korea or Iran? What about making sense of emojis in clear text messages?

Someday perhaps.

Stephen E Arnold, May 17, 2019

Into R? A List for You

May 12, 2019

Computerworld, which runs some pretty unusual stories, published “Great R Packages for Data Import, Wrangling and Visualization.” “Great” is an interesting word. In the lingo of Computerworld, a real journalist did some searching, talked to some people, and created a list. As it turns out, the effort is useful. Looking at the Computerworld table is quite a bit easier than trying to dig information out of assorted online sources. Plus, people are not too keen on the phone and email thing now.

The listing includes a mixture of different tools, software, and utilities. There are more than 80 listings. I wasn’t sure what to make of XML’s inclusion in the list, but the source is Computerworld, and I assume that the “real” journalist knows much more than I.

Three observations:

  • Earthworm lists without classification or alphabetization are less useful to me than listings which are sorted by tags and alphabetized within categories. Excel does perform this helpful trick.
  • Some items in the earthworm list have links and others do not. Consistency, I suppose, is the hobgoblin of some types of intellectual work.
  • An indication of which item is free or for fee would be useful too.

Despite these shortcomings, you may want to download the list and tuck it into your “Things I love about R” folder.

Stephen E Arnold, May 12, 2019

China: Patent Translation System

May 10, 2019

Patents are usually easily findable documents. However, reading a patent once found is a challenge. Up the ante if the patent is in a language the person does not read. “AI Used to Translate Patent Documents” provides some information about a new system available from the Intellectual Property Publishing House. According to the article in China Daily:

The system can translate Chinese into English, Japanese and German and vice versa. Its accuracy in two-way translation between Chinese and Japanese has reached 95 percent, far more than the current industry average, and the rest has topped 90 percent…

The system uses a dictionary, natural language processing algorithms, and a computational model. In short, this is a collection of widely used methods tuned over a decade by the Chinese organization. In that span, Thomson Reuters dropped out of the patent game, and just finding patents, even in the US, can be a daunting task.

Translation has been an even more difficult task for some lawyers, researchers, analysts, and academics.

If the information in the China Daily article is accurate, China may have an intellectual property advantage. The write up offers some details, which sound interesting; for example:

  • Translation of a Japanese document: five seconds
  • Patent documents record 90 percent of a country’s technology and innovation
  • China has “a huge database of global patents”.

And the other 10 percent? Maybe other methods are employed.

Stephen E Arnold, May 10, 2019

Cognitive Engine: What Powers the USAF Platform?

May 1, 2019

Last week I met with a university professor who does cutting edge data and text mining and also shepherds PhD candidates. In the course of our 90-minute conversation, I noticed some reference books which had SPSS on the cover. The procedures implemented at this particular university worked well.

After the meeting, I was thinking about the newer approaches which are becoming publicly available. The USAF has started talking about its “cognitive engine.” I thought I heard at a conference that some technology developed by Nutonian, now part of a data and text mining roll up, had influenced the project.

The Nutonian system is predictive with a twist. The person using the system can rely on the smart software to perform the numerous intermediary steps required when using more traditional systems.

The article to read is “The US Air Force Will Showcase Its Many Technological Advances in the USAF Lab Day.” The original is in Chinese, but Freetranslate.com can help out if you don’t read Chinese or have a nearby contact who does.

The USAF wants to deploy a cognitive platform into which vendors can “plug in” their systems. The Chinese write up reported:

AFRL’s Autonomy Capability Team 3 (ACT3) is developing artificial intelligence on a large scale through the development and application of the Air Force Cognitive Engine (ACE), an artificial intelligence software platform. Put into application. The software platform architecture reduces the barriers to entry for artificial intelligence applications and provides end-user applications with the ability to cover a range of artificial intelligence problem types. In the application, the software platform connects educated end users, developers, and algorithms implemented in software, task data, and computing hardware to the process of creating an artificial intelligence solution.

The article also provides some interesting details which were not included in some of the English language reports about this session; for example:

  • Smart rockets
  • An agile pod
  • Pathogen identification.

A couple of observations:

First, obviously the Chinese writer had access to information about the Lab Day demonstrations.

Second, the cognitive platform does not mention foundation vendors, which I understand.

Third, it would be delightful to visit a university and see documentation and information about the next-generation predictive analytics systems available.

Stephen E Arnold, May 1, 2019


Latest GraphDB Edition Available

April 25, 2019

A new version of GraphDB is now available, we learn from the company’s News post, “Ontotext’s GraphDB 8.9 Boosts Semantic Similarity Search.” The semantic graph database offers a couple new features inspired by user feedback. We learn:

“The semantic similarity search is based on the Random Indexing algorithm. … The latest GraphDB release enables users to create hybrid similarity searches using pre-built text-based similarity vectors for the predication-based similarity index. The index combines the power of graph topology with the text similarity. The users can control the index accuracy by specifying the number of iterations required to refine the embeddings. Another improvement is that now GraphDB 8.9 allows users to boost the term weights when searching in text-based similarity indexes. It also simplifies the processes of abortion of running queries or updates from the SPARQL editor in the Workbench.”
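
The Random Indexing algorithm Ontotext names is easy to sketch. The following simplified Python illustration is ours, not GraphDB’s implementation: each term gets a fixed sparse random “index vector,” and a term’s context vector is the sum of its neighbors’ index vectors, so terms that appear in similar contexts end up with similar vectors.

```python
import numpy as np

# Simplified Random Indexing sketch (ours, not GraphDB's implementation).
DIM, NONZERO = 512, 8
rng = np.random.default_rng(42)
_index = {}

def index_vector(term: str) -> np.ndarray:
    """Fixed sparse ternary random vector per term."""
    if term not in _index:
        v = np.zeros(DIM)
        slots = rng.choice(DIM, size=NONZERO, replace=False)
        v[slots] = rng.choice([-1.0, 1.0], size=NONZERO)
        _index[term] = v
    return _index[term]

def context_vectors(tokens, window=2):
    """Each term's context vector sums its neighbors' index vectors."""
    ctx = {t: np.zeros(DIM) for t in tokens}
    for i, t in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                ctx[t] += index_vector(tokens[j])
    return ctx

ctx = context_vectors("the cat sat on the mat".split())
a, b = ctx["cat"], ctx["mat"]
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))  # cosine similarity
```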

The database has also been updated to the current RDF4J 2.4.6 public release. GraphDB comes in Free, Standard, and Enterprise editions. Begun in 2000, Ontotext is based in Sofia, Bulgaria, and maintains its North American office in New York City.

Cynthia Murrell, April 25, 2019
