Linear Math Textbook: For Class Room Use or Individual Study

October 30, 2020

Jim Hefferon’s Linear Algebra is a math textbook. You can get it for free by navigating to this page. From Mr. Hefferon’s Web page for the book, you can download a copy and access a range of supplementary materials. These include:

  • Classroom slides
  • Exercise sets
  • A “lab” manual which requires Sage
  • Video.

The book is designed for students who have completed one semester of calculus. Remember: Linear algebra is useful for poking around in search or neutralizing drones. Zaap. Highly recommended.

Stephen E Arnold, October 30, 2020

Text Analytics: Are These Really the Companies to Watch in the Next 12 Weeks?

October 16, 2020

DarkCyber spotted “Top 10 Text Analytics Companies to Watch in 2020.” Let’s take a quick look at some basic details about each firm:

Alkymi, founded in 2017, makes an email indexing system. The system, according to the company’s Web site, “understands documents using deep learning and visual analysis paired with your human in-the-loop expertise.” Interesting but text analytics appears to be a component of a much larger system. What’s interesting is that the business relies in some degree upon Amazon Web Services. The company’s Web site is https://alkymi.io/.

Aylien Ltd., based in Ireland, appears to be a company with text analysis technology. However, the company’s system is used to create intelligence reports for analysts; for example, government intelligence officers, business analysts, and media outlets. Founded in 2010, the company’s Web site is https://aylien.com.

Hewlett Packard Enterprise. The inclusion of HPE was a bit of a surprise. This outfit once owned the Autonomy technology, but divested itself of the software and services. To replace Autonomy, the company developed “Advanced Text Analysis” which appears to be an enterprise search centric system. The service is available as a Microsoft Azure function and offers 60 APIs (which seems particularly generous) “that deliver deep learning analytics on a wide range of data.” The company’s Web site is https://www.hpe.com/in/en/home.html. One product name jumped out: Ezmeral, which may be a made up word.

InData Labs lists data science, AI, AI driven mobile app development, computer vision, machine learning, data capture and optical character recognition, and big data solutions as its services. Its products include face recognition and natural language processing. Perhaps it is the NLP product which equates to text analytics? The firm’s Web site is https://indatalabs.com/. The company was founded in 2014, operates from Belarus, and has a San Francisco presence.

Kapiche, founded in 2016, focuses on “customer insights”. Customer feedback yields insight with “no set up, no manual coding, and results you can trust,” according to the company. The text analytics snaps into services like Survey Monkey and Google Forms, among others. Clients include Target and Toyota. The company is based in Australia with an office in Denver, Colorado. The firm’s Web site is https://www.kapiche.com. The firm offers applied text analytics.

Lexalytics, founded in 2003, was one of the first standalone text analytics vendors. The company’s system allows customers to “tell powerful stories from complex text data.” DarkCyber prefers to learn “stories” from the data, however. In the last 17 years, the company has not gone public nor been acquired. The firm’s Web site is https://www.lexalytics.com/.

MindGap. The MindGap identified in the article is in the business of providing “AI for business.” The company appears to be a mash up of artificial intelligence and “top tier strategy consulting.” That may be true, but we did not spot text analytics among the core competencies. The firm’s clients include Mail.ru, Gazprom, Yandex, and Huawei. The firm’s Web site is https://www.mindgap.dev/. The firm lists two employees on LinkedIn.

Primer has ingested about $60 million in venture funding since it was founded in 2015. The company ingests text and outputs reports. The company was founded by the individual who set up Quid, another analytics company. Government and business analysts consume the outputs of the Primer system. The company’s Web site is https://primer.ai.

Semeon Analytics, now a unit of Datametrex, provides “custom language and sentiment ontology” services. Indexing and entity extraction, among other NLP modules, allow the system to deliver “insight analysis, rapid insights, and sentiment of the highest precision on the market today.” The Semeon Web site is still online at https://semeon.com.

ThoughtTrace appears to focus on analysis of text in contracts. The firm’s Web site says that its software can “find critical contract facts and opportunities.” Text analytics? Possibly, but the wording suggests search and retrieval. The company has a focus on oil and gas and other verticals. The firm’s Web site is https://www.thoughttrace.com/. (Note that the design of the Web site creates some challenges for a person looking for information.) The company, according to Crunchbase, was founded in 1999, and has three employees.

Three companies are what DarkCyber would consider text analytics firms: Aylien, Lexalytics, and Primer. The other firms mash up artificial intelligence, machine learning, and text analytics to deliver solutions which are essentially indexing and workflow tools.

Other observations include:

  1. The list is not a reliable place to locate flagship vendors; specifically, only three of the 10 companies cited in the article could be considered contenders in this sector.
  2. The text analytics capabilities and applications are scattered. A person looking for a system which is designed to handle email would have to examine the 10 listings and work from a single pointer, Alkymi.
  3. The selection of vendors confuses technical disciplines; for example, AI, machine learning, NLP, etc.

The list appears to have been generated in a short Zoom meeting, not via a rigorous selection and analysis process. Perhaps one of the vendors’ text analytics systems could have been used. Primer’s system comes to mind as one possibility. But that, of course, is work for a real journalist today.

Stephen E Arnold, October 16, 2020

Natural Language Processing: Useful Papers Selected by an Informed Human

July 28, 2020

Nope, no artificial intelligence involved in this curated list of papers from a recent natural language conference. Ten papers are available with a mouse click. Quick takeaway: Adversarial methods seem to be a hot ticket. Navigate to “The Ten Must Read NLP/NLU Papers from the ICLR 2020 Conference.” Useful editorial effort and a clear, adult presentation of the bibliographic information. Kudos to jakubczakon.

Stephen E Arnold, July 27, 2020

Cambridge Analytica: Maybe a New Name and Some of the Old Methods?

December 29, 2019

DarkCyber spotted an interesting factoid in “HH Plans to Work with the Re-Branded Cambridge Analytica to Influence 2021 Elections.”

The new company, Auspex International, will keep former Cambridge Analytica director Mark Turnbull at the helm.

Who is HH? He is President Hakainde Hichilema, serving at this time in Zambia.

The business focus of Auspex is, according to the write up:

We’re not a data company, we’re not a political consultancy, we’re not a research company and we’re not necessarily just a communications company. We’re a combination of all four. (Ahmad Al-Khatib, a Cairo born investor)

You can obtain some information about Auspex at this url: https://www.auspex.ai/.

DarkCyber noted the use of the “ai” domain. See the firm’s “What We Believe” information at this link. It is good to have a reason to get out of bed in the morning.

Stephen E Arnold, December 29, 2019

Google Trends Used to Reveal Misspelled Wirds or Is It Words?

November 25, 2019

We spotted a listing of the most misspelled words in each of the USA’s 50 states. Too bad Puerto Rico. Kentucky’s most misspelled word is “ninety.” Navigate to Considerable and learn what residents cannot spell. How often? Silly kweston.

The listing includes some bafflers and may reveal what can go wrong with data from an online ad sales data collection system; for example:

  • Washington, DC (which is not a state in DarkCyber’s book) cannot spell “enough”; for example, “enuf already with these televised hearings and talking heads”
  • Idaho residents cannot spell embarrassed, which as listeners to Kara Swisher know has two r’s and two s’s. Helpful that.
  • Montana residents cannot spell “comma.” Do those in Montana use commas?
  • And not surprisingly, those in Tennessee cannot spell “intelligent.” Imagine that!

What happens if one trains smart software on these data?

Sumthink mite go awf the railz.

Stephen E Arnold, November 25, 2019

Gender Bias in Old Books. Rewrite Them?

October 9, 2019

Here is an interesting use of machine learning. Salon tells us “What Reading 3.5 Million Books Tells Us About Gender Stereotypes.” Researchers led by University of Copenhagen’s Dr. Isabelle Augenstein analyzed 11 billion English words in literature published between 1900 and 2008. Not surprisingly, the results show that adjectives about appearance were most often applied to women (“beautiful” and “sexy” top the list), while men were more likely to be described by character traits (“righteous,” “rational,” and “brave” were most frequent). Writer Nicole Karlis describes how the team approached the analysis:

“Using machine learning, the researchers extracted adjectives and verbs connected to gender-specific nouns, like ‘daughter.’ Then the researchers analyzed whether the words had a positive, negative or neutral point of view. The analysis determined that negative verbs associated with appearance are used five times more for women than men. Likewise, positive and neutral adjectives relating to one’s body appearance occur twice as often in descriptions of women. The adjectives used to describe men in literature are more frequently ones that describe behavior and personal qualities.

“Researchers noted that, despite the fact that many of the analyzed books were published decades ago, they still play an active role in fomenting gender discrimination, particularly when it comes to machine learning sorting in a professional setting. ‘The algorithms work to identify patterns, and whenever one is observed, it is perceived that something is “true.” If any of these patterns refer to biased language, the result will also be biased,’ Augenstein said. ‘The systems adopt, so to speak, the language that we people use, and thus, our gender stereotypes and prejudices.’ Augenstein explained this can be problematic if, for example, machine learning is used to sift through employee recommendations for a promotion.”

Karlis does list some caveats to the study—it does not factor in who wrote the passages, what genre they were pulled from, or how much gender bias permeated society at the time. The research does affirm previous results, like the 2011 study that found 57% of central characters in children’s books are male.

Dr. Augenstein hopes her team’s analysis will raise awareness about the impact of gendered language and stereotypes on machine learning. If they choose, developers can train their algorithms on less biased materials or program them to either ignore or correct for biased language.
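The extraction step described in the quoted passage can be illustrated with a toy sketch. This is not the researchers' method: the study used machine learning on 3.5 million books, while the code below simply counts adjectives that sit directly in front of a gendered noun. The word lists and the function name `adjectives_by_gender` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists (assumptions for illustration only).
FEMALE_NOUNS = {"daughter", "woman", "girl", "mother"}
MALE_NOUNS = {"son", "man", "boy", "father"}
ADJECTIVES = {"beautiful", "brave", "rational", "lovely", "righteous"}


def adjectives_by_gender(text):
    """Count known adjectives that immediately precede a gendered noun."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {"female": Counter(), "male": Counter()}
    # Walk over adjacent word pairs: (previous word, current word).
    for prev, word in zip(tokens, tokens[1:]):
        if prev not in ADJECTIVES:
            continue
        if word in FEMALE_NOUNS:
            counts["female"][prev] += 1
        elif word in MALE_NOUNS:
            counts["male"][prev] += 1
    return counts


sample = "The beautiful daughter met a brave man. A brave son greeted the lovely woman."
result = adjectives_by_gender(sample)
print(result["female"].most_common())
print(result["male"].most_common())
```

A system trained on counts like these inherits whatever skew the source text contains, which is the concern Augenstein raises.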

Cynthia Murrell, October 9, 2019

Trovicor: A Slogan as an Equation

August 2, 2019

We spotted this slogan on the Trovicor Web site:

The Trovicor formula: Actionable Intelligence = f (data generation; fusion; analysis; visualization)

The function consists of four buzzwords used by vendors of policeware and intelware:

  • Data generation (which suggests metadata assigned to intercepted, scraped, or provided content objects)
  • Fusion (which means in DarkCyber’s world a single index to disparate data)
  • Analysis (numerical recipes to identify patterns or other interesting data)
  • Virtualization (use of technology to replace old school methods like 1950s’ style physical wire taps, software defined components, and software centric widgets).

The buzzwords make it easy to identify other companies providing somewhat similar services.

Trovicor maintains a low profile. But obtaining open source information about the company may be a helpful activity.

Stephen E Arnold, August 2, 2019

Need a Machine Learning Algorithm?

July 17, 2019

The R-Bloggers.com Web site published “101 Machine Learning Algorithms for Data Science with Cheat Sheets.” The write up recycles information from DataScienceDojo, and some of the information looks familiar. But lists of algorithms are not original. They are useful. What sets this list apart is the inclusion of “cheat sheets.”

What’s a cheat sheet?

In this particular collection, a cheat sheet looks like this:

[Image: r entry example]

You can see the entry for the algorithm: Bernoulli Naive Bayes with a definition. The “cheat sheet” is a link to a Python example. In this case, the example is a link to an explanation on the Chris Albon blog.
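For readers who have not met the algorithm, here is a minimal sketch of Bernoulli Naive Bayes in plain Python. It is not the code from the Chris Albon blog or from the R-Bloggers list. The function names (`train_bernoulli_nb`, `predict`), the tiny spam/ham data set, and the smoothing value `alpha=1.0` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary word-presence features."""
    vocab = sorted({w for d in docs for w in d.split()})
    classes = sorted(set(labels))
    class_counts = {c: 0 for c in classes}
    # For each class, count how many documents contain each word.
    word_doc_counts = {c: defaultdict(int) for c in classes}
    for doc, label in zip(docs, labels):
        class_counts[label] += 1
        for w in set(doc.split()):
            word_doc_counts[label][w] += 1
    priors = {c: class_counts[c] / len(docs) for c in classes}
    # P(word present | class), with Laplace smoothing so no probability is 0 or 1.
    cond = {
        c: {
            w: (word_doc_counts[c][w] + alpha) / (class_counts[c] + 2 * alpha)
            for w in vocab
        }
        for c in classes
    }
    return vocab, priors, cond


def predict(model, doc):
    """Return the class with the highest log-posterior for the document."""
    vocab, priors, cond = model
    present = set(doc.split())
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in vocab:
            p = cond[c][w]
            # The Bernoulli model also scores the absence of a word.
            score += math.log(p) if w in present else math.log(1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


docs = ["free prize money", "win free money now",
        "meeting agenda attached", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_bernoulli_nb(docs, labels)
print(predict(model, "free money"))
print(predict(model, "meeting notes"))
```

The detail that separates the Bernoulli variant from the multinomial one is that a missing vocabulary word counts as evidence too, via the `1.0 - p` term.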

What’s interesting is that the 101 algorithms are grouped under 18 categories. Of these 18, Bayes and derivative methods total five.

No big deal, but in my lectures about widely used algorithms I highlight 10, mostly because it is a nice round number. The point is that most of the analytics vendors use the same basic algorithms. Variations among products built on these algorithms are significant.

As analytics systems become more modular (that is, like Lego blocks), it seems that the trajectory of development will be to select, preconfigure thresholds, and streamline processes in a black box.

Is this good or bad?

It depends on whether one’s black box is a dominant solution or platform.

Will users know that this almost inevitable narrowing has upsides and downsides?

Nope.

Stephen E Arnold, July 17, 2019

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is ‘Cognitive Linguistics’?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.

Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how to book from Wiley publishing called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics: its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

Sentiment Analysis: Can a Monkey Do It?

June 27, 2019

Sentiment analysis is a machine learning tool companies are employing to understand how their customers feel about their services and products. It is mainly deployed on social media platforms, including Facebook, Instagram, and Twitter. The Monkey Learn blog details how sentiment analysis is specifically being used on Twitter in the post, “Sentiment Analysis Of Twitter.”

Using sentiment analysis is not a new phenomenon, but there are still individuals unaware of the possible power at their fingertips. Monkey Learn specializes in custom machine learning solutions that include intent, keywords, and, of course, sentiment analysis. The post is a guide on the basics of sentiment analysis: what it is, how it works, and real life examples. Monkey Learn defines sentiment analysis as:

“Sentiment analysis (a.k.a opinion mining) is the automated process of identifying and extracting the subjective information that underlies a text. This can be either an opinion, a judgment, or a feeling about a particular topic or subject. The most common type of sentiment analysis is called ‘polarity detection’ and consists in classifying a statement as ‘positive’, ‘negative’ or ‘neutral’.”

It also relies on natural language processing (NLP) to understand the information’s context.
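Polarity detection as defined above can be sketched in a few lines of Python. This is a toy, lexicon-based illustration, not Monkey Learn's system, which the post describes as machine learning with NLP. The word lists and the function name `polarity` are hypothetical.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration);
# production systems use far larger lexicons or trained models.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "slow"}


def polarity(text):
    """Classify text as 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in [
    "I love this phone, great battery",
    "Terrible service and slow delivery",
    "The package arrived on Tuesday",
]:
    print(tweet, "->", polarity(tweet))
```

A word count like this ignores negation and sarcasm ("not good" scores as positive), which is one reason context-aware NLP matters for this task.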

Monkey Learn explains that sentiment analysis is important because most of the world’s digital data is unstructured. Machine learning with NLP’s assistance can quickly sort large data sets and detect their polarity. Monkey Learn promises with their sentiment analysis to bring their customers scalability, consistent criteria, and real-time analysis. Many companies are using Twitter sentiment analysis for customer service, brand monitoring, market research, and political campaigns.

The article is basically a promotional piece for Monkey Learn, but it does work as a starting guide for sentiment analysis.

Whitney Grace, June 27, 2019
