What Is Better Than Biometrics Emotion Analysis of Surveillance Videos?

October 27, 2022

Many years ago, my team worked on a project to parse messages, determine if a text message was positive or negative, and flag the negative ones. Then our job was to rank those negative messages in a league table. The team involved professionals in my lab in rural Kentucky, some whiz kids in big universities, a handful of academic experts, and some memorable wizards located offshore. (I have some memories, but, alas, these are not suitable for this write up.)

We used the most recent mechanisms to fiddle information from humanoid outputs. Despite the age of some numerical recipes, we used the latest and greatest. What surprised everyone is that our approach worked, particularly for the league table of the most negative messages. After reviewing our data, we formulated a simple, speedy way to pinpoint the messages which required immediate inspection by a person.

What was our solution for the deployable system?

Did we rely on natural language processing? Nope.

Did we rely on good old Reverend Bayes? Nope.

Did we rely on statistical analysis? Nope.

How did we do this? (Now keep in mind this was more than 15 years ago.)

We used a look up table of keywords.

Why? It delivered the league table of the most negative messages more than 85 percent of the time. The lookups were orders of magnitude faster than the fancy numerical recipes. The system was explainable. The method was extensible to second order negative messages with synonym expansion and, in effect, a second pass on the not-really-negative messages. Yep, we crept into the 90 percent range.
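For readers who want to see how little machinery this takes, here is a minimal Python sketch of a keyword lookup with a synonym-expansion second pass and a league table. It is not the original system: the keyword weights, the synonym map, the function names, and the scoring rule (sum the weights of matched words) are all hypothetical.

```python
import re

# Hypothetical negative-keyword lookup table: word -> weight.
NEGATIVE_KEYWORDS = {"hate": 3, "kill": 5, "angry": 2, "terrible": 2}

# Hypothetical synonym map used only on the second pass.
SYNONYMS = {"loathe": "hate", "furious": "angry", "awful": "terrible"}


def score(message, expand_synonyms=False):
    """Sum the weights of negative keywords found in a message."""
    total = 0
    for word in re.findall(r"[a-z']+", message.lower()):
        if expand_synonyms:
            word = SYNONYMS.get(word, word)
        total += NEGATIVE_KEYWORDS.get(word, 0)
    return total


def league_table(messages):
    """Rank flagged messages from most to least negative.

    First pass: exact keyword lookups. Second pass: messages that scored
    zero are re-scored with synonym expansion.
    """
    scored = []
    for msg in messages:
        s = score(msg)
        if s == 0:
            s = score(msg, expand_synonyms=True)
        if s > 0:
            scored.append((s, msg))
    # sort() with reverse=True is still stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    msgs = ["I hate this and I am angry", "Lovely day", "I loathe this awful plan"]
    for s, m in league_table(msgs):
        print(s, m)
```

Each lookup is a dictionary access, which is why this kind of approach is fast and easy to explain: every score can be traced to the specific words that produced it.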

I thought about this work, done for a company which went the way of most lavishly funded, wild and crazy start ups from the go-go years, when I read “U.K. Watchdog Issues First of Its Kind Warning Against ‘Immature’ Emotional Analysis Tech.” This article addresses fancy methods for parsing images and other content to determine if a person is happy or sad. In reality, the purpose of these systems for some professional groups is to identify a potential bad actor before that individual creates content for the “if it bleeds, it leads” news organizations.

The article states:

The Information Commissioner’s Office, Britain’s top privacy watchdog, issued a searing warning to companies against using so-called “emotional analysis” tech, arguing it’s still “immature” and that the risks associated with it far outweigh any potential benefits.

You should read the full article to get the juicy details. Remember the text approach required one level of technology. We used a look up table because the magical methods were too expensive and too time consuming when measured against what was needed: Reasonable accuracy.

Taking videos and images, processing them, and determining if the individual in the image is a good actor or a bad actor, a happy actor or a sad actor, a nut job actor or a relative of Mother Teresa’s is another kettle of code.

Let’s go back to the question which is the title of this blog post: What Is Better Than Biometrics Emotion Analysis?

The answer is objective data about the clicks, dwell time, and types of indexed content an individual consumes. Lots of clicks translates to a signal of interest. Dwell time indicates attention. Cross correlate these data with other available information from primary sources and one can pinpoint some factoids that are useful in “knowing” about an individual.
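A minimal sketch of turning clicks and dwell time into a simple interest profile follows. The event format, category names, and ranking rule are assumptions for illustration only; the post does not describe how TikTok or anyone else combines these signals, and the cross correlation with other sources is not shown.

```python
from collections import defaultdict

# Hypothetical event log: one (content_category, dwell_seconds) pair per click.
events = [
    ("cooking", 40.0),
    ("cooking", 95.0),
    ("politics", 5.0),
    ("cooking", 60.0),
    ("fitness", 120.0),
]


def interest_profile(events):
    """Aggregate clicks (a signal of interest) and dwell time (attention) per category."""
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for category, seconds in events:
        clicks[category] += 1
        dwell[category] += seconds
    return {
        c: {"clicks": clicks[c], "total_dwell": dwell[c], "avg_dwell": dwell[c] / clicks[c]}
        for c in clicks
    }


def rank_interests(profile):
    """Order categories by clicks, breaking ties with total dwell time."""
    return sorted(
        profile,
        key=lambda c: (profile[c]["clicks"], profile[c]["total_dwell"]),
        reverse=True,
    )


profile = interest_profile(events)
print(rank_interests(profile))
```

Even this toy version separates a passing click (five seconds on politics) from sustained attention (two minutes on fitness), which is the distinction the paragraph above draws.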

My interest in the article was not the source article’s reminder that expectations for a technology are usually overinflated. My reaction was, “Imagine how useful TikTok data would be in identifying individuals with specific predilections, mood changes plotted over time, and high value signals about an individual’s interests.”

Yep, just a reminder that TikTok is in a much better place when it comes to individual analysis than relying on some complicated methods which don’t work very well.

Practical is better.

Stephen E Arnold, October 27, 2022

Word Problems Are Tricky for AI Language Models

October 27, 2022

If you have trouble with word problems, rest assured you are in good company. Machine-learning researchers have only recently made significant progress teaching algorithms the concept. IEEE Spectrum reports, “AI Language Models Are Struggling to ‘Get’ Math.” Writer Dan Garisto states:

“Until recently, language models regularly failed to solve even simple word problems, such as ‘Alice has five more balls than Bob, who has two balls after he gives four to Charlie. How many balls does Alice have?’ ‘When we say computers are very good at math, they’re very good at things that are quite specific,’ says Guy Gur-Ari, a machine-learning expert at Google. Computers are good at arithmetic—plugging numbers in and calculating is child’s play. But outside of formal structures, computers struggle. Solving word problems, or ‘quantitative reasoning,’ is deceptively tricky because it requires a robustness and rigor that many other problems don’t.”
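Spelled out step by step, which is roughly what chain-of-thought prompting asks a model to do, the quoted problem reduces to a couple of lines of arithmetic. This sketch assumes that “five more balls than Bob” compares Alice with Bob’s count after he gives four balls to Charlie; the quoted passage does not state the intended answer, and the other reading (Bob before the gift) would give 11.

```python
def alice_balls():
    # Step 1: after giving four balls to Charlie, Bob has two.
    bob_now = 2
    # Step 2: a distractor a solver must not fold into the answer:
    # before the gift, Bob had 2 + 4 balls.
    bob_before = bob_now + 4
    # Step 3 (assumed reading): Alice has five more than Bob has now.
    alice = bob_now + 5
    return alice, bob_before


alice, bob_before = alice_balls()
print(alice)  # 7 under the assumed reading
```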

Researchers threw a couple of datasets with thousands of math problems at their language models. The students still failed spectacularly. After some tutoring, however, Google’s Minerva emerged as a star pupil, having achieved 78% accuracy. (Yes, the grading curve is considerable.) We learn:

“Minerva uses Google’s own language model, Pathways Language Model (PaLM), which is fine-tuned on scientific papers from the arXiv online preprint server and other sources with formatted math. Two other strategies helped Minerva. In ‘chain-of-thought prompting,’ Minerva was required to break down larger problems into more palatable chunks. The model also used majority voting—instead of being asked for one answer, it was asked to solve the problem 100 times. Of those answers, Minerva picked the most common answer.”
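The majority voting step can be sketched with Python’s `collections.Counter`. The solver below is a hypothetical stand-in that returns the right answer about 60% of the time; it is not PaLM or Minerva, and its numbers are invented for illustration.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the fraction of samples that gave it."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


def sample_answer(rng):
    """Hypothetical stand-in for one sampled model answer.

    Right (7) about 60% of the time; otherwise one of several wrong answers.
    """
    if rng.random() < 0.6:
        return 7
    return rng.choice([6, 9, 11])


rng = random.Random(0)
samples = [sample_answer(rng) for _ in range(100)]  # "solve the problem 100 times"
answer, share = majority_vote(samples)
print(answer, share)
```

The design point: a solver that is often wrong on any single try can still produce the right majority answer, provided its mistakes scatter across different wrong answers instead of agreeing on one.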

Not a practical approach for your average college student during an exam. Researchers are still not sure how much Minerva and her classmates understand about the answers they are giving, especially since the more problems they solve the fewer they get right. Garisto notes language models “can have strange, messy reasoning and still arrive at the right answer.” That is why human students are required to show their work, so perhaps this is not so different. More study is required, on the part of both researchers and their algorithms.

Cynthia Murrell, October 27, 2022

A Data Taboo: Poisoned Information But We Do Not Discuss It Unless… Lawyers

October 25, 2022

In a conference call yesterday (October 24, 2022), I mentioned one of my laws of online information; specifically, digital information can be poisoned. The venom can be administered by a numerically adept MBA or a junior college math major taking short cuts because data validation is hard work. The person on the call was mildly surprised because the notion of open source and closed source “facts” intentionally weaponized is an uncomfortable subject. I think the person with whom I was speaking blinked twice when I pointed out what should be obvious to most individuals in the intelware business. Here’s the pointy end of reality:

Most experts and many of the content processing systems assume that data are good enough. Plus, with lots of data any irregularities are crunched down by steamrolling mathematical processes.

The problem is that articles like “Biotech Firm Enochian Says Co Founder Fabricated Data” make it clear that MBA math as well as experts hired to review data can be caught with their digital clothing in a pile. These folks are, in effect, sitting naked in a room with people who want to make money. Nakedness from being dead wrong can lead to some career turbulence; for example, prison.

The write up reports:

Enochian BioSciences Inc. has sued co-founder Serhat Gumrukcu for contractual fraud, alleging that it paid him and his husband $25 million based on scientific data that Mr. Gumrukcu altered and fabricated.

The article does not explain precisely how the data were “fabricated.” However, someone with Excel skills or access to an article like “Top 3 Python Packages to Generate Synthetic Data” and Fiverr.com or a similar gig work site can get some data generated at a low cost. Who will know? Most MBA math and statistics classes focus on meeting targets in order to get a bonus or amp up a “service” fee for clicking a mouse. Experts who can figure out fiddled data sets take the time if they are motivated by professional jealousy or cold cash. Who blew the whistle on Theranos? A data analyst? Nope. A “real” journalist who interviewed people who thought something was goofy in the data.
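To make “low cost” concrete, here is a hedged sketch using only Python’s standard library. It is not how the Enochian data were produced (the article does not say), and the measurement, target values, and function name are hypothetical.

```python
import random
import statistics


def fabricate_measurements(n, target_mean, target_sd, seed=42):
    """Draw n made-up readings from a normal distribution centred on a chosen result."""
    rng = random.Random(seed)
    return [round(rng.gauss(target_mean, target_sd), 2) for _ in range(n)]


# Hypothetical "assay results" engineered to land near a desired value of 12.5.
values = fabricate_measurements(200, target_mean=12.5, target_sd=1.5)
print(round(statistics.mean(values), 2), round(statistics.stdev(values), 2))
```

The summary statistics come out as the fabricator wanted, which is one reason a review that only looks at averages and spreads can miss the problem.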

My point is that it is trivially easy to whip up data to support a run at tenure or at a group of MBAs desperate to fund the next big thing as the big tech house of cards wobbles in the winds of change.

Several observations:

  1. The threat of bad or fiddled data is rising. My team is checking a smart output by hand because we simply cannot trust what a slick, new intelware system outputs. Yep, trust is in short supply among my research team.
  2. Data from assorted open and closed sources are accepted as is, without individual inspection. The attitude is that the law of big numbers, the sheer volume of data, or the magic of cross correlation will minimize errors. Sure these processes will, but what if the data are weaponized and crafted to avoid detection? The answer is to check each item. How’s that for a cost center?
  3. Uninformed individuals (yep, I am including some data scientists, MBAs, and hawkers of data from app users) don’t know how to identify weaponized data or what to do when such data are identified.
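What “check each item” can look like in code is sketched below. The record layout, the allow-list of sources, and the range limits are hypothetical. Rules like these catch crude errors, but data crafted to avoid detection will be built to pass them, so the sketch shows the shape and cost of per-item inspection, not a defense against weaponized data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str
    value: float
    timestamp: int  # Unix seconds


KNOWN_SOURCES = {"sensor-a", "sensor-b"}  # hypothetical allow-list


def problems(rec, lo=0.0, hi=100.0, earliest=946684800, latest=1700000000):
    """Return the rule violations for one record (an empty list means it passes)."""
    issues = []
    if rec.source not in KNOWN_SOURCES:
        issues.append("unknown source")
    if not (lo <= rec.value <= hi):
        issues.append("value out of range")
    if not (earliest <= rec.timestamp <= latest):
        issues.append("timestamp out of range")
    return issues


def inspect_each(records):
    """Check every item instead of trusting aggregate statistics."""
    flagged = []
    for i, rec in enumerate(records):
        issues = problems(rec)
        if issues:
            flagged.append((i, issues))
    return flagged


records = [
    Record("sensor-a", 42.0, 1660000000),
    Record("sensor-x", 42.0, 1660000000),
    Record("sensor-b", 250.0, 1660000000),
]
print(inspect_each(records))
```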

Does this suggest that a problem exists? If yes, what’s the fix?

[a] Ignore the problem

[b] Trust Google-like outfits who seek to be the source for synthetic data

[c] Rely on MBAs

[d] Rely on jealous colleagues in the statistics department with limited tenure opportunities

[e] Blink.

Pick one.

Stephen E Arnold, October 25, 2022

Proposed EU Rule Would Allow Citizens to Seek Restitution for Harmful AI

October 10, 2022

It looks like the European Commission is taking the potential for algorithms to cause harm seriously. The Register reports, “Europe Just Might Make it Easier for People to Sue for Damage Caused by AI Tech.” Vice-president for values and transparency Věra Jourová frames the measure as a way to foster trust in AI technologies. Apparently EU officials believe technical innovation is helped when the public knows appropriate guardrails are in place. What an interesting perspective. Writer Katyanna Quach describes:

“The proposed AI Liability Directive aims to do a few things. One main goal is updating product liability laws so that they effectively cover machine-learning systems and lower the burden-of-proof for a compensation claimant. This ought to make it easier for people to claim compensation, provided they can prove damage was done and that it’s likely a trained model was to blame. This means someone could, for instance, claim compensation if they believe they’ve been discriminated against by AI-powered recruitment software. The directive opens the door to claims for compensation following privacy blunders and damage caused by poor safety in the context of an AI system gone wrong. Another main aim is to give people the right to demand from organizations details of their use of artificial intelligence to aid compensation claims. That said, businesses can provide proof that no harm was done by an AI and can argue against giving away sensitive information, such as trade secrets. The directive is also supposed to give companies a clear understanding and guarantee of what the rules around AI liability are.”

Officials hope such clarity will encourage developers to move forward with AI technologies without the fear of being blindsided by unforeseen allegations. Another goal is to build the current patchwork of AI standards and legislation across Europe into a cohesive set of rules. Commissioner for Justice Didier Reynders declares citizen protection top priority, stating, “technologies like drones or delivery services operated by AI can only work when consumers feel safe and protected.” Really? I’d like to see US officials tell that to Amazon.

Cynthia Murrell, October 10, 2022

The Push for Synthetic Data: What about Poisoning and Bias? Not to Worry

October 6, 2022

Do you worry about data poisoning, use of crafted data strings to cause numerical recipes to output craziness, and weaponized information shaped by a disaffected MBA big data developer sloshing with DynaPep?

No. Good. Enjoy the outputs.

Yes. Too bad. You lose.

For a rah rah, it’s sunny in Slough look at synthetic data, read “Synthetic Data Is the Safe, Low-Cost Alternative to Real Data That We Need.”

The sub title is:

A new solution for data hungry AIs

And the sub sub title is:

Content provided by IBM and TNW.

Let’s check out what this IBM content marketing write up says:

One example is Task2Sim, an AI model built by the MIT-IBM Watson AI Lab that creates synthetic data for training classifiers. Rather than teaching the classifier to recognize one object at a time, the model creates images that can be used to teach multiple tasks. The scalability of this type of model makes collecting data less time consuming and less expensive for data hungry businesses.

What are the downsides of synthetic data? Downsides? Don’t be silly:

Synthetic data, however it is produced, offers a number of very concrete advantages over using real world data. First of all, it’s easier to collect way more of it, because you don’t have to rely on humans creating it. Second, the synthetic data comes perfectly labeled, so there’s no need to rely on labor intensive data centers to (sometimes incorrectly) label data. Third, it can protect privacy and copyright, as the data is, well, synthetic. And finally, and perhaps most importantly, it can reduce biased outcomes.

There is one, very small, almost minuscule issue stated in the write up; to wit:

As you might suspect, the big question regarding synthetic data is around the so-called fidelity — or how closely it matches real-world data. The jury is still out on this, but research seems to show that combining synthetic data with real data gives statistically sound results. This year, researchers from MIT and the MIT-IBM AI Watson Lab showed that an image classifier that was pretrained on synthetic data in combination with real data, performed as well as an image classifier trained exclusively on real data.
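The two claims above, that synthetic data “comes perfectly labeled” and that it works best “in combination with real data,” can be sketched in a toy one-dimensional setting. The generator, its class centres, and the mixing ratio are assumptions for illustration; nothing here reflects Task2Sim or the MIT-IBM experiments.

```python
import random


def make_synthetic(n, rng):
    """Generate labeled 1-D examples. The label is chosen first, so it is known exactly."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        # Hypothetical generator: class 0 centred at 0.0, class 1 centred at 3.0.
        x = rng.gauss(3.0 if label else 0.0, 1.0)
        data.append((x, label))
    return data


def mix(real, synthetic, synthetic_fraction, rng):
    """Build a training set in which roughly `synthetic_fraction` of items are synthetic."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    n_syn = int(len(real) * synthetic_fraction / (1 - synthetic_fraction))
    combined = list(real) + rng.sample(synthetic, min(n_syn, len(synthetic)))
    rng.shuffle(combined)
    return combined


rng = random.Random(1)
synthetic = make_synthetic(500, rng)
real = [(0.1, 0), (2.9, 1)] * 50  # stand-in for 100 real, human-labeled examples
train = mix(real, synthetic, 0.5, rng)
print(len(train))
```

Because the label is picked before the feature is drawn, it is never wrong. Whether the drawn features resemble real-world data, the fidelity question, is exactly what a sketch like this cannot answer.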

I loved the “seems to show” phrase I put in bold face. Seems is such a great verb. It “seems” almost accurate.

But what about that disaffected MBA developer fiddling with thresholds?
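The worry about fiddled thresholds can be illustrated with a deliberately tiny classifier. The data, the nearest-centroid rule, and the two crafted points are invented for illustration; real poisoning is usually subtler and harder to spot.

```python
from statistics import mean


def fit_threshold(data):
    """Toy 1-D nearest-centroid rule: the threshold is the midpoint of the class means.

    Assumes class 1 sits at higher values than class 0.
    """
    m0 = mean(x for x, y in data if y == 0)
    m1 = mean(x for x, y in data if y == 1)
    return (m0 + m1) / 2


def predict(x, threshold):
    return 1 if x >= threshold else 0


clean = [(0, 0), (1, 0), (2, 0), (8, 1), (9, 1), (10, 1)]
# Two crafted points: high values deliberately labeled as class 0.
poisoned = clean + [(20, 0), (20, 0)]

t_clean = fit_threshold(clean)        # midpoint of 1 and 9
t_poisoned = fit_threshold(poisoned)  # midpoint of 8.6 and 9
print(t_clean, t_poisoned)
print(predict(8, t_clean), predict(8, t_poisoned))
```

Two mislabeled points move the threshold from 5.0 to 8.8, so a genuine class-1 value of 8 is now classified as class 0. Nothing in the training pipeline complains.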

I know the answer to this question, “That will never happen.”

Okay, I am convinced. You know the “we need” thing.

Stephen E Arnold, October 6, 2022

Google and Its Smart Software: Marketing Fodder and Investment Compost

September 29, 2022

Alphabet Google YouTube DeepMind is “into” smart software. The idea is that synthetic data, off-the-shelf models, and Google’s secret sauce will work wonders. Now this series of words is catnip for AGYD’s marketing and sales professionals. Grrrreat, as Tony the Tiger used to say about a fascinating cereal decades ago. Grrreat!

However, there may be a slight disconnect between the AGYD smart software papers, demonstrations, and biology-shaking protein thing and the cold, hard reality of investment payback. Keep in mind that AGYD is about money, not the social shibboleths in the stream of content marketing.

“Google Ventures Shelves Its Algorithm” states:

Google Ventures has mothballed an algorithm that for years had served as a gatekeeper for new investments… GV [Google Ventures] still relies heavily on data. After all, this is the corporate venture arm of Google. But data has been relegated to its original role as aide, rather than arbiter.

I interpreted the report to mean: Yikes! It does not work and Googley humans have to make decisions about investments.

The spin is that the algos are helpful. But the decision is humanoid.

I wonder, “What other AGYD algos don’t deliver what users, advertisers, and Googlers expected?”

Google listens to those with lots of money at risk. Does Google listen to other constituencies? Did Google take the criticism of its smart software to heart?

My hunch is that the smart software is lingo perfect for marketing outputs. Some of the outputs of the smart software are compost, rarely shown to the public and not sniffed by too many people. Will Tony the Tiger inhale and growl, “Grrreat”? Sure, sure, Tony will.

Stephen E Arnold, September 29, 2022

Psycho AI: Seems Possible

September 29, 2022

As if we needed to be convinced, scientists at MIT conducted an experiment that highlights the importance of machine learning data quality. The U.S. Sun reports, “Rogue Robot: ‘Psychopath AI’ Created by Scientists who Fed It Content from ‘Darkest Corners of Web’.” Citing this article from the BBC, writer Jona Jaupi tells us the demented AI is aptly named Norman (as in Psycho’s Norman Bates). We also learn:

“The aim of this experiment was to see how training AI on data from ‘the dark corners of the net’ would alter its viewpoints. ‘Norman’ was pumped with continuous image captions from macabre Reddit groups that share death and gore content. And this resulted in the AI meeting traditional ‘psychopath‘ criteria, per psychiatrists. Researchers came to their diagnosis after showing ‘Norman’ the Rorschach test. The test comprises a series of inkblots and depending on how viewers interpret them, they can indicate mental disorders. AI with neutral training interprets the images as day-to-day objects like umbrellas. However, ‘Norman’ appeared to perceive the images as executions and car crashes.”

Lovely. This horror-in-horror-out result should be no surprise to anyone who follows developments in AI and machine learning. The researchers say this illustrates AI bias is not the fault of algorithms themselves but of the data they are fed. Perhaps, but that is a purely academic distinction as long as unbiased datasets remain figments of imagination. While some point to synthetic data as the solution, that approach has its own problems. Despite the dangers, the world is being increasingly run by algorithms. We are unlikely to reverse course, so each development team will just have to choose which flawed method to embrace.

Cynthia Murrell, September 29, 2022

Palantir Technologies: Not Intelware, Now a Leader in Artificial Intelligence

September 27, 2022

I spotted this rather small advertisement in the Wall Street Journal dead tree edition on September 22, 2022. (I have been on the road and I had a stack of newspapers to review upon my return, so I may have the date off by a day or two. No big deal.)

Here’s the ad:

[Image: the Palantir advertisement]

A couple of points jumped out. First, Palantir says in this smallish ad, “Palantir. The industry leader in artificial intelligence software.” That’s a very different positioning for the intelware centric company. I think Palantir was pitching itself as a business intelligence solution and maybe a mechanism to identify fraud. Somewhere along the line there was a save the planet or save the children angle to the firm’s consulting-centric solutions.

For me, “consulting centric solutions” means that software (some open source, some whipped up by wizards) is hooked together by Palantir-provided or Palantir-certified engineers. The result is a dashboard with functionality tailored to a licensee’s problem. The money is in the consulting services for this knowledge work. Users of Palantir can fiddle, but to deliver real rock ‘em sock ‘em outputs, the bill by the hour folks are needed. This is no surprise to those familiar with migrations of software developed for one thing which is then, in a quest for revenues, morphed into a Swiss Army knife with some wowza PowerPoint decks and slick presentations at conferences. Feel free to disagree, please.

The second thing I noticed is that Palantir presents other leaders in smart software; specifically, the laggards at Microsoft, IBM, Amazon, and the Google. There are many ways to rank leaders. One distinction Palantir has is that it is not generating much of a return for those who bought the company’s stock since the firm’s initial public offering. On the other hand, the other four outfits, despite challenges, don’t have Palantir’s track record in the money department. (Yes, I know the core of Palantir made out for themselves, but for the person I know in Harrod’s Creek who bought shares after the IPO: not a good deal at this time.)

The third thing is that Google, which has been marketing the heck out of its smart software, is dead last in the Palantir list. Google and its estimable DeepMind outfit are probably not thrilled to be sucking fumes from Microsoft, IBM, and the outstanding product search solution provider Amazon. Google has articles flowing from Medium, technical papers explaining the magic of its AI/ML approach, and cheerleaders in academia and government waving pom poms for the GOOG.

I have to ask myself why? Here’s a breakdown of the notes I made after my team and I talked about this remarkable ad:

  1. Palantir obviously thinks its big reputation can be conveyed in a small ad. Palantir is perhaps having difficulty thinking objectively about the pickle the company’s sales team is in and wants to branch out. (Hey, doesn’t this need big ads?)
  2. Palantir has presented a ranking which is bound to irritate some at Amazon AWS. I have heard that some Palantir clients and some of Palantir’s magic software run on AWS. Is this a signal that Palantir wants to shift cloud providers? Maybe to the government’s go-to source of PowerPoint?
  3. Palantir may want to point out that Google’s Snorkeling and diversity methods are, in fact, not too good. Lagging behind a company like Palantir is not something the senior managers consider after a morning stretching routine.

Net net: This marketing signal, though really small, may presage something more substantive. Maybe a bigger ad, a YouTube video, a couple of TikToks, and some big sales not in the collectible business would be useful next steps. But the AI angle? Well, it is interesting.

Stephen E Arnold, September 27, 2022

Robots Write Poems for Better or Verse

September 23, 2022

Remember studying the Romantic poets and memorizing the outputs of Percy Bysshe Shelley? What about Lord Byron and his problematic foot, which he tucked under a chair as he crafted “Don Juan”? What about that cocktail party thing by TS Eliot? No? Well, don’t worry. Those poets will not have traction in the poetical outputs of 2022 and beyond.

“Robots Are Writing Poetry, and Many People Can’t Tell the Difference” reports:

Dozens of websites, with names like Poetry Ninja or Bored Human, can now generate poems with a click of a key. One tool is able to free-associate images and ideas from any word “donated” to it. Another uses GPS to learn your whereabouts and returns with a haiku incorporating local details and weather conditions (Montreal on December 8, 2021, at 9:32 a.m.: “Thinking of you / Cold remains / On Rue Cardinal.”) Twitter teems with robot verse: a bot that mines the platform for tweets in iambic pentameter it then turns into rhyming couplets; a bot that blurts out Ashbery-esque questions (“Why are coins kept in changes?”); a bot that constructs tiny odes to trending topics. Many of these poetry generators are DIY projects that operate on rented servers and follow preset instructions not unlike the fill-in-the-blanks algorithm that powered Racter. But, in recent years, artificial-intelligence labs have unveiled automated bards that emulate, with sometimes eerie results, the more conscious, reflective aspects of the creative process.
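The “preset instructions” style of generator the article compares to Racter can be sketched as template filling. The templates and word lists below are invented, and the sketch does not reproduce Racter or any of the named sites; it also has nothing in common with the newer neural models the article mentions.

```python
import random

# Hypothetical templates with named blanks.
TEMPLATES = [
    "The {adj} {noun} {verb} beneath the {place}.",
    "Why does the {noun} {verb} when the {place} is {adj}?",
]

# Hypothetical word lists, one per blank.
WORDS = {
    "adj": ["cold", "silent", "electric"],
    "noun": ["river", "coin", "machine"],
    "verb": ["dreams", "waits", "sings"],
    "place": ["bridge", "server rack", "harbor"],
}


def fill_in_the_blanks(rng, lines=3):
    """Build a poem by picking a template and filling each blank with a random word."""
    poem = []
    for _ in range(lines):
        template = rng.choice(TEMPLATES)
        slots = {slot: rng.choice(options) for slot, options in WORDS.items()}
        poem.append(template.format(**slots))  # unused slots are simply ignored
    return "\n".join(poem)


print(fill_in_the_blanks(random.Random(3)))
```

Every line is grammatical because the grammar lives in the templates; the “creativity” is nothing more than a random draw from the word lists.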

The main point of the article is not that Microsoft’s smart software can knock out Willie-like sonnets. The article states what I think is a very obvious point:

There is no question that poetry will be subsumed, and soon, into the ideology of data collection, existing on the same spectrum as footstep counters, high-frequency stock trading, and Netflix recommendations. Maybe this is how the so-called singularity—the moment machines exceed humans and, in turn, refashion us—comes about. The choice to off-load the drudgery of writing to our solid-state brethren will happen in ways we won’t always track, the paradigm shift receding into the background, becoming omnipresent, normalized.

The write up asserts:

as long as the ability to write poems remains a barrier for admission into the category of personhood, robots will stay Racters. Against the onslaught of thinking machines, poetry is humanity’s last, and best, stand.

Wrong. Plus, Gen Z wizards can’t read cursive. Too bad.

Stephen E Arnold, September 23, 2022

Let Technology Solve the Problem: Ever Hear of Russell and His Paradox?

September 21, 2022

I read “You Can’t Solve AI Security Problems with More AI.” The main idea, in my opinion, is that Russell’s Paradox is alive and well. The article states:

When you’re engineering for security, a solution that works 99% of the time is no good. You are dealing with adversarial attackers here. If there is a 1% gap in your protection they will find it—that’s what they do!

Obvious? Yep. That one percent is an issue. But the belief that technology can solve a problem is more of a delusional, marketing-oriented approach to reality. Some informed people are confident that one percent does not make much of a difference. Maybe? But what about a smart software system that generates wrong outputs more than one percent of the time? Can technology address these issues? The answer offered by some is, “Sure, we have added this layer, that process, and these procedures to deliver accuracy in the 85, 90, or 95 percent range.” Yes, that’s “confidence.”
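A little arithmetic shows why “one percent” and “95 percent” are less comforting than they sound. This sketch assumes each attack attempt independently has a 1% chance of slipping through. That is an optimistic model: the quoted article’s point is that adversaries search for the gap deliberately, which only makes matters worse. The output count of 10,000 is likewise hypothetical.

```python
def p_at_least_one_success(per_attempt_success, attempts):
    """Chance that at least one of `attempts` independent tries gets through."""
    return 1 - (1 - per_attempt_success) ** attempts


def expected_errors(accuracy, outputs):
    """Expected number of wrong outputs at a given accuracy."""
    return round((1 - accuracy) * outputs)


for n in (1, 10, 100, 1000):
    print(n, round(p_at_least_one_success(0.01, n), 3))

for acc in (0.85, 0.90, 0.95):
    print(acc, expected_errors(acc, 10_000))
```

Under these assumptions, an attacker with 100 tries gets through about 63 percent of the time, and a system that is 95 percent accurate still produces roughly 500 wrong outputs per 10,000.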

The write up points out:

Trying to prevent AI attacks with more AI doesn’t work like this. If you patch a hole with even more AI, you have no way of knowing if your solution is 100% reliable. The fundamental challenge here is that large language models remain impenetrable black boxes. No one, not even the creators of the model, has a full understanding of what they can do.


The article has what I think is a quite helpful suggestion; to wit:

There may be systems that should not be built at all until we have a robust solution.

What if we generalize beyond the issue of cyber security? What if we think about the smart software “fixing up” the problems in today’s zippy digitized world?

Rethink, go slow, and remember Russell’s Paradox? Not a chance.

Stephen E Arnold, September 21, 2022
