A Gentle Ripple in the Datasphere: Soft Fraud

May 18, 2022

Compared with some cyber behaviors, soft fraud is a small fish, possibly a candiru. My definition of “soft fraud” is a behavior which does not violate the letter of the law. The spirit of the law? That’s a matter for discussion.

Soft fraud sits squarely between the Bernie Madoff-type play and a clueless Web designer happily leading a user into a rat’s nest of captchas.

I have been nagging my research team to look for examples of behaviors which, though technically legal in the country from which the actor operates, trigger a visceral reaction in some people.

What’s an example of soft fraud?

Apple and the Subscription Trick

Recently Apple announced that an authorized vendor with the Johnny Appleseed seal of approval can sell an Apple customer a subscription at a cut-rate price. When the trial or initial order expires, the vendor can just raise the price. The customer does not have to be reminded before the billing excitement ensues. What’s a customer to do? Call Apple customer support? Ho ho ho. That works like the feedback forms for podcasts. Perhaps call the outfit selling the subscription? Ha ha ha. Neither works, and if one does, these valiant souls operate from office space in a beautiful suburb of Mumbai. That’s an example of what I call soft fraud. Apple may disagree, but that — so far — is my personal opinion. See “Apple Will Allow Some Apps to Automatically Charge You Higher Subscription Prices.”

Say One Thing, Do Whatever One Wants

Examples of this abound. I recall executives from Amazon, Facebook, and Google explaining how their businesses operate. In addition to the popular, “senator, thank you for the question,” the core response was “I will check and send you the information.” In the meantime, what happens? Absolutely no substantive change in the business processes under discussion. Hiring and firing issues. I will check and send you the information. Monopolistic and predatory behaviors. I will check and send you the information. Content manipulation via oh, so opaque smart software. I will check and send you the information. Yep, I nudge these methods into the soft fraud category. See “Facebook, Twitter and Google CEOs Grilled by Congress on Misinformation.”

The Copyright Violation Play

This is a cute money-making maneuver involving some big names. The idea is that an agent representing some “big names” uses ageing image recognition software. The software bot prowls the Web looking for images whose hash code matches that of the rights holder. When a match is identified, an outfit with permission to move forward with legal action against the copyright violators springs into action. You can get a sense of what’s happening in this sector by checking out some of these online articles and comments. Note: These may be distorted, crazy, or dead center. I leave it to you:

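The matching step the bot performs can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses an exact byte-level hash, whereas production crawlers typically rely on perceptual hashes that survive resizing and re-encoding, and the index contents here are invented.

```python
import hashlib

def image_fingerprint(data: bytes) -> str:
    # Exact-match fingerprint of the image bytes. Real bots use perceptual
    # hashes (pHash, dHash) so that cropped or re-encoded copies still match.
    return hashlib.sha256(data).hexdigest()

# Hypothetical rights-holder index: fingerprint -> work identifier.
rights_index = {
    image_fingerprint(b"original-image-bytes"): "Big Name Photo #1017",
}

def check_crawled_image(data: bytes):
    # Returns the matched work identifier, or None when no claim exists.
    return rights_index.get(image_fingerprint(data))
```

A single changed byte defeats this exact scheme, which is why the recognition software described above has to use fuzzier matching before the legal letters go out.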
New Opportunity?

My hunch is that soft fraud is likely to get a boost. I noted “DeviantArt Can Now Notify Anyone Whose Art’s Been Used in NFTs without Permission.” The write up explains:

DeviantArt, an online art and design community founded in 2000, is now opening up its NFT protection tool to everyone… You can pay $9.95 per month to get protection for 1,000 pieces of art with a size limit of 50GB.

Is this an opportunity for an individual or entity to use the service to request payment for the NFT? The NFT holder might be grateful for getting control of the bitmap or other digital object. Would the helpful intermediary charge whatever the market will bear and then take a professional services fee?

This strikes me as perfectly legal. The existing copyright laws have a Disneyland feel about them from my perspective.

Net net: Soft fraud may benefit from the advent of NFT and services like that offered by DeviantArt, which is an interesting name in my opinion. Will regulators seize the day and create a category to handle soft fraud, mishandling of NFTs, and other innovations? Sure. Job One after re-election, fund raising, and getting media attention.

Stephen E Arnold, May 18, 2022

Google, Smart Software, and Prime Mover for Hyperbole

May 17, 2022

In my experience, the cost of training smart software is a very big problem. The bigness does not become evident until the licensee of a smart system realizes that training the smart software must take place on a regular schedule. Why is this a big problem? The reason is that the effort required to assemble valid training sets is significant. Language, data types, and info peculiarities change over time; for example, new content is fed into a smart system, and the system cannot cope with the differences between the training set that was used and the info flowing into the system now. A gap grows, and the fix is to assemble new training data, reindex the content, and get ready to do it again. A failure to keep the smart software in sync with what is processed is a tiny bit of knowledge not explained in sales pitches.
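The gap described above can be watched with a crude signal such as the out-of-vocabulary rate of incoming content: when it climbs, it is time to budget for new training data. A minimal sketch follows; the vocabulary, tokens, and threshold are illustrative assumptions, not anyone's production method.

```python
def oov_rate(training_vocab: set, incoming_tokens: list) -> float:
    # Fraction of incoming tokens never seen at training time --
    # a cheap proxy for drift between the training set and live content.
    if not incoming_tokens:
        return 0.0
    unseen = sum(1 for tok in incoming_tokens if tok not in training_vocab)
    return unseen / len(incoming_tokens)

# Toy example: the system was trained on accounting content,
# but the live feed now talks about NFTs.
training_vocab = {"invoice", "ledger", "audit", "balance"}
incoming_tokens = ["invoice", "nft", "tokenomics", "audit"]

drift = oov_rate(training_vocab, incoming_tokens)  # 2 of 4 tokens unseen -> 0.5
RETRAIN_THRESHOLD = 0.3
needs_retraining = drift > RETRAIN_THRESHOLD
```

Real deployments track richer statistics than token overlap, but even this toy measure makes the recurring-cost point: the monitor itself has to run forever, and every alarm it raises is a new training-and-reindexing invoice.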

Accountants figure out that money must be spent on a cost not in the original price data. Search systems return increasingly lousy results. Intelligence software outputs data which make zero sense to a person working out a surveillance plan. An art history major working on a PowerPoint presentation cannot locate the version used by the president of the company for last week’s pitch to potential investors.

The accountant wants to understand overruns associated with smart software, looks into the invoices and time sheets, and discovers something new: invoices for smart software subject matter experts, indexing professionals, and interns buying third-party content from an online vendor called Elsevier. These are not costs CPAs confront unless there are smart software systems chugging along.

The big problem is handled in this way: Those selling the system don’t talk too much about how training is a recurring cost which increases over time. Yep, reindexing is a greedy pig and those training sets have to be tested to see if the smart software gets smarter.

The fix? Do PR about super duper even smarter methods of training. Think Snorkel. Think synthetic data. Think PowerPoint decks filled with jargon that causes clueless MBAs to exchange high fives because the approach is a slam dunk. Yes! Winner!

I read “DeepMind’s Astounding New ‘Gato’ AI Makes Me Fear Humans Will Never Achieve AGI” and realized that the cloud of unknowing has not yet yielded to blue skies. The article states:

Just like it took some time between the discovery of fire and the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won’t happen overnight.

No kidding. There are gotchas beyond training, however. I have a presentation in hand which I delivered in 1997 at an online conference. Training cost is one dot point; there are five others. Can you name them? Here’s a hint for another big issue: An output that kills a patient. The accountant understands the costs of litigation when that smart AI makes a close enough for horseshoes output for a harried medical professional. Yeah, go CAT scan, go.

Stephen E Arnold, May 17, 2022

A Really Tiny Issue with Language AIs

May 13, 2022

Why not rely on Google Docs’s linguistic AI instead of thinking for yourself? Well, there may be some issues with that approach. The Hustle advises, “Don’t Rely on Google Docs’ New Writing Tool to Do Your Work for You.” Writer Juliet Bennett Rylah explains:

“AI is fun when it eats horror movies and churns out Netflix’s Mr. Puzzles Wants You to Be Less Alive. But sometimes the lack of context is an issue. Take Google Docs’ new ‘assistive writing’ feature, which makes suggestions as you write (e.g., switching from passive to active voice or deleting repetitive words). This makes your writing more accessible, which is great. It may also suggest more inclusive language, while flagging words that could be deemed inappropriate. That’s cool, in theory, but many users have found the suggestions to be, well, a little weird. Motherboard tested out several text excerpts. While the tool suggested more gender-inclusive phrasing (e.g., ‘policemen’ to ‘police officers’), it also flagged the word ‘Motherboard.’ And while it suggested ‘property owner’ in lieu of ‘landlord,’ it didn’t flag anything in a slur-laden interview with ex-KKK leader David Duke.”

Is that good? As always, it comes down to one inconvenient fact: An AI is only as good as its machine-learning materials. It seems those are as rife with bias as ever, and language tools are far from immune. Alas, it will be a while before we can confidently outsource our writing to an algorithm. Meanwhile, Rylah suggests this guide for humans wishing to employ more inclusive language.

Cynthia Murrell, May 13, 2022

Google and Skin Color: What Is AI Unable to Learn?

May 12, 2022

In the wake of high school science club management innovations, the Google has turned its attention to skin color. “Google Adopts 10-Step Skin Tone Scale to Teach Diversity to Its AI” reports:

Google has adopted a 10-grade scale to help it better judge and present skin tones, a change that highlights the tech giant’s efforts to better reflect the range of people who use Google Photos, search and other products.

Interesting. Pantone (the color for printers people) has a YouTube video with more than 100 skin tones. Not to be outdone, I recall seeing on Creativa Fabrica a chart with 180 skin colors.

Will 10 do the trick? I assume that Google’s smart software was not able to identify human skin color using the “learning while processing” methods of some AI wizards. But 10? That seems like a modest number when a cosmetic outfit requires 60 to move its products.

Why would a consumer products company waste money on unneeded skin hues? Maybe L’Oréal is just not Googley?

Stephen E Arnold, May 12, 2022

AI as a Service Dominated by the Usual Science Club Members

May 12, 2022

A burgeoning AI-as-a-Service field may enable businesses of all sizes to take advantage of AI tools without the high cost of developing their own solutions in-house. So declares ReadWrite in its look at the “Growth of AI as a Service (AIaaS) Market.” Writer Neeraj Agarwal touts the purported advantages of such tools, which include NLP, robotics, machine learning, and computer vision:

“After having a view of what the future holds for you, businesses will be able to:

* Create strategies specifically for regional and country based on figures and facts.

* Identify where and when to invest.

* Outperform against key competitors using data and the drivers and trends shaping the market.

* Understand customers based on the latest market research findings.

* Grow Businesses by strategically positioning themselves in tech run.”

Sounds great. But we wonder what happens when just a few companies dominate. We are told:

“Big Companies like Microsoft, IBM, Google, and other market leaders have actively introduced AI services in their business models, increasing their reach and revenue without much time investments.”

So are smaller firms shut out of this lucrative market? If so, does the lack of competition limit the benefits of these tools? These points are not addressed in the write-up. It does share some other information about the AIaaS arena, however, like how much, and in which directions, it has grown:

“The global AI-as-a-Service market was valued at USD 1.35 Bn in 2016 and is estimated to reach USD 43.1 Bn by 2028, at a CAGR of 46.9% during the forecast period. The base year considered for the study is 2017, while the forecast period is between 2018 and 2028. The compound annual growth rate (CAGR) forecast till 2028 of the different categories of the AI market are:

* AI Services- 22%

* AI Hardware- 20.5%

* AI platforms- 34.6%

* AI System Infrastructure Software- 14.1%

* AI Lifecycle Software- 38.9%

* AI Business Services- 21.9%

The market has been bifurcated into cloud-based and on-premises deployment models based on the deployment model.”
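Quoted growth figures like these can be sanity-checked with the standard CAGR formula; the snippet below is my illustration, not part of the cited report. Worth noting: the rate implied by the 1.35 and 43.1 endpoints over 2016 to 2028 works out to roughly 33 percent, so the quoted 46.9 percent presumably applies to a different base year or starting value.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    # Compound annual growth rate implied by growing from start to end
    # over `years` periods: (end/start)^(1/years) - 1.
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value: float, rate: float, years: int) -> float:
    # Value after compounding `rate` for `years` periods.
    return start_value * (1.0 + rate) ** years

# Rate implied by the quoted endpoints: USD 1.35B (2016) to USD 43.1B (2028).
implied = cagr(1.35, 43.1, 2028 - 2016)  # roughly 0.33, i.e. about 33% per year
```

Market-report arithmetic rarely survives this kind of check cleanly, which is one more reason to treat forecast decks as marketing rather than measurement.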

The piece also discusses types of applications available, segmentation of the market, and an analysis of competition among the major players. See the article for those details.

Cynthia Murrell, May 12, 2022

Smart Software: Bureaucrats Race the 20 Somethings

May 11, 2022

One of the ArnoldIT team is delivering a talk about smart software. One of our points is that AI is moving along the innovation curve and bright young sprouts are installing “smart” software in interesting applications.

The European Union is nervous about smart software. Johnny on the spot is that bureaucratic outfit.

One of the most common tropes in science fiction is a computer or a robot achieving sentience. In the stories, technology gains sentience in a variety of ways: lightning strikes, digital evolution, alien intervention, etc. While some of the stories end on a positive note, many end with a warning message to humanity: Don’t play God. Tech Radar explains that the European Union is taking preventive measures: “New EU Rules Would Allow It to Shut Down AI Before It Got Dangerous.”

The European Union has worked on an AI regulation framework since March 2018 as part of its Digital Decade regulations. Work on AI regulations has been slow because the EU has focused on the Digital Services Act and the Digital Markets Act that manage how much power American tech companies can have.

The EU AI Act is also undergoing a review and critique phase through the Ada Lovelace Institute, an independent research facility that works on data policy. The Ada Lovelace Institute scrutinizes the AI Act:

“The full report (via TechCrunch) includes a lot of detail on the pros and cons of the regulation, which is a global first, with the main takeaway is that the EU is setting itself up to have some pretty powerful tools at its disposal. The EU plans to create and empower oversight bodies that can, theoretically, order the withdrawal of an AI system that is deemed high risk, before requiring the model be retrained. The draft AI Act has been under a lot of scrutiny – and has received a fair amount of criticism – and will likely still fall short of the EU’s most expansive goals: creating the conditions for “trustworthy” and “human-centric” AI.”

The current EU AI Act needs to be revised, but that does not mean it is a failure. The act is a good beginning to creating a viable framework to govern AI.

Our fearless but quite aged leader (Stephen E Arnold) believes that it may be difficult to regulate smart software, especially in the United States where big tech companies are influential in the economy and politics. AI often thrives in powerful black boxes that are inordinately programmed with ethnic and socioeconomic biases. Developers have yet to remove these biases because at the Google there may be zero biases or the datasets are synthetic. (Yep, that means what you think it means: Statistical confections and close enough for horseshoes outputs.)

Can the EU set the standard for how AI is regulated across the globe, a bit like kicking the Russian oil and natural gas habits? Worth watching… from a distance.

Whitney Grace, May 11, 2022

Issues with the Zuckbook Smart Software: Imagine That

May 10, 2022

I was neither surprised by nor interested in “Facebook’s New AI System Has a ‘High Propensity’ for Racism and Bias.” The marketing hype encapsulated in PowerPoint decks and weaponized PDF files on Arxiv paint fantastical pictures of today’s marvel-making machine learning systems. Those who have been around smart software and really stupid software for a number of years understand two things: PR and marketing are easier than delivering high-value, high-utility systems, and smart software works best when tailored and tuned to quite specific tasks. Generalized systems are not yet without a few flaws. Addressing these will take time, innovation, and money. Innovation is scarce in many high-technology companies. The time and money factors dictate that “good enough” and “close enough for horseshoes” systems and methods are pushed into products and services. “Good enough” works for search because no one knows what is in the index. Comparative evaluations of search and retrieval are tough when users (addicts) operate within a cloud of unknowing. The “close enough for horseshoes” approach produces applications which are sort of correct. Perfect for ad matching and suggesting what Facebook pages or Tweets would engage a person interested in tattoos or fad diets.

The cited article explains:

Facebook and its parent company, Meta, recently released a new tool that can be used to quickly develop state-of-the-art AI. But according to the company’s researchers, the system has the same problem as its predecessors: It’s extremely bad at avoiding results that reinforce racist and sexist stereotypes.

My recollection is that the Google has terminated some of its wizards and transformed these professionals into Xooglers in the blink of an eye. Why? Exposing some of the issues that continue to plague smart software.

Those interns, former college professors, and start-up engineers rely on techniques used for decades. These are connected together, fed synthetic data, and bolted to an application. The outputs reflect the inherent oddities of the methods; for example, feed the system images spidered from Web sites and the system “learns” what is on the Web sites. Then generalize from the Web site images and produce synthetic data. The whole process zooms along and costs less. The outputs, however, have minimal information about that which is not on a Web site; for example, positive images of a family in a township outside of Cape Town.

The write up states:

Meta researchers write that the model “has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt.” This means it’s easy to get biased and harmful results even when you’re not trying. The system is also vulnerable to “adversarial prompts,” where small, trivial changes in phrasing can be used to evade the system’s safeguards and produce toxic content.

What’s new? These issues surfaced in the automated content processing in the early versions of the Autonomy Neuro Linguistic Programming approach. The fix was to retrain the system and tune the outputs. Few licensees had the appetite to spend the money needed to perform the retraining and reindexing of the processed content when the search results drifted into weirdness.

Since the mid 1990s, have developers solved this problem?

Has the email with this information reached the PR professionals and the art history majors with a minor in graphic design who produce PowerPoints? What about the former college professors and a bunch of interns and recent graduates?

What’s this mean? Here’s my view:

  1. Narrow applications of smart software can work and be quite useful; for example, the Preligens system for aircraft identification. Broad applications have to be viewed as demonstrations or works in progress.
  2. The MBA craziness which wants to create world-dominating methods to control markets must be recognized and managed. I know that running wild for 25 years creates some habits which are going to be difficult to break. But change is needed. Craziness is not a viable business model in my opinion.
  3. The over-the-top hyperbole must be identified. This means that PowerPoint presentations should carry a warning label: Science fiction inside. The quasi-scientific papers with loads of authors who work at one firm should carry a disclaimer: Results are going to be difficult to verify.

Without some common sense, the flood of semi-functional smart software will increase. Not good. Why? The impact of erroneous outputs will cause more harm than users of the systems expect. Screwing up content filtering for a political rally is one thing; outputting an incorrect medical action is another.

Stephen E Arnold, May 10, 2022

Voyager Labs Exposed: Another NSO Group?

May 10, 2022

I read “Voyager Labs: L’Arma spuntata dell’intelligenza artificiale.” I was expecting some high-flying smart software. What the article delivers is some juicy detail about intelware, conferences where quite non-public stories are told, and an alleged tie up between those fine folks at Palantir Technologies and the shadowy Israeli company. One caveat: One has to be able to read Italian or have a way to work around the limitations of online translation systems. (Good luck with finding a free-to-use system. I just asked my local Pizza Hut delivery person, who speaks and reads Italian like a Roma fan.)

Here are some allegedly spot on factoids from the write up:

  • One of the directors of the company has a remarkably unusual career at a US government agency. The individual presided over specialized interrogation activities and allowed a person with a bomb to enter a government facility. There were a handful of deaths.
  • The Voyager Labs’ cloud services are allegedly “managed globally by Palantir’s Gotham platform.”
  • Voyager’s Labs’ content was described at an intelligence conference owned and managed by an American in this way: “usable and previously unattainable information by analyzing and understanding huge amounts of open, deep and obscure Web data.”
  • Allegations about the use of Voyager Labs’ system to influence an Italian election.
  • Voyager Labs identifies for licensees people with red, orange, and green icons. Green is good; red is bad; orange is in the middle?

Interesting stuff. But the zinger is the assertion that Voyager Labs’ smart software can output either dumb or aberrant results. The whiz kids at Gartner Group concluded in 2017 that Voyager Labs was a “cool vendor.” That’s good to know. Gartner likes intelware that sort of works. Cool.

Interesting profile, and there are more than 100 footnotes. I assume that the founder of Voyager Labs, the conference organizer, and assorted clients were not willing to participate in an interview. This is an understandable position, particularly when an Israeli outfit could be the next in the NSO Group spotlight.

Stephen E Arnold, May 10, 2022

Let Us Let Google Think for Us? Yeah, Why Not?

May 10, 2022

What wonderful news… for the Google.

TechRadar reports that “Google Docs Will Now Practically Do Your Writing for You.” What an effective way to nudge language and information a certain direction. Docs’ “Smart Compose” feature already offers autocomplete suggestions as one types but, citing a recent Google blog post, writer Joel Khalili explains how its AI is poised to make even more “helpful” recommendations:

“The company is adding a number of new ‘assistive writing features’ to the word processing software, including synonym and sentence structure suggestions. The service will also flag up any ‘inappropriate’ language, as well as instances in which the writer would be better served by using the active rather than passive voice. … The arrival of further recommendation features for Docs is another step in the campaign to make the company’s product suite more intelligent. ‘Suggestions will appear as you type and help guide you when there are opportunities to avoid repeated or unnecessary words, helping diversify your writing and ensuring you’re using the most effective word for the situation,’ Google explained. ‘We hope this will help elevate your writing style and make more dynamic, clear, inclusive, and concise documents.’ When the tools are active, suggestions will be underlined in purple. Selecting the underline will bring up a small pop up that prompts the user to accept or decline the change. These suggestions will be switched on by default, but can be deactivated under the Tools menu at the top of the page.”

At least users who prefer to choose their own words have the option to turn suggestions off. The write-up states these new AI intrusions are being rolled out to all premium business customers now, a process that should be complete by the end of April. Alas, they are not available to Workspace Essentials, Business Starter, or Enterprise Essentials users.

Cynthia Murrell, May 10, 2022

Do Marketers See You As Special? Nope.

May 9, 2022

I read “Forget Personalisation, It’s Impossible and It Doesn’t Work.” My hunch is that the idea that a zippy modern system would “know” a user, assemble an appropriate info-filter, and display what that individual required has lost traction. I remember Pointcast and Desktop Data, which suggested a user could get the information he/she/it/them needed each day. My recollection is that individual information needs in business changed. Fiddling with the filters was a hassle. As a result, the services were novel at first and then became a hassle. Maybe automation via processes tuned to figure out what the user needed would make such services more useful. If memory serves, the results of making these systems work within budget and developer constraints were not very good. The most recent example is my explanation of how a Google alert is about half right or half wrong when it flags an item I am supposed to need. See this “Cheerleading” article.

The Forget Personalisation write up calls individuation “the worst idea in the marketing industry.” The statement is not exactly a vote of confidence, is it? The article states:

There’s just one little problem with personalisation: it doesn’t make any sense.

I thought marketing types were optimists. I am wrong again.

The article includes some factoids about the accuracy of third-party data. These are infobits which allow marketers and investigators to pinpoint behaviors and even identify people. Here’s what the article reports as actual factual:

Spoiler alert: it’s not. Most third-party data is, to put it politely, garbage. In an academic study from MIT and Melbourne Business School, researchers decided to test the accuracy of third-party marketing data. So, how accurate is gender targeting? It’s accurate 42.3% of the time. How accurate is age targeting? It’s accurate between 4% and 44% of the time. And those are the numbers for the leading global data brokers.

I assume that this is a news flash because informed individuals from investigative reporters at the Wall Street Journal to law enforcement administrators assume that data gathered from clicks, apps, and other high value inputs are “accurate.” Well, sometimes yes, but in my experience 50 to 75 percent accuracy is darned good. Lower scores are common. The 95 percent accuracy level is doable under certain circumstances.

What’s the fix? Once again marketers have the answer. Keep in mind that many marketers majored in business administration or art history. Just sayin’. Note this solution from the cited article:

Marketers would be much better off investing in ‘performance branding’; in other words, one-size-fits-most creative that speaks to the common category needs of all potential buyers, all the time. This is a much simpler approach that also happens to be supported by the evidence. Reach is, and has always been, the greatest predictor of marketing success.

I think this means TikTok. What do you think?

And the future? Impersonalization. And how does Marketing Week know this? Here’s the source of the insight:

Gartner predicts 80% of marketers will abandon personalisation by 2025.

Yep, Gartner. Wow. Solid indeed.

Net net: Those marketing types are on the beam. What else does not work in marketing? Smart ad matching to a user query?

Stephen E Arnold, May 9, 2022
