Smart Software: Can the Outputs Be Steered Like a Mini Van? Well, Yesssss

October 13, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Nature Magazine may have exposed the crapola output about how the whiz kids in the smart software game rig their game. Want to know more? Navigate to “Reproducibility Trial: 246 Biologists Get Different Results from Same Data Sets.” The write up explains “how analytical choices drive conclusions.”

Baking in biases. “What shall we fiddle today, Marvin?” Marvin replies, “Let’s adjust what video is going to be seen by millions.” Thanks, for nameless and faceless, MidJourney.

Translating Nature speak, I think the estimable publication is saying, “Those who set thresholds and assemble numerical recipes can control outcomes.” An example might be suppressing certain types of information and boosting other information. If one is clueless, the outputs of the system will be the equivalent of “the truth.” JPMorgan Chase found itself snookered by outputs to the tune of $175 million. Frank Financial’s customer outputs were algorithmized with the assistance of some clever people. That’s how the smartest guys in the room were temporarily outfoxed by a 31 year old female Wharton person.

What about outputs from any smart system using open source information? Those are the same inputs to the smart system. But the outputs? Well, depending on who is doing the threshold setting and setting up the work flow of the processed information, there are some opportunities to shade, shape, and weaponize outputs.
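For those who like to see the knobs, here is a minimal sketch of the idea. It is my own toy illustration, not anything from the Nature article: the items, topics, relevance scores, threshold values, and boost weights are all invented. The point is only that the same inputs produce different outputs once someone changes a threshold and a couple of topic weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    topic: str
    relevance: float  # assumed to come from some upstream scoring step


# Hypothetical inputs; identical for both runs below.
ITEMS = [
    Item("Report A", "finance", 0.82),
    Item("Report B", "politics", 0.80),
    Item("Report C", "health", 0.55),
    Item("Report D", "politics", 0.74),
]


def rank(items, threshold, boosts):
    """Return titles whose boosted score clears the threshold, best first."""
    scored = [(item.relevance * boosts.get(item.topic, 1.0), item.title)
              for item in items]
    kept = [(score, title) for score, title in scored if score >= threshold]
    return [title for _, title in sorted(kept, reverse=True)]


# Same inputs, two configurations chosen by whoever sets the dials.
neutral = rank(ITEMS, threshold=0.5, boosts={})
steered = rank(ITEMS, threshold=0.6, boosts={"politics": 0.5, "health": 1.2})

print("neutral:", neutral)
print("steered:", steered)
```

In this toy setup the politics items disappear from the steered list and the health item moves up, even though nobody touched the inputs. A user who sees only the steered list has no way to know what was suppressed.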

Nature Magazine reports:

Despite the wide range of results, none of the answers are wrong, Fraser says. Rather, the spread reflects factors such as participants’ training and how they set sample sizes. So, “how do you know, what is the true result?” Gould asks. Part of the solution could be asking a paper’s authors to lay out the analytical decisions that they made, and the potential caveats of those choices, Gould [Elliot Gould, an ecological modeler at the University of Melbourne] says. Nosek [Brian Nosek, executive director of the Center for Open Science in Charlottesville, Virginia] says ecologists could also use practices common in other fields to show the breadth of potential results for a paper. For example, robustness tests, which are common in economics, require researchers to analyze their data in several ways and assess the amount of variation in the results.

Translating Nature speak: Individual analyses can be widely divergent. A method to normalize the data does not seem to be agreed upon.
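The robustness test mentioned in the quote can be sketched in a few lines. This is an assumption-laden toy, not the procedure used in the study: the two groups of numbers are made up, and the two analytical choices (how to handle extreme values, and whether to summarize with the mean or the median) are ones I picked for illustration. The sketch runs every combination of choices on the same data and reports how far the answers spread.

```python
from itertools import product
from statistics import mean, median

# Hypothetical measurements; note the single extreme value in the control group.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 9.5]
treated = [4.6, 4.4, 4.8, 4.5, 4.7, 4.3]


def keep_all(values):
    """Analytical choice 1a: use every observation."""
    return list(values)


def drop_extremes(values):
    """Analytical choice 1b: discard the single lowest and highest value."""
    return sorted(values)[1:-1]


CLEANERS = {"keep_all": keep_all, "drop_extremes": drop_extremes}
SUMMARIES = {"mean": mean, "median": median}  # analytical choice 2

# Run the same comparison under every combination of choices.
results = {}
for (c_name, clean), (s_name, summarize) in product(CLEANERS.items(),
                                                    SUMMARIES.items()):
    effect = summarize(clean(treated)) - summarize(clean(control))
    results[(c_name, s_name)] = round(effect, 3)

spread = max(results.values()) - min(results.values())

for choice, effect in results.items():
    print(choice, effect)
print("spread across analyses:", round(spread, 3))
```

With these invented numbers, one combination says the treated group scores lower and the other three say it scores higher. None of the four analyses is "wrong," which is the point Fraser makes in the quote.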

Thus, a widely used smart software can control framing on a mass scale. That means human choices buried in a complex system will influence “the truth.” Perhaps I am not being fair to Nature? I am a dinobaby. I do not have to be fair just like the faceless and hidden “developers” who control how the smart software is configured.

Stephen E Arnold, October 13, 2023

Written by Stephen E. Arnold · Filed Under AI, News, Text analytics

Are Open Source Investigators Multiplying Like Star Trek Tribbles?

October 13, 2023

The idea of using the Internet to solve crimes is not new. I learned about “open source” in 1981 when I worked in the online unit of the Courier Journal & Louisville Times Co. A fellow named Robert David Steele contacted me. He wanted to meet me when I was in Washington, DC. My recollection is that he showed up in a quasi-military outfit and proceeded to explain that commercial online information was important to intelligence professionals. He wanted free access to our databases, and I politely explained that access was available via online timesharing services and from specialized vendors. He was not happy, but eventually we became tolerant of one another and ended up working on a number of interesting projects. Now open source information or OSINT is the go-to method for conducting research, investigations, gathering intelligence, and identifying persons of interest.

An OSINT investigator tracks down with OSINT geo tools the animal suspected of eating a knowledge worker’s flowers. Thanks, MidJourney. Close to Sherlock, but not on the money.

Until the surprise attack on Israel, it seemed as if open source or OSINT could work wonders. It didn’t, and (spoiler alert) it cannot. OSINT is one source of actionable information. Steele and I collaborated on numerous presentations and used this diagram to explain where OSINT fit into the world of professional information gatherers:

Open source is one pillar of the intelligence infrastructure. The keystone of OSINT is the staff, the management method, the techniques used to fuse and analyze source information, and presenting it in a way that makes sense to others.

I mention this because I read “The Disturbing Rise of Amateur Internet Detectives.” Please, consult the original to get a feel for the point of view of the author and the implicit endorsement of Netflix programming.

However, I want to highlight one passage from the article:

What’s the future of web sleuthing? It’s clear amateur online detectives are here to stay. The depths of the internet can encourage our worst instincts – but also, as these series prove, our best, too. The trend for programs celebrating these sleuths, though, is harder to welcome. It’s difficult to avoid the sense that they amplify the messy, fractious instincts of the online world, and make sleuths reluctant celebrities. Dragging them into the limelight can misrepresent their work, doing a disservice to their peculiar talents and experiences. Still, there is an undeniable pull to the world of online sleuthing. We can expect far more coverage of that murky empire.

What is interesting to me is that OSINT has moved from an almost unknown activity to the mainstream. Who would have anticipated TV shows about online investigations? Even more surprising is the number of people who have adopted the method as an avocation. Others have set up businesses because the founder is an expert in OSINT. Amazing or shocking? I have not decided.

I have formulated several observations; these are:

Determining what is important and then verifying the accuracy of the information are different skills from doing Google searches or using an OSINT toolkit.
Machine-generated content can degrade the accuracy of some of the most sophisticated OSINT intelligence systems. The surprise attack on Israel is a grim reminder of the limitations of highly sophisticated, multi-language, multi-source systems.
Gathering intelligence is not an activity conducted without care, careful consideration, and a keen awareness of the cognitive blind spots that each human possesses.

Net net: Pursuing a career in OSINT is probably a better choice than trying to become an influencer on TikTok.

Stephen E Arnold, October 13, 2023

Written by Stephen E. Arnold · Filed Under News, OSINT

Google Bard: Expensive and Disappointing? The Answer Is… Ads?

October 13, 2023

Google seemed to have hit upon a great idea to position its chatbot above the competition: personalize the output by linking it to users’ content across their Gmail, Docs, Drive, Maps, YouTube, and other Googleverse accounts. Unfortunately, according to VentureBeat‘s Michael Nuñez, “Google Bard Fails to Deliver on its Promise—Even After Latest Updates.” After putting Bard through its paces, Nuñez reports the AI does not, in fact, play well with Google apps and still supplies wrong or nonsensical answers way too often. He writes:

“I stress-tested Bard’s new capabilities by trying dozens of prompts that were similar to the ones advertised by Google in last week’s launch. For example, I asked Bard to pull up the key points from a document in Docs and create an email summary. Bard responded by saying ‘I do not have enough information’ and refused to pull up any documents from my Google Drive. It later poorly summarized another document and drafted an unusable email for me. Another example: I asked Bard to find me the best deals on flights from San Francisco to Los Angeles on Google Flights. The chat responded by drafting me an email explaining how to search manually for airfare on Google Flights. Bard’s performance was equally dismal when I tried to use it for creative tasks, such as writing a song or a screenplay. Bard either ignored my input or produced bland and boring content that lacked any originality or flair. Bard also lacks any option to adjust its creativity level, unlike GPT-4, which has a dial that allows the user to control how adventurous or conservative the output is.”

Nuñez found Bard particularly lacking when compared to OpenAI’s GPT-4. It is rumored that the Microsoft-backed model has about 1.8 trillion parameters, while Bard’s underlying model, PaLM 2, reportedly has a measly 340 billion. GPT-4 also appears to have more personality, which could be good, bad, or indifferent depending on one’s perspective. The write-up allows one point in Bard’s favor: a built-in feature can check its answers against a regular Google search and highlight any dubious information. Will Google’s next model catch up to OpenAI as the company seems to hope?

Cynthia Murrell, October 13, 2023

Written by Stephen E. Arnold · Filed Under AI, Google, News

Big, Fat AI Report: Free and Meaty for Marketing Collateral

October 12, 2023

Curious about AI, machine learning, and smart software? You will want to obtain a free (at least as of October 6, 2023) report called “Artificial Intelligence Index Report 2023.” The 386 page PDF contains information selected to make it clear that AI is a big deal. There is no reference to the validity of the research conducted for the document. I find that interesting since the president of Stanford University stepped carefully from the speeding world of academia to find his future elsewhere. Making up data seems to be a signature feature of outfits like Stanford and, of course, Harvard.

A Musk-inspired robot reads a print out of the PDF report. The robot looks … like a robot. Thanks, Microsoft Bing. You do a good robot.

But back to the report.

For those who lack the time and swipe left deflector, a two-page summary identifies the big finds from the work. Let me highlight three, or 30 percent, of the knowledge gems. Please, consult the full report for the other seven discoveries. No blood pressure reduction medicine is needed, but you may want to use the time between plays at an upcoming NFL game to work through the full document.

Three big reveals:

AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal.
… The number of AI-related job postings has increased on average from 1.7% in 2021 to 1.9% in 2022.
An AI Index analysis of the legislative records of 127 countries shows that the number of bills containing “artificial intelligence” that were passed into law grew from just 1 in 2016 to 37 in 2022.

My interpretation of this full suite of 10 key points: The hype is stabilizing.

Who funded the project? Not surprisingly, the Google and OpenAI kicked in. There is a veritable who is who of luminaries and high-profile research outfits providing some assistance as well. Headhunters will probably want to print out the pages with the names and affiliations of the individuals listed. One never knows where the next Elon Musk lurks.

The report has eight chapters, but the bulk of the information appears in the first four; to wit:

R&D
Technical performance
Technical AI ethics
The economy.

I want to be up front. I scanned the document. Does it confront issues like the objective of Google and a couple of other firms dominating the AI landscape? Nah. Does it talk about the hallucination and ethical features of smart software? Nah. Does it delve into the legal quagmire which seems to be spreading faster than dilapidated RVs parked on El Camino Real? Nah.

I suggest downloading a copy and checking out the sections which appear germane to your interests in AI. I am happy to have a copy for reference. Marketing collateral from an outfit whose president resigned due to squishy research does not reassure me. Yes, integrity matters to me. Others? Maybe not.

Stephen E Arnold, October 12, 2023

Written by Stephen E. Arnold · Filed Under AI, Business strategy, News

India: Okay, No More CSAM or Else the Cash Register Will Ring

October 12, 2023

X (the Tweeter thing), YouTube, and Telegram get a tough assignment. India wants child sexual abuse material (CSAM, for those who want to do acronym speak) scrubbed from content or services delivered in the great nation of India. There are some interesting implications for these US technology giants. First, the outfits are accustomed to just agreeing and not doing much to comply with government suggestions. In fact, most of the US high-tech firms offer promises, and those can be slippery fish. Second, determining what is and what is not CSAM can be a puzzler as well. Bad actors are embracing smart software and generating some realistic images and videos without having to find, coerce, film, and pay off humans involved in the distasteful but lucrative business. Questions about the age of a synthetic child porno star are embarrassing to ask and debate. Remember the need for a diverse group to deliberate about such matters. Also, the advent of smart software invites orchestration so that text prompts can be stuffed into a system. The system happily outputs videos with more speed than a human adult industry star speeding to a shoot after a late call. Zeros and ones are likely to take over CSAM because … efficiency.

“India Tells X, YouTube, Telegram to Remove Any Child Sexual Abuse Material from Platforms” reports:

The companies could be stripped of their protection from legal liability if they don’t comply, the government said in a statement. The notices, sent by the federal Ministry of Electronics and Information Technology (MEITY), emphasized the importance of prompt and permanent removal of any child sexual abuse material on these platforms.

My dinobaby perspective is that [a] these outfits cannot comply because neither smart software nor legions of human content curators can keep up with the volume of videos and images pumped by these systems. [b] India probably knows that the task is a tough one and may be counting on some hefty fines to supplement other sources of cash for a delightful country. [c] Telegram poses a bit of a challenge because bad actors use Dark Web and Clear Web lures to attract CSAM addicts and then point to a private Telegram group to pay for and get delivery of the digital goods. That encryption thing may be a sticky wicket.

Net net: Some high-tech outfits may find doing business in India hotter than a Chettinad masala.

Stephen E Arnold, October 13, 2023

Written by Stephen E. Arnold · Filed Under cybercrime, Government, News

Open Source Companies: Bet on Expandability and Extendibility

October 12, 2023

Naturally, a key factor driving adoption of open source software is a need to save money. However, argues Lago co-founder Anh-Tho Chuong, “Open Source Does Not Win by Being Cheaper” than the competition. Not just that, anyway. She writes:

“What we’ve learned is that open-source tools can’t rely on being an open-source alternative to an already successful business. A developer can’t just imitate a product, tag on an MIT license, and call it a day. As awesome as open source is, in a vacuum, it’s not enough to succeed. … [Open-source companies] either need a concrete reason for why they are open source or have to surpass their competitors.”

One caveat: Chuong notes she is speaking of businesses like hers, not sponsored community projects like React, TypeORM, or VSCode. Outfits that need to turn a profit to succeed must offer more than savings to distinguish themselves, she insists. The post notes two specific problems open-source developers should aim to solve: transparency and extensibility. It is important to many companies to know just how their vendors are handling their data (and that of their clients). With closed software one just has to trust information is secure. The transparency of open-source code allows one to verify that it is. The extensibility advantage comes from the passion of community developers for plugins, which are often merged into the open-source main branch. It can be difficult for closed-source engineering teams to compete with the resulting extendibility.

See the write-up for examples of both advantages from the likes of MongoDB, PostHog, and Minio. Chuong concludes:

“Both of the above issues contribute to commercial open-source being a better product in the long run. But by tapping the community for feedback and help, open-source projects can also accelerate past closed-source solutions. … Open-source projects—not just commercial open source—have served as a critical driver for the improvement of products for decades. However, some software is going to remain closed source. It’s just the nature of first-mover advantage. But when transparency and extensibility are an issue, an open-source successor becomes a real threat.”

Cynthia Murrell, October 12, 2023

Written by Stephen E. Arnold · Filed Under Business strategy, News, Open source

Intelware: Some Advanced Technology Is Not So New

October 11, 2023

I read “European Spyware Consortium Supplied Despots and Dictators.” The article is a “report” about intelware vendors. The article in Spiegel International is a “can you believe this” write up. The article identifies a number of companies past and present. Plus individuals are identified.

The hook is technology that facilitates exfiltration of data from mobile devices. Mobile phones are a fashion item and a must have for many people. It does not take much insight to conclude that data on these ubiquitous gizmos can provide potentially high value information. Even better, putting a software module on a mobile device of a person of interest can save time and expense. Modern intelligence gathering techniques are little more than using technology to minimize the need for humans sitting in automobiles or technicians planting listening devices in interesting locations. The other benefits of technology include real time or near real time data acquisition, geo-location data, access to the digital information about callers and email pals, and data available to the mobile’s ever improving cameras and microphones.

The write up points out:

One message, one link, one click. That’s all it takes to lose control of your digital life, unwittingly and in a matter of seconds.

The write up is story focused, probably because a podcast or a streaming video documentary was in the back of the mind of the writers and possibly Spiegel International itself. If you like write ups that have a slant, you will find the cited article interesting.

I want to mention several facets of the write up which get less attention from “real” journalists.

First, the story of the intelware dates back to the late 1970s. Obviously some of the technology has been around for decades, although refined over time. If this “shady” technology were a problem, why has it persisted, been refined, and pressed into service around the world by many countries? It is tempting to focus on a current activity because it makes a good story, but the context and longevity of some of the systems and methods are interesting to me. But 40 years?

Second, since the late 1970s, the block diagrams I have seen presenting the main features of the Amesys system (i2e Technologies) and its direct descendants have had remarkable robustness. In fact, were one to look at the block diagram for a system provided to a controversial government in North Africa and one of the NSO Group Pegasus block diagrams, the basics are retained. Why? A good engineering solution is useful even though certain facets of the system are improved with modern technology. What’s this mean? From my point of view, the clever individual or group eager to replicate this type of stealth intelware can do it, just with modern tools and today’s robust cloud environment. The cloud was not a “thing” in 1980, but today it is a Teflon for intelware. This means quicker, faster, better, cheaper, and smarter with each iteration.

Source: IT News in Australia

Third, this particular type of intelware is available from specialized software companies worldwide. Want to buy a version from a developer in Spain? No problem. How about a Chinese variety? Cultivate your contacts in Hong Kong or Singapore and your wish will be granted. What about a version from a firm based in India? No problem, just hang out at a telecommunications conference in Mumbai.

Net net: Newer and even more stealthy intelware technologies are available today. Will these be described and stories about the use of them be written? Yep. Will I identify some of these firms? Sure, just attend one of my lectures for law enforcement and intelligence professionals. But the big question is never answered, “Why are these technologies demonstrating such remarkable magnetic appeal?” And a related question, “Why do governments permit these firms to operate?”

Come on, Spiegel International. Write about a more timely approach, not one that is decades old and documented in detail on publicly accessible sources. Oh, is location tracking enabled on your phone to obviate some of the value of Signal, Telegram, and Threema encrypted messaging apps?

PS. Now no clicks are needed. The technology can be deployed when a mobile number is known and connected to a network. There is an exception too. The requisite code can be pre-installed on one’s mobile device. Is that a story? Nah, that cannot be true. I agree.

Stephen E Arnold, October 11, 2023

Written by Stephen E. Arnold · Filed Under Government, intelware, News

Cognitive Blind Spot 4: Ads. What Is the Big Deal Already?

October 11, 2023

Last week, I presented a summary of Dark Web Trends 2023, a research update my team and I prepare each year. I showed a visual of the ads on a Dark Web search engine. Here’s an example of one of my illustrations:

The TorLanD service, when it is accessible via Tor, displays a search box and advertising. What is interesting about this service and a number of other Dark Web search engines is the ads. The search results are so-so, vastly inferior to those information retrieval solutions offered by intelware vendors.

Some of the ads appear on other Dark Web search systems as well; for example, Bobby and DarkSide, among others. The advertisements offer a range of interesting content. The TorLanD screenshot pitches carding, porn, drugs, gadgets (skimmers and software), illegal substances. I pointed out that the ads on TorLanD looked a lot like the ads on Bobby; for instance:

I want to point out that the Silk Road 4.0 and the Gadgets, Docs, Fakes ads are identical. Notice also that TorLanD advertises on Bobby. The Helsinki Drug Marketplace on the Bobby search system offers heroin.

Most of these ads are trade outs. The idea is that one Dark Web site will display an ad for another Dark Web site. There are often links to Dark Web advertising agencies as well. (For this short post, I won’t be listing these vendors, but if you are interested in this research, contact benkent2020 at yahoo dot com. One of my team will follow up and explain our for-fee research policy.)

The point of these two examples is to make clear that advertising has become normalized, even among bad actors. Furthermore, few are surprised that bad actors (or alleged bad actors) communicate, pat one another on the back, and support an ecosystem to buy and sell space on the increasingly small Dark Web. Please, note that advertising appears in public and private Telegram groups focused on the topics referenced in these Dark Web ads.

Can you believe the ads? Some people do. Users of the Clear Web and the Dark Web are conditioned to accept ads and to believe that these are true, valid, useful, and intended to make it easy to break the law and buy a controlled substance or CSAM. Some ads emphasize “trust.”

People trust ads. People believe ads. People expect ads. In fact, one can poke around and identify advertising and PR agencies touting the idea that people “trust” ads, particularly those with brand identity. How does one build brand? Give up? Advertising and weaponized information are two ways.

The cognitive bias that operates is that people embrace advertising. Look at a page of Google results. Which are ads and which are ads but not identified? What happens when ads are indistinguishable from plausible messages? Some online companies offer stealth ads. On the Dark Web pages illustrating this essay are law enforcement agencies masquerading as bad actors. Can you identify one such ad? What about messages on Twitter which are designed to be difficult to spot as paid messages or weaponized content? For one take on Twitter technology, read “New Ads on X Can’t Be Blocked or Reported, and Aren’t Labeled as Advertisements.”

Let me highlight some of the functions of online ads like those on the Dark Web sites. I will ignore the Clear Web ads for the purposes of this essay:

Click on the ad and receive malware
Visit the ad and explore the illegal offer so that the site operator can obtain information about you
Sell you a product and obtain the identifiers you provide, a delivery address (either physical or digital), or plant a beacon on your system to facilitate tracking
Gather emails for phishing or other online initiatives
Blackmail.

I want to highlight advertising as a vector of weaponization for three reasons: [a] People believe ads. I know it sounds silly, but ads work. People suspend disbelief when an ad on a service offers something that sounds too good to be true; [b] many people do not question the legitimacy of an ad or its message. Ads are good. Ads are everywhere; and [c] ads are essentially unregulated.

What happens when everything drifts toward advertising? The cognitive blind spot kicks in and one cannot separate the false from the real.

Public service note: Before you explore Dark Web ads or click links on social media services like Twitter, consider that these are vectors which can point to quite surprising outcomes. Intelligence agencies outside the US use Dark Web sites as a way to harvest useful information. Bad actors use ads to rip off unsuspecting people like the doctor who once lived two miles from my office when she ordered a Dark Web hitman to terminate an individual.

Ads are unregulated and full of surprises. But the cognitive blind spot for advertising guarantees that the technique will flourish and gain technical sophistication. Are those objective search results useful information or weaponized? Will the Dark Web vendor really sell you valid stolen credit cards? Will the US postal service deliver an unmarked envelope chock full of interesting chemicals?

Stephen E Arnold, October 11, 2023

Written by Stephen E. Arnold · Filed Under cybercrime, law enforcement, News, OSINT, search engine

New Website Brings Focus to State Courts and Constitutions

October 11, 2023

Despite appearances, constitutional law is not all about the US Supreme Court. State courts and constitutions are integral to the maintenance (and elimination) of citizen rights. But the national media often underplays or overlooks key discussions and decisions on the state level. NYU Law’s Brennan Center’s nonpartisan State Court Report is a new resource that seeks to address that imbalance. Its About page explains:

“What’s been missing is a forum where experts come together to analyze and discuss constitutional trends emerging from state high courts, as well as a place where noteworthy state cases and case materials are easy to find and access. State constitutions share many common provisions, and state courts across the country frequently grapple with similar questions about constitutional interpretation. Enter State Court Report, which is dedicated to covering legal news, trends, and cutting-edge scholarship, offering insights and commentary from a nationwide network of academics, journalists, judges, and practitioners with diverse perspectives and expertise. By providing original content and resources that are easily accessible, State Court Report fosters informed dialogue, research, and public understanding about an essential but chronically underappreciated source of law. Our newsletter offers a deep dive into legal developments across the states. Our case database highlights notable state constitutional decisions and cases to watch in state high courts. Our state pages provide information on high courts and constitutions in all 50 states. In addition, State Court Report supports and participates in symposia, conferences, educational training, and panels, and also partners with other organizations to disseminate and share information and research.”

The organization debuts with two high-profile guest essays, from former U.S. Attorney General Eric H. Holder Jr. and former Michigan Chief Justice Bridget Mary McCormack. One can browse articles by issue or state or, of course, search for any particulars. Not surprisingly, a subscription to the newsletter is one prominent click away. We hope the Brennan Center succeeds with this effort to bring attention to important constitutional issues before they wind their way to SCOTUS.

Cynthia Murrell, October 11, 2023

Written by Stephen E. Arnold · Filed Under Legal matters, News, OSINT

Data Drift: Yes, It Is Real and Feeds on False Economy Methods

October 10, 2023

When I mention statistical drift, most of those in my lectures groan and look at their mobile phones. I am delighted to call attention to a write up called “‘The Model-Eat-Model World’ of Clinical AI: How Predictive Power Becomes a Pitfall.” The article focuses on medical information, but its message applies to a wide range of “smart” models. These range from the Google shortcuts of Snorkel to the Bayesian-based systems in vogue in many policeware and intelware products. The behavior appears to have influenced Dr. Timnit Gebru and contributed to her invitation to find her future elsewhere from none other than the now marginalized Google Brain group. (Googlers do not appreciate being informed of their shortcomings it seems.)

The young shark of Wall Street ponders his recent failure at work. He thinks, “I used those predictive models as I did last year. How could they have gone off the rails? I am ruined.” Thanks, MidJourney. Manet you are not.

The main idea is that as numerical recipes iterate, the outputs deteriorate or wander off the desired path. The number of cycles required to output baloney depends on the specific collection of procedures. But wander these puppies do. To provide a baseline, users of the Autonomy Bayesian system found that after three months of operation, precision and recall had deteriorated. The fix was to retrain the system. Flash forward to today and systems that iterate many times faster than the Autonomy neurolinguistic programming method, and the lousy outputs can appear in a matter of hours. There are corrective steps one can take, but these are expensive when they involve humans. Thus, some predictive outfits have developed smart software to try to keep the models from jumping their railroad tracks. When the models drift, the results seem off kilter.
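For those who prefer code to groaning, here is a minimal sketch of how one might watch precision and recall wander and decide when a retrain is due. It is an illustration, not the method Autonomy or any vendor uses. The function names, the 0/1 label convention, and the 0.10 tolerance are all assumptions made up for this example.

```python
from typing import List, Tuple


def precision_recall(predicted: List[int], actual: List[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = relevant, 0 = not)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def needs_retraining(baseline: Tuple[float, float],
                     current: Tuple[float, float],
                     tolerance: float = 0.10) -> bool:
    """Flag a retrain when precision or recall falls more than `tolerance`
    below the baseline. The tolerance is a hypothetical value; pick your own."""
    return any(base - now > tolerance for base, now in zip(baseline, current))


# Hypothetical human-checked samples from the first and third month of operation.
month_1 = precision_recall([1, 1, 0, 0, 1], [1, 1, 0, 0, 1])
month_3 = precision_recall([1, 1, 1, 0, 0], [1, 0, 0, 1, 1])
print("retrain?", needs_retraining(month_1, month_3))
```

The catch is the one the paragraph above names: the "actual" labels come from humans checking outputs, and humans cost money.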

The write up says:

Last year, an investigation from STAT and the Massachusetts Institute of Technology captured how model performance can degrade over time by testing the performance of three predictive algorithms. Over the course of a decade, accuracy for predicting sepsis, length of hospitalization, and mortality varied significantly. The culprit? A combination of clinical changes — the use of new standards for medical coding at the hospital — and an influx of patients from new communities. When models fail like this, it’s due to a problem called data drift.

Yep, data drift.
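One common way to put a number on data drift is the Population Stability Index, which compares how a feature was distributed when the model was trained with how it is distributed now. The sketch below is a bare-bones version under stated assumptions: the bin edges, the patient-age figures, and the 0.25 alarm level (a widely quoted rule of thumb, not a law of nature) are illustrative and do not come from the STAT and MIT investigation.

```python
import math
from typing import List, Sequence


def population_stability_index(baseline: Sequence[float],
                               recent: Sequence[float],
                               edges: Sequence[float]) -> float:
    """Return the PSI between two samples, binned on the given sorted edges."""
    if not baseline or not recent:
        raise ValueError("both samples must contain at least one value")

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for value in values:
            # The bin index is the number of edges the value meets or exceeds.
            counts[sum(1 for edge in edges if value >= edge)] += 1
        # Floor empty bins at a tiny share so the logarithm stays defined.
        return [max(count / len(values), 1e-6) for count in counts]

    base_share = proportions(baseline)
    recent_share = proportions(recent)
    return sum((r - b) * math.log(r / b) for b, r in zip(base_share, recent_share))


# Hypothetical patient ages: the training decade versus an influx of new patients.
training_ages = [30, 35, 40, 45, 50, 55, 60, 65]
current_ages = [60, 65, 70, 75, 80, 85, 90, 95]
score = population_stability_index(training_ages, current_ages, edges=[40, 60, 80])
print("drift alarm" if score > 0.25 else "looks stable")
```

A score near zero means the inputs look like the training data. A large score says the population has changed, though it does not say whether the model's answers are now wrong. That still takes a human.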

I need to check my mobile phone. Fixing data drift is tricky and in today’s zoom zoom world, “good enough” is the benchmark of excellence. Marketers do not want to talk about data drift. What if bad things result? Let the interns fix it next summer?

Stephen E Arnold, October 10, 2023

Written by Stephen E. Arnold · Filed Under AI, Financial, News | Comments Off on Data Drift: Yes, It Is Real and Feeds on False Economy Methods

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
