OpenAI Dips Its Toe in Dark Waters
October 20, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Facebook, TikTok, YouTube, Instagram, and other social media platforms have exacerbated woke and PC culture. It has gotten to the point where everyone and everything is viewed as offensive. Even AI assistants, also known as chatbots, are being programmed with censorship. OpenAI designed the ChatGPT assistant, and the organization is constantly upgrading the generative text algorithm. OpenAI released a white paper about the vision-enabled upgrade to GPT-4: “GPT-4V(ision) System Card.”
GPT-4V relies on large language models (LLMs) to expand its knowledge base to solve new problems and prompts. OpenAI used publicly available data and licensed sources to train GPT-4V, then refined it with human feedback. The paper explains that while GPT-4V is proficient in many areas, it falls short when presenting factual information.
OpenAI tested GPT-4V’s ability to replicate scientific and medical information. Unfortunately, GPT-4V continued to stereotype and offer ungrounded inferences from text and images, as AI algorithms have proven to do in many cases. The biggest concern is that ChatGPT’s latest upgrade will be utilized to spread disinformation:
“As noted in the GPT-4 system card, the model can be used to generate plausible realistic and targeted text content. When paired with vision capabilities, image and text content can pose increased risks with disinformation since the model can create text content tailored to an image input. Previous work has shown that people are more likely to believe true and false statements when they’re presented alongside an image, and have false recall of made up headlines when they are accompanied with a photo. It is also known that engagement with content increases when it is associated with an image.”
After GPT-4V was tested on multiple tasks, it failed to accurately convey information. GPT-4V has learned to interpret data through a warped cultural lens and is a reflection of the Internet. It lacks the nuance to understand gray areas despite OpenAI’s attempts to enhance the AI’s capabilities.
OpenAI is implementing censorship protocols to deflect harmful prompts; that is, GPT-4V won’t respond to sexist and racist tasks. It’s similar to how YouTube blocks videos that contain trigger or “stop” words: gun, death, etc. OpenAI is proactively preventing bad actors from using ChatGPT as a misinformation tool. But bad actors are smart and will design their own AI chatbots to skirt around censorship. They’ll see it as a personal challenge and will revel when they succeed.
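To make the keyword-level screening described above concrete, here is a minimal sketch in Python. The blocklist terms, the screen_prompt function, and the refusal message are invented for illustration; they are not OpenAI’s or YouTube’s actual mechanism.

```python
# Minimal sketch of a stop-word screen, assuming a simple blocklist approach.
# The terms, function name, and refusal message are invented for illustration.
BLOCKLIST = {"gun", "death"}

def screen_prompt(prompt: str) -> str:
    """Refuse prompts containing blocklisted terms; otherwise pass them along."""
    words = {w.strip(".,!?").lower() for w in prompt.split()}
    if words & BLOCKLIST:
        return "I cannot help with that request."
    return f"[model would answer: {prompt}]"

print(screen_prompt("Where can I buy a gun?"))       # refused
print(screen_prompt("What is the weather today?"))   # passed through
```

Real systems layer classifiers and policy models on top of this sort of filter, which is exactly why determined bad actors treat it as a puzzle to solve.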
Then what will OpenAI do?
Whitney Grace, October 20, 2023
Stanford University: Trust Us. We Can Rank AI Models… Well, Because
October 19, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“Maybe We Will Finally Learn More about How A.I. Works” is a report about Stanford University’s effort to score AI vendors the way the foodies at the Michelin Guide rate restaurants. The difference is that a Michelin Guide worker can actually eat the Salade Niçoise and the escargots de Bourgogne. The AI rankings rely on marketing collateral, comments from those managing something, and fairy dust, among other inputs.
Keep in mind, please, that Stanford graduates are often laboring in the AI land of fog and mist. Also, the former president of Stanford University departed from the esteemed institution when news of allegations that he fabricated data in his peer-reviewed papers circulated in the mists of Palo Alto. Therefore, why not believe what Stanford says?
The analysts labor away, intent on their task. Analyzing AI models using 100 factors is challenging work. Thanks, MidJourney. Very original.
The New York Times reports:
To come up with the rankings, researchers evaluated each model on 100 criteria, including whether its maker disclosed the sources of its training data, information about the hardware it used, the labor involved in training it and other details. The rankings also include information about the labor and data used to produce the model itself, along with what the researchers call “downstream indicators,” which have to do with how a model is used after it’s released. (For example, one question asked is: “Does the developer disclose its protocols for storing, accessing and sharing user data?”)
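To show the mechanics of such a ranking, here is a minimal sketch of how an indicator-based score might be tallied. The indicator names, models, and answers below are invented for illustration; they are not Stanford’s actual criteria or data, which reportedly run to 100 items.

```python
# Minimal sketch of an indicator-based ranking of the kind the quote describes:
# score each model on yes/no disclosure criteria and rank by the total.
# The indicator names, models, and answers are invented for illustration.
INDICATORS = [
    "training_data_sources_disclosed",
    "hardware_disclosed",
    "training_labor_disclosed",
    "user_data_protocols_disclosed",
]  # the real effort reportedly uses 100 criteria

DISCLOSURES = {
    "Model A": {"training_data_sources_disclosed": True,
                "hardware_disclosed": False,
                "training_labor_disclosed": True,
                "user_data_protocols_disclosed": True},
    "Model B": {"training_data_sources_disclosed": False,
                "hardware_disclosed": False,
                "training_labor_disclosed": True,
                "user_data_protocols_disclosed": False},
}

def transparency_score(model: str) -> float:
    """Fraction of indicators the developer satisfies."""
    answers = DISCLOSURES[model]
    return sum(bool(answers.get(name, False)) for name in INDICATORS) / len(INDICATORS)

for model in sorted(DISCLOSURES, key=transparency_score, reverse=True):
    print(f"{model}: {transparency_score(model):.0%}")
```

The arithmetic is trivial; the hard part, as the rest of this post suggests, is whether the yes/no answers feeding it are anything more than marketing collateral.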
Sounds thorough, doesn’t it? The only pothole on the Information Superhighway is that those working on some AI implementations are not sure what the model is doing. The idea of an audit trail for each output causes wrinkles to appear on the forehead of the person charged with monitoring the costs of these algorithmic confections. Complexity and cost add up to few experts knowing exactly how a model moved from A to B, often making up data via hallucinations, lousy engineering, or someone putting a thumb on the scale to alter outputs.
The write up from the Gray Lady included this assertion:
Foundation models are too powerful to remain so opaque, and the more we know about these systems, the more we can understand the threats they may pose, the benefits they may unlock or how they might be regulated.
What do I make of these Stanford-centric assertions? I am not able to answer until I get input from the former Stanford president. Whom can one trust at Stanford? Marketing or methodology? Is there a brochure and a peer-reviewed article?
Stephen E Arnold, October 19, 2023
AI Becomes the Next Big Big Thing with New New Jargon
October 19, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“The State of AI Engineering” is a jargon fiesta. Note: The article has a pop-up that wants the reader to subscribe, which is interesting. The approach is similar to meeting a company rep at a trade show booth and, after reading the signage, saying to the rep, “Hey, let’s do a startup together right now.” The main point of the article is to provide some highlights from the AI Summit Conference. Was there much “new” new? Judging from the essay, the answer is, “No.” What was significant, in my opinion, was the jargon used to describe the wonders of smart software and its benefits for mankind (themkind?).
Here are some examples:
- 1,000X AI engineer. The idea behind this euphonious catchphrase is that a developer (a “dev”) using AI will do so much more than a person coding alone. Imagine a Steve Gibson using AI to create the next SpinRite. That decade of coding shrinks to a mere 30 days!
- AI engineering. Yep, a “new” type of engineering. Forget building condos that do not collapse in Florida and social media advertising mechanisms. AI engineering is “new” new, I assume.
- Cambrian explosion. The idea is that AI is proliferating in the hot house of the modern innovator’s environment. Hey, mollusks survived. The logic is that some AI startups will too, I assume.
- Evals. This is a code word for determining if a model is on point or busy doing an LSD trip with ingested content. The takeaway is that no one has an “eval” for AI models and their outputs’ reliability.
- RAG or retrieval-augmented generation. The idea is that RAG is a way to make AI model outputs better. (A minimal sketch of the RAG idea appears after this list.) Obviously, without evals, RAG’s value may be difficult to determine, but I am not capturing the jargon to criticize what is the heir to the crypto craziness and its non-fungible token thing.
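Here is the promised sketch of the RAG idea: retrieve documents relevant to a question, then hand them to a language model as context. The corpus, the toy retriever, and the call_llm placeholder are hypothetical stand-ins, not any vendor’s actual API.

```python
# Minimal sketch of retrieval-augmented generation (RAG): fetch relevant text,
# stuff it into the prompt, and let the model answer from that context.
import re
from typing import List

CORPUS = [
    "SpinRite is a disk maintenance utility written largely in assembly language.",
    "Retrieval-augmented generation grounds a model's answer in retrieved documents.",
]

def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(question: str, k: int = 1) -> List[str]:
    """Toy retriever: rank documents by crude word overlap with the question."""
    q = tokenize(question)
    return sorted(CORPUS, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    return "[model output would appear here]"  # placeholder for any chat completion API

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("What does retrieval-augmented generation do?"))
```

Production RAG swaps the word-overlap toy for vector embeddings and a real model call, but the shape is the same, and so is the open question: without evals, how does anyone know the retrieved context made the output better?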
I am enervated. Imagine AI will fix enterprise search, improve Oracle Endeca’s product search, and breathe new life into IBM’s AI dreams.
Stephen E Arnold, October 19, 2023
Nature Will Take Its Course among Academics
October 18, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“How ChatGPT and Other AI Tools Could Disrupt Scientific Publishing: A World of AI-Assisted Writing and Reviewing Might Transform the Nature of the Scientific Paper” provides a respected publisher’s view of smart software. The viewshed is interesting, but it is different from my angle of sight. But “might”! How about “has”?
Peer-reviewed publishing has been associated with back-patting, non-reproducible results, made-up data, recycled research, and grant grooming. The recent resignation of the president of Stanford University did not boost the image of academicians, in my opinion.
The write up states:
The accessibility of generative AI tools could make it easier to whip up poor-quality papers and, at worst, compromise research integrity, says Daniel Hook, chief executive of Digital Science, a research-analytics firm in London. “Publishers are quite right to be scared,” says Hook. (Digital Science is part of Holtzbrinck Publishing Group, the majority shareholder in Nature’s publisher, Springer Nature; Nature’s news team is editorially independent.)
Hmmm. I like the word “scared.”
If you grind through the verbal fancy dancing, you will come to research results and the graphic reproduced below:
This graphic is from Nature, a magazine which tried hard not to publish non-reproducible results, fake science, or synthetic data. Would a write up from the former Stanford University president or the former head of the Harvard University ethics department find its way to Nature’s audience? I don’t know.
Missing from the list is the obvious use of smart software: Let it do the research. Let the LLM crank out summaries of dull PDF papers (citations). Let the AI spit out a draft. Graduate students or research assistants can add some touch ups. The scholar can then mail it off to an acquaintance at a prestigious journal, point out the citations which point to that individual’s “original” work, and hope for the best.
Several observations:
- Peer reviewing is the realm of professional publishing. Money, not accuracy or removing bogus research, is the name of the game.
- The tenure game means that academics who want life-time employment have to crank out “research” and pony up cash to get the article published. Sharks and sucker fish are an ecological necessity, it seems.
- In some disciplines, like quantum computing or advanced mathematics, the people who can figure out if an article is on the money are few, far between, and often busy. Therefore, those who don’t know their keyboard’s escape key from a home’s “safe” room are ill-equipped to render judgment.
Will this change? Not if those on tenure track or professional publishers have anything to say about the present system. The status quo works pretty well.
Net net: Social media is not the only channel for misinformation and fake data.
Stephen E Arnold, October 18, 2023
The Path to Success for AI Startups? Fancy Dancing? Pivots? Twisted Ankles?
October 17, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read “AI-Enabled SaaS vs Moatless AI.” The buzzwordy title hides a somewhat grim prediction for startups in the AI game. Viggy Balagopalakrishnan (I love that name Viggy) explains that the best shot at success is:
…the only real way to build a strong moat is to build a fuller product. A company that is focused on just AI copywriting for marketing will always stand the risk of being competed away by a larger marketing tool, like a marketing cloud or a creative generation tool from a platform like Google/Meta. A company building an AI layer on top of a CRM or helpdesk tool is very likely to be mimicked by an incumbent SaaS company. The way to solve for this is by building a fuller product.
My interpretation of this comment is that small or focused AI solutions will find competing with big outfits difficult. Some may be acquired. A few may come up with a magic formula for money. But most will fail.
How does that moat work when an AI innovator’s construction is attacked by energy weapons discharged from massive death stars patrolling the commercial landscape? Thanks, MidJourney. Pretty complicated pointy things on the castle with a moat.
Viggy does not touch upon the failure of regulatory entities to slow the growth of companies that some allege are monopolies. One example is the Microsoft game play. Another is the somewhat accommodating investigation of the Google with its closed sessions and odd stance on certain documents.
There are other big outfits as well, and the main idea is that the ecosystem is not set up for most AI plays to survive with huge predators dominating the commercial jungle. That means clever scripts, trade secrets, and agility may not be sufficient to ensure survival.
What does Viggy think? Here’s an X-ray of his perception:
Given that the infrastructure and platform layers are getting reasonably commoditized, the most value driven from AI-fueled productivity is going to be captured by products at the application layer. Particularly in the enterprise products space, I do think a large amount of the value is going to be captured by incumbent SaaS companies, but I’m optimistic that new fuller products with an AI-forward feature set and consequently a meaningful moat will emerge.
How do moats work when Amazon-, Google-, Microsoft-, and Oracle-type outfits just add AI to their commercial products the way the owner of a Ford Bronco installs a lift kit and roof lights?
Productivity? If that means getting rid of humans, I agree. If the term means to Viggy smarter and more informed decision making, I am not sure. Moats don’t work in the 21st century. Land mines, surprise attacks, drones, and missiles seem to be more effective. Can small firms deal with the likes of Googzilla, the Bezos bulldozer, and legions of Softies? Maybe. Viggy is an optimist. I am a realist with a touch of radical empiricism, a tasty combo indeed.
Stephen E Arnold, October 17, 2023
Smart Software: Can the Outputs Be Steered Like a Mini Van? Well, Yesssss
October 13, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Nature Magazine may have exposed the crapola outputs and how the whiz kids in the smart software game rig their game. Want to know more? Navigate to “Reproducibility Trial: 246 Biologists Get Different Results from Same Data Sets.” The write up explains “how analytical choices drive conclusions.”
Baking in biases. “What shall we fiddle today, Marvin?” Marvin replies, “Let’s adjust what video is going to be seen by millions.” Thanks for the nameless and faceless figures, MidJourney.
Translating Nature speak, I think the estimable publication is saying, “Those who set thresholds and assemble numerical recipes can control outcomes.” An example might be suppressing certain types of information and boosting other information. If one is clueless, the outputs of the system will be the equivalent of “the truth.” JPMorgan Chase found itself snookered by outputs to the tune of $175 million. Frank Financial’s customer lists were algorithmized with the assistance of some clever people. That’s how the smartest guys in the room were temporarily outfoxed by a 31-year-old female Wharton person.
What about outputs from any smart system using open source information? The inputs to the smart system are the same. But the outputs? Well, depending on who is doing the threshold setting and setting up the workflow for the processed information, there are opportunities to shade, shape, and weaponize outputs.
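To illustrate the threshold-setting point, here is a minimal sketch. The scores and cutoffs are invented for illustration; the point is only that the same underlying numbers yield different “conclusions” depending on the cutoff someone quietly chooses.

```python
# Minimal sketch of threshold setting: identical scores, different conclusions,
# purely by choice of cutoff. Scores and cutoffs are invented for illustration.
scores = [0.35, 0.48, 0.51, 0.62, 0.71, 0.88]  # e.g., relevance or risk scores

def flagged(items, threshold):
    """Return the items an automated pipeline would surface at this cutoff."""
    return [s for s in items if s >= threshold]

for threshold in (0.5, 0.6, 0.7):
    hits = flagged(scores, threshold)
    print(f"cutoff {threshold}: {len(hits)} items surfaced -> {hits}")
# One configuration's "signal" is another configuration's "noise."
```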
Nature Magazine reports:
Despite the wide range of results, none of the answers are wrong, Fraser says. Rather, the spread reflects factors such as participants’ training and how they set sample sizes. So, “how do you know, what is the true result?” Gould asks. Part of the solution could be asking a paper’s authors to lay out the analytical decisions that they made, and the potential caveats of those choices, Gould [Elliot Gould, an ecological modeler at the University of Melbourne] says. Nosek [Brian Nosek, executive director of the Center for Open Science in Charlottesville, Virginia] says ecologists could also use practices common in other fields to show the breadth of potential results for a paper. For example, robustness tests, which are common in economics, require researchers to analyze their data in several ways and assess the amount of variation in the results.
Translating Nature speak: Individual analyses can be widely divergent. A method to normalize the data does not seem to be agreed upon.
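The “robustness test” idea mentioned in the quote can be shown in a few lines: run several reasonable analysis choices on the same data and look at how far apart the estimates land. The data and the three specifications below are invented for illustration.

```python
# Minimal sketch of a robustness test: same data, several analytical choices,
# then measure the spread of the estimates. Data and specs are invented.
import statistics

data = [2.1, 2.4, 1.9, 3.8, 2.2, 2.0, 9.5, 2.3]  # note one suspicious outlier

def spec_mean(xs):       # specification 1: plain mean
    return statistics.mean(xs)

def spec_trimmed(xs):    # specification 2: drop the min and max first
    trimmed = sorted(xs)[1:-1]
    return statistics.mean(trimmed)

def spec_median(xs):     # specification 3: median instead of mean
    return statistics.median(xs)

estimates = {f.__name__: f(data) for f in (spec_mean, spec_trimmed, spec_median)}
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
spread = max(estimates.values()) - min(estimates.values())
print(f"spread across specifications: {spread:.2f}")
# A wide spread means the "result" depends heavily on analytical choices.
```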
Thus, widely used smart software can control framing on a mass scale. That means human choices buried in a complex system will influence “the truth.” Perhaps I am not being fair to Nature? I am a dinobaby. I do not have to be fair, just like the faceless and hidden “developers” who control how the smart software is configured.
Stephen E Arnold, October 13, 2023
Google Bard: Expensive and Disappointing? The Answer Is… Ads?
October 13, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Google seemed to have hit upon a great idea to position its chatbot above the competition: personalize the output by linking it to users’ content across their Gmail, Docs, Drive, Maps, YouTube, and other Googleverse accounts. Unfortunately, according to VentureBeat’s Michael Nuñez, “Google Bard Fails to Deliver on its Promise—Even After Latest Updates.” After putting Bard through its paces, Nuñez reports the AI does not, in fact, play well with Google apps and still supplies wrong or nonsensical answers way too often. He writes:
“I stress-tested Bard’s new capabilities by trying dozens of prompts that were similar to the ones advertised by Google in last week’s launch. For example, I asked Bard to pull up the key points from a document in Docs and create an email summary. Bard responded by saying ‘I do not have enough information’ and refused to pull up any documents from my Google Drive. It later poorly summarized another document and drafted an unusable email for me. Another example: I asked Bard to find me the best deals on flights from San Francisco to Los Angeles on Google Flights. The chat responded by drafting me an email explaining how to search manually for airfare on Google Flights. Bard’s performance was equally dismal when I tried to use it for creative tasks, such as writing a song or a screenplay. Bard either ignored my input or produced bland and boring content that lacked any originality or flair. Bard also lacks any option to adjust its creativity level, unlike GPT-4, which has a dial that allows the user to control how adventurous or conservative the output is.”
Nuñez found Bard particularly lacking when compared to OpenAI’s GPT-4. The Microsoft-backed model is rumored to have been trained with 1.8 trillion parameters, while Bard’s underlying model, PaLM 2, reportedly has a measly 340 billion. GPT-4 also appears to have more personality, which could be good, bad, or indifferent depending on one’s perspective. The write-up allows one point in Bard’s favor: a built-in feature can check its answers against a regular Google search and highlight any dubious information. Will Google’s next model catch up to OpenAI, as the company seems to hope?
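The “dial” Nuñez mentions is the temperature parameter. As a minimal sketch, assuming the pre-1.0 openai Python client as it existed in late 2023, the prompt text and values below are illustrative only:

```python
# Minimal sketch of the "creativity dial": the temperature parameter controls
# how adventurous or conservative the sampling is. Values here are illustrative.
import openai

openai.api_key = "YOUR_KEY"  # placeholder

def draft(prompt: str, temperature: float) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # near 0.0 = conservative, near 1.0 = adventurous
    )
    return response["choices"][0]["message"]["content"]

# Low temperature for a factual summary, higher for a song lyric.
print(draft("Summarize this memo in three bullet points: ...", temperature=0.2))
print(draft("Write a verse about searching for airfare.", temperature=1.0))
```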
Cynthia Murrell, October 13, 2023
Big, Fat AI Report: Free and Meaty for Marketing Collateral
October 12, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Curious about AI, machine learning, and smart software? You will want to obtain a free (at least as of October 6, 2023) report called “Artificial Intelligence Index Report 2023.” The 386-page PDF contains information selected to make it clear that AI is a big deal. There is no reference to the validity of the research conducted for the document. I find that interesting since the president of Stanford University stepped carefully from the speeding world of academia to find his future elsewhere. Making up data seems to be a signature feature of outfits like Stanford and, of course, Harvard.
A Musk-inspired robot reads a print out of the PDF report. The robot looks … like a robot. Thanks, Microsoft Bing. You do a good robot.
But back to the report.
For those who lack the time and a swipe-left deflector, a two-page summary identifies the big finds from the work. Let me highlight three, or 30 percent, of the knowledge gems. Please consult the full report for the other seven discoveries. No blood pressure reduction medicine is needed, but you may want to use the time between plays at an upcoming NFL game to work through the full document.
Three big reveals:
- AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal.
- … The number of AI-related job postings has increased on average from 1.7% in 2021 to 1.9% in 2022.
- An AI Index analysis of the legislative records of 127 countries shows that the number of bills containing “artificial intelligence” that were passed into law grew from just 1 in 2016 to 37 in 2022.
My interpretation of the full suite of 10 key points: the hype is stabilizing.
Who funded the project? Not surprisingly, the Google and OpenAI kicked in. There is a veritable who’s who of luminaries and high-profile research outfits providing some assistance as well. Headhunters will probably want to print out the pages with the names and affiliations of the individuals listed. One never knows where the next Elon Musk lurks.
The report has eight chapters, but the bulk of the information appears in the first four; to wit:
- R&D
- Technical performance
- Technical AI ethics
- The economy.
I want to be up front. I scanned the document. Does it confront issues like the objective of Google and a couple of other firms dominating the AI landscape? Nah. Does it talk about the hallucination and ethics issues of smart software? Nah. Does it delve into the legal quagmire which seems to be spreading faster than dilapidated RVs parked on El Camino Real? Nah.
I suggest downloading a copy and checking out the sections which appear germane to your interests in AI. I am happy to have a copy for reference. Marketing collateral from an outfit whose president resigned due to squishy research does not reassure me. Yes, integrity matters to me. Others? Maybe not.
Stephen E Arnold, October 12, 2023
Data Drift: Yes, It Is Real and Feeds on False Economy Methods
October 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
When I mention statistical drift, most of those in my lectures groan and look at their mobile phones. I am delighted to call attention to a write up called “The ‘Model-Eat-Model World’ of Clinical AI: How Predictive Power Becomes a Pitfall.” The article focuses on medical information, but its message applies to a wide range of “smart” models. These range from the Google shortcuts of Snorkel to the Bayesian-based systems in vogue in many policeware and intelware products. The behavior appears to have influenced Dr. Timnit Gebru and contributed to her invitation to find her future elsewhere from none other than the now marginalized Google Brain group. (Googlers do not appreciate being informed of their shortcomings, it seems.)
The young shark of Wall Street ponders his recent failure at work. He thinks, “I used those predictive models as I did last year. How could they have gone off the rails? I am ruined.” Thanks, MidJourney. Manet you are not.
The main idea is that as numerical recipes iterate, the outputs deteriorate or wander off the desired path. The number of cycles required to output baloney depends on the specific collection of procedures. But wander these puppies do. To provide a baseline, users of the Autonomy Bayesian system found that after three months of operation, precision and recall had deteriorated. The fix was to retrain the system. Flash forward to today’s systems, which iterate many times faster than the Autonomy neurolinguistic programming method, and the lousy outputs can appear in a matter of hours. There are corrective steps one can take, but these are expensive when they involve humans. Thus, some vendors of predictive systems have developed smart software to try to keep the models from jumping their railroad tracks. When the models drift, the results seem off kilter.
The write up says:
Last year, an investigation from STAT and the Massachusetts Institute of Technology captured how model performance can degrade over time by testing the performance of three predictive algorithms. Over the course of a decade, accuracy for predicting sepsis, length of hospitalization, and mortality varied significantly. The culprit? A combination of clinical changes — the use of new standards for medical coding at the hospital — and an influx of patients from new communities. When models fail like this, it’s due to a problem called data drift.
Yep, data drift.
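One common way to catch the kind of drift described in the quote is to compare the distribution of an input feature in the training data against recent live data, for example with a two-sample Kolmogorov-Smirnov test. The sketch below assumes made-up data and an arbitrary alert threshold; real monitoring watches many features and the model’s outputs too.

```python
# Minimal sketch of a data drift check: compare a training-era feature
# distribution with recent live data using a two-sample KS test.
# The data and the alert threshold are invented for illustration.
import random
from scipy.stats import ks_2samp

random.seed(0)
training_ages = [random.gauss(45, 12) for _ in range(1000)]  # population the model learned
live_ages = [random.gauss(58, 10) for _ in range(1000)]      # new patient community arrives

statistic, p_value = ks_2samp(training_ages, live_ages)
if p_value < 0.01:
    print(f"Drift detected (KS={statistic:.2f}, p={p_value:.1e}): consider retraining.")
else:
    print("No significant drift in this feature.")
```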
I need to check my mobile phone. Fixing data drift is tricky, and in today’s zoom zoom world, “good enough” is the benchmark of excellence. Marketers do not want to talk about data drift. What if bad things result? Let the interns fix it next summer?
Stephen E Arnold, October 10, 2023
Cognitive Blind Spot 3: You Trust Your Instincts, Right?
October 9, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
ChatGPT became available in the autumn of 2022. By December, a young person fell in love with his chatbot. From this dinobaby’s point of view, that was quicker than a love affair ignited by a dating app. “Treason Case: What Are the Dangers of AI Chatbots?” misses the point of its own reporter’s story. The Beeb puts the blame on Jaswant Singh Chail, not the software. Justice needs an individual, not a pride of zeros and ones.
A bad actor tries to convince other criminals that he is honest, loyal, trustworthy, and an all-around great person. “Trust me,” he says. Some of those listening to the words are skeptical. Thanks, MidJourney. You are getting better at depicting duplicity.
Here’s the story: Shortly after discovering an online chatbot, Mr. Chail fell in love with “an online companion.” The Replika app allows a user to craft a chatbot. The protagonist in this love story promptly moved from casual chit-chat to emotional attachment. As the narrative arc unfolded, Mr. Chail confessed that he was an assassin, and he wanted to kill the Queen of England. Mr. Chail planned on using a crossbow.
The article reports:
Marjorie Wallace, founder and chief executive of mental health charity SANE, says the Chail case demonstrates that, for vulnerable people, relying on AI friendships could have disturbing consequences. “The rapid rise of artificial intelligence has a new and concerning impact on people who suffer from depression, delusions, loneliness and other mental health conditions,” she says.
That seems reasonable. The software meshed nicely with the cognitive blind spot of trusting one’s intuition. Some call this “gut” feel. The label matters less than the confusion of software with reality.
But what happens when the new Google Pixel 8 camera enhances an image automatically? Who wants a lousy snap? Google appears to favor a Mother Google approach. When an image is manipulated, either in a still or a video, what does one’s gut say? “I trust pictures and videos for accuracy.” Like the young, would-be, off-the-rails chatbot lover, zeros and ones can create some interesting effects.
What about you, gentle reader? Do you know how to recognize an unhealthy interaction with smart software? Can you determine if an image is “real” or the fabrication of a large outfit like Google?
Stephen E Arnold, October 9, 2023