AI Risk: Are We Watching Where We Are Going?
December 27, 2023
This essay is the work of a dumb dinobaby. No smart software required.
To brighten your New Year, navigate to “Why We Need to Fear the Risk of AI Model Collapse.” I love those words: Fear, risk, and collapse. I noted this passage in the write up:
When an AI lives off a diet of AI-flavored content, the quality and diversity is likely to decrease over time.
I think the idea of marrying one’s first cousin or training an AI model on AI-generated content is a bad idea. I don’t really know, but I find the idea interesting. The write up continues:
Is this model at risk of encountering a problem? Looks like it to me. Thanks, MSFT Copilot. Good enough. Falling off the I beam was a non-starter, so we have a more tame cartoon.
Model collapse happens when generative AI becomes unstable, wholly unreliable or simply ceases to function. This occurs when generative models are trained on AI-generated content – or “synthetic data” – instead of human-generated data. As time goes on, “models begin to lose information about the less common but still important aspects of the data, producing less diverse outputs.”
I think this passage echoes some of my team’s thoughts about the SAIL Snorkel method. Googzilla needs a snorkel when it does data dives in some situations. The company often deletes data until a legal proceeding reveals what’s under the company’s expensive, smooth, sleek, true blue, gold trimmed kimonos
The write up continues:
There have already been discussions and research on perceived problems with ChatGPT, particularly how its ability to write code may be getting worse rather than better. This could be down to the fact that the AI is trained on data from sources such as Stack Overflow, and users have been contributing to the programming forum using answers sourced in ChatGPT. Stack Overflow has now banned using generative AIs in questions and answers on its site.
The essay explains a couple of ways to remediate the problem. (I like fairy tales.) The first is to use data that comes from “reliable sources.” What’s the definition of reliable? Yeah, problem. Second, the smart software companies have to reveal what data were used to train a model. Yeah, techno feudalists totally embrace transparency. And, third, “ablate” or “remove” “particular data” from a model. Yeah, who defines “bad” or “particular” data. How about the techno feudalists, their contractors, or their former employees.
For now, let’s just use our mobile phone to access MSFT Copilot and fix our attention on the screen. What’s to worry about? The person in the cartoon put the humanoid form in the apparently risky and possibly dumb position. What could go wrong?
Stephen E Arnold, December 27, 2023