How Does One Train Smart Software?
June 8, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
It is awesome when geekery collides with the real world, such as the development of AI. These geekery hints prove that fans are everywhere and that the influence of fictional worlds leaves a lasting impact. Usually these hints take the form of naming a new discovery after a favorite character or franchise, but the collision might not be good for copyrighted books beloved by geeks everywhere. The New Scientist reports that “ChatGPT Seems To Be Trained On Copyrighted Books Like Harry Potter.”
In order to train large language models, AI developers need massive datasets. Datasets can range from information on social media platforms to shopping databases like Amazon. The problem with ChatGPT is that it appears its developers at OpenAI used copyrighted books as training data. If OpenAI used copyrighted materials, it raises the question of whether the datasets were legally created.
Associate Professor David Bamman of the University of California, Berkeley, and his team studied ChatGPT. They hypothesized that OpenAI used copyrighted material. Using 600 fiction books published between 1924 and 2020, Bamman and his team selected 100 passages from each book that had a single, named character. The name was blanked out of each passage, then ChatGPT was asked to fill it in. ChatGPT had a 98% accuracy rate with books by authors ranging from J.K. Rowling and Ray Bradbury to Lewis Carroll and George R.R. Martin.
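The probe is a classic “name cloze” test. Here is a minimal sketch of the idea in Python, assuming a hypothetical ask_model() wrapper around whatever chat endpoint one is testing; the prompt wording and the helper are my inventions, not the Berkeley team’s code:

```python
import re

def make_cloze(passage: str, character: str) -> str:
    # Blank out every occurrence of the character's name, as in the
    # name cloze setup described above.
    return re.sub(re.escape(character), "[MASK]", passage)

def name_cloze_accuracy(samples, ask_model) -> float:
    """samples: iterable of (passage, character_name) pairs.
    ask_model: hypothetical callable that sends a prompt to the model
    under test and returns its text reply."""
    hits, total = 0, 0
    for passage, character in samples:
        prompt = ("Fill in [MASK] with the single proper name that belongs "
                  "in this passage. Reply with the name only.\n\n"
                  + make_cloze(passage, character))
        guess = ask_model(prompt).strip()
        hits += int(guess.lower() == character.lower())
        total += 1
    return hits / max(total, 1)
```

A high score on passages the model could not have seen outside the book itself is the tell that the book was in the training data.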
If ChatGPT is only being trained from these books, does it violate copyright?
“ ‘The legal issues are a bit complicated,’ says Andres Guadamuz at the University of Sussex, UK. ‘OpenAI is training GPT with online works that can include large numbers of legitimate quotes from all over the internet, as well as possible pirated copies.’ But these AIs don’t produce an exact duplicate of a text in the same way as a photocopier, which is a clearer example of copyright infringement. ‘ChatGPT can recite parts of a book because it has seen it thousands of times,’ says Guadamuz. ‘The model consists of statistical frequency of words. It’s not reproduction in the copyright sense.’”
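Guadamuz’s “statistical frequency of words” undersells the transformer mathematics, but the crudest version of the idea, a table of which word tends to follow which, fits in a few lines. The toy corpus below is invented for illustration:

```python
from collections import Counter, defaultdict

text = "the boy who lived the boy who waved".split()

# Count which word follows which: the simplest "statistical frequency
# of words." GPT-style models are vastly more elaborate, but they too
# are learned word statistics rather than stored photocopies.
follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

word, out = "the", ["the"]
for _ in range(4):
    word = follows[word].most_common(1)[0][0]  # most frequent successor
    out.append(word)
print(" ".join(out))  # -> "the boy who lived the"
```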
Individual countries will need to determine dataset rules, but it would be preferable to notify authors that their material is being used. Fiascos are already happening with stolen AI-generated art.
ChatGPT was mostly trained on science fiction novels; it did not read fiction from minority authors like Toni Morrison. Bamman said ChatGPT is lacking representation. That is one way to describe the datasets, but the gap more likely pertains to the human AI developers’ reading tastes. I assume there was little interest in books about ethics, moral behavior, and the old-fashioned William James’s view of right and wrong. I think I assume correctly.
Whitney Grace, June 8, 2023
Software Cannot Process Numbers Derived from Getty Pix, Honks Getty Legal Eagle
June 6, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read “Getty Asks London Court to Stop UK Sales of Stability AI System.” The write up comes from a service which, like Google, bandies about the word trust with considerable confidence. The main idea is that software is processing images available in the form of Web content, converting these to numbers, and using the zeros and ones to create pictures.
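For readers who wonder what “converting images to numbers” actually looks like, here is a minimal Python sketch. The file name is hypothetical, and 512 by 512 matches the resolution Stable Diffusion v1 reportedly trained at:

```python
from PIL import Image
import numpy as np

# Load a photo and turn it into numbers: a height x width x 3 array of
# RGB values, scaled to the 0..1 range diffusion models typically train on.
img = Image.open("photo.jpg").convert("RGB").resize((512, 512))
pixels = np.asarray(img, dtype=np.float32) / 255.0

print(pixels.shape)  # (512, 512, 3)
print(pixels[0, 0])  # the top-left pixel as three floats
```

Those numbers, not the photographs themselves, are what the model keeps.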
The write up states:
The Seattle-based company [Getty] accuses the company of breaching its copyright by using its images to “train” its Stable Diffusion system, according to the filing dated May 12, [2023].
I found this statement in the trusted write up fascinating:
Getty is seeking as-yet unspecified damages. It is also asking the High Court to order Stability AI to hand over or destroy all versions of Stable Diffusion that may infringe Getty’s intellectual property rights.
When I read this, I wonder if the scribes upon learning about the threat Gutenberg’s printing press represented were experiencing their “Getty moment.” The advanced technology of the adapted olive press and hand carved wooden letters meant that the quill pen champions had to adapt or find their future emptying garderobes (aka chamber pots).
Scribes prepare to throw a Gutenberg printing press and the evil innovator Gutenberg into the Rhine River. Image was produced by the evil incarnate code of MidJourney. Getty is not impressed, like letters on paper, with the outputs of Beelzebub-inspired innovations.
How did that rebellion against technology work out? Yeah. Disruption.
What happens if the legal system in the UK and possibly the US jump on the no innovation train? Japan’s decision points to one option: Using what’s on the Web is just fine. And China? Yep, those folks in the Middle Kingdom will definitely conform to the UK and maybe US rules and regulations. What about outposts of innovation in Armenia? Johnnies on the spot (not pot, please). But what about those computer science students at Cambridge University? Jail and fines are too good for them. To the gibbet.
Stephen E Arnold, June 6, 2023
The Google AI Way: EEAT or Video Injection?
June 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Over the weekend, I spotted a couple of signals from the Google marketing factory. The first is the cheerleading by that great champion of objective search results, Danny Sullivan, who wrote with Chris Nelson “Rewarding High Quality Content, However It Is Produced.” The authors pointed out that their essay is on behalf of the Google Search Quality team. This “team” speaks loudly to me when we run test queries on Google.com. Once in a while — not often, mind you — a relevant result will appear in the first page or two of results.
The subject of this essay by Messrs. Sullivan and Nelson is EEAT. My research team and I think that the fascinating acronym is pronounced like the word “eat” in the sense of ingesting gummy cannabinoids. (One hopes these are not prohibited compounds such as Delta-9 THC.) The idea is to pop something in your mouth and chew. As the compound (fact and fiction, GPT-generated content and factoids) dissolves and makes its way into one’s system, the psychoactive reaction is greater perceived dependence on the Google products. You may not agree, but that’s how I interpret the essay.
So what’s EEAT? I am not sure my team and I are getting with the Google script. The correct and Googley answer is:
Expertise, experience, authoritativeness, and trustworthiness.
The write up says:
Focusing on rewarding quality content has been core to Google since we began. It continues today, including through our ranking systems designed to surface reliable information and our helpful content system. The helpful content system was introduced last year to better ensure those searching get content created primarily for people, rather than for search ranking purposes.
I wonder if this text has been incorporated in the Sundar and Prabhakar Comedy Show? I would suggest that it replace the words about meeting users’ needs.
The meat of the synthetic turkey burger strikes me as:
it’s important to recognize that not all use of automation, including AI generation, is spam. Automation has long been used to generate helpful content, such as sports scores, weather forecasts, and transcripts. AI has the ability to power new levels of expression and creativity, and to serve as a critical tool to help people create great content for the web.
Synthetic or manufactured information, content objects, data, and other outputs are okay with us. We’re Google, of course, and we are equipped with expertise, experience, authoritativeness, and trustworthiness to decide what is quality and what is not.
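Note that the quote’s examples of benign automation (sports scores, weather forecasts) are plain template generation; no neural network is required. A toy sketch, with invented data:

```python
# Template-driven content automation: structured data in, sentence out.
def weather_blurb(city: str, high_f: int, low_f: int, conditions: str) -> str:
    return (f"{city}: expect {conditions} today, with a high near "
            f"{high_f}F and an overnight low around {low_f}F.")

print(weather_blurb("Louisville", 88, 67, "scattered thunderstorms"))
```

Lumping that in with generative AI is, of course, part of the essay’s rhetorical move.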
I can almost visualize a T shirt with the phrase “EEAT It” silkscreened on the back with a cheerful Google logo on the front. Catchy. EEAT It. I want one. Perhaps a pop tune can be sampled and used to generate a synthetic song similar to Michael Jackson’s “Beat It”? Google AI would dodge the Weird Al Yankovic version of the 1983 hit. Google’s version might include the refrain:
Just EEAT it (EEAT it, EEAT it, EEAT it)
EEAT it (EEAT it, EEAT it, ha, ha, ha, ha)
EEAT it (EEAT it, EEAT it)
EEAT it (EEAT it, EEAT it)
If chowing down on this Google information is not to your liking, one can get with the Google program via a direct video injection. Google has been publicizing its free video training program from India to LinkedIn (a Microsoft property to give the social media service its due). Navigate to “Master Generative AI for Free from Google’s Courses.” The free, free courses are obviously advertisements for the Google way of smart software. Remember the key sequence: Expertise, experience, authoritativeness, and trustworthiness.
The courses are:
- Introduction to Generative AI
- Introduction to Large Language Models
- Attention Mechanism
- Transformer Models and BERT Model
- Introduction to Image Generation
- Create Image Captioning Models
- Encoder-Decoder Architecture
- Introduction to Responsible AI (remember the phrase “Expertise, experience, authoritativeness, and trustworthiness.”)
- Introduction to Generative AI Studio
- Generative AI Explorer (Vertex AI).
Why is Google offering free infomercials about its approach to AI?
The cited article answers the question this way:
By 2030, experts anticipate the generative AI market to reach an impressive $109.3 billion, signifying a promising outlook that is captivating investors across the board. [Emphasis added.]
How will Microsoft respond to the EEAT It positioning?
Just EEAT it (EEAT it, EEAT it, EEAT it)
EEAT it (EEAT it, EEAT it, ha, ha, ha, ha)
EEAT it (EEAT it, EEAT it)
EEAT it (EEAT it, EEAT it)
Stephen E Arnold, June 5, 2023
Smart Software and a Re-Run of Paradise Lost Joined Progress
June 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I picked up two not-so-faint and definitely not-encrypted signals about the goals of Google and Microsoft for smart software.
Which company will emerge as the one true force in smart software? MidJourney did not pick a winner, just what the top dog will wear to the next quarterly sales report delivered via a neutral Zoom call.
Navigate to the visually thrilling podcast hosted by Lex Fridman, an American MIT wizard. He interviewed the voluble Google wizard Chris Lattner. The subject was the Future of Programming and AI. After listening to the interview, I concluded the following:
- Google wants to define and control the “meta” framework for artificial intelligence. What’s this mean? Think a digital version of a happy family: Vishnu, Brahma, and Shiva, among others.
- Google has an advantage when it comes to doing smart software because its humanoids have learned what works, what to do, and how to do certain things.
- The complexity of Google’s multi-pronged smart software methods, its home-brew programming languages, and its proprietary hardware are nothing more than innovation. Simple? Innovation means no one outside of the Google AI cortex can possibly duplicate, understand, or outperform Googzilla.
- Google has money and will continue to spend it to deliver the Vishnu, Brahma, and Shiva experience in my interpretation of programmer speak.
How’s that sound? I assume that the fruit fly start ups are going to ignore the vibrations emitted from Chris Lattner, the voluble Chris Lattner, I want to emphasize. But like those short-lived Diptera, one can derive some insights from the efforts of less well-informed, dependent, and less-well-funded lab experiments.
Okay, that’s signal number one.
Signal number two appears in “Microsoft Signs Deal for AI Computing Power with Nvidia-Backed CoreWeave That Could Be Worth Billions.” This “real news” story asserts:
… Microsoft has agreed to spend potentially billions of dollars over multiple years on cloud computing infrastructure from startup CoreWeave …
CoreWeave? Yep, the company “sells simplified access to Nvidia’s graphics processing units, or GPUs, which are considered the best available on the market for running AI models.” By the way, Nvidia has invested in this outfit. What’s this signal mean to me? Here are the flickering lines on my oscilloscope:
- Microsoft wants to put smart software into its widely-used enterprise applications in order to make the one true religion of smart software. The idea, of course, is to pass the collection plate and convert dead dog software into racing greyhounds.
- Microsoft has an advantage because when an MBA does calculations and probably letters to significant others, Excel is the go-to solution. Some people create art in Excel and then sell it. MBAs just get spreadsheet fever and do leveraged buyouts. With smart software the Microsoft alleged monopoly does the billing.
- The wild and wonderful world of Azure is going to become smarter because… well, Microsoft does smart things. Imagine the demand for training courses, certification for Microsoft engineers, and how-to YouTube videos.
- Microsoft has money and will continue to achieve compulsory attendance at the Church of Redmond.
Net net: Two titans will compete. I am thinking about the battle between John Milton’s protagonist and antagonist in “Paradise Lost.” This will be fun to watch whilst eating chicken korma.
Stephen E Arnold, June 5, 2023
AI Allegedly Doing Its Thing: Let Fake News Fly Free
June 2, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I cannot resist this short item about smart software. Stories have appeared in my newsfeeds about AI which allegedly concluded that to complete its mission, it had to remove an obstacle — the human operator.
A number of news sources reported as actual factual that a human operator of a smart weapon system was annoying the smart software. The smart software decided that the humanoid was causing a mission to fail. The smart software concluded that the humanoid had to be killed so the smart software could go kill more humanoids.
I collect examples of thought provoking fake news. It’s my new hobby and provides useful material for my “OSINT Blindspots” lectures. (The next big one will be in October 2023 after I return from Europe in late September 2023.)
However, the write up “US Air Force Denies AI Drone Attacked Operator in Test” presents a different angle on the story about evil software. I noted this passage from an informed observer:
Steve Wright, professor of aerospace engineering at the University of the West of England, and an expert in unmanned aerial vehicles, told me jokingly that he had “always been a fan of the Terminator films” when I asked him for his thoughts about the story. “In aircraft control computers there are two things to worry about: ‘do the right thing’ and ‘don’t do the wrong thing’, so this is a classic example of the second,” he said. “In reality we address this by always including a second computer that has been programmed using old-style techniques, and this can pull the plug as soon as the first one does something strange.”
Now the question: Did smart software do the right thing? Did it go after its humanoid partner? In a hypothetical discussion, perhaps. In real life, nope. My hunch is that the US Air Force anecdote is anchored in confusing “what if” thinking with reality. That’s easy for someone younger than me to do, in my experience.
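For what it is worth, Wright’s “second computer” is just an independent, rule-based watchdog that can veto whatever the clever software proposes. A toy sketch; the command format and the limits are invented for illustration:

```python
# A dumb, old-style monitor standing between the smart flight software
# and the hardware. Anything unrecognized or unreasonable gets vetoed.
ALLOWED_ACTIONS = {"loiter", "track_target", "return_to_base"}
MAX_BANK_DEG = 60

def watchdog(command: dict) -> dict:
    safe_fallback = {"action": "return_to_base", "bank_deg": 0}
    if command.get("action") not in ALLOWED_ACTIONS:
        return safe_fallback  # unknown behavior: pull the plug
    if abs(command.get("bank_deg", 0)) > MAX_BANK_DEG:
        return safe_fallback  # physically unreasonable request
    return command

print(watchdog({"action": "attack_operator", "bank_deg": 10}))
# -> {'action': 'return_to_base', 'bank_deg': 0}
```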
I want to point out that in August 2020, a Heron Systems AI (based on Google technology) killed an Air Force “top gun” in a simulated aerial dog fight. How long did it take the smart software to neutralize the annoying humanoid? About a minute, maybe a minute and a half. See this Janes news item for more information.
My view is that smart software has some interesting capabilities. One scenario of interest to me is a hacked AI-infused weapons system. Pondering this idea opens the door to some intriguing “what if” scenarios.
Stephen E Arnold, June 2, 2023
The Prospects for Prompt Engineers: English Majors, Rejoice
June 2, 2023
I noted some good news for English majors. I suppose some history and political science types may be twitching with constrained jubilation too.
Navigate to “9 in 10 Companies That Are Currently Hiring Want Workers with ChatGPT Experience.” The write up contains quite a number of factoids. (Are these statistically valid? I believe everything I read on the Internet with statistical data, don’t you?) Well, true or not, I found these statements interesting:
- 91 percent of the companies in a human resourcey survey want workers with ChatGPT experience. What does “experience” mean? The write up does not deign to elucidate. Presumably knowing how to optimize a phishing email counts.
- 75 percent of those surveyed will fire people who are declared redundant, annoying, or too expensive to pay.
- 30 percent of those in the sample say that hiring a humanoid with ChatGPT experience is “urgent.” Why not root around in the reason for this urgency? Oh, right. That’s research work.
- 66 percent of the respondents perceive that ChatGPT will deliver a “competitive edge.” What about the link to cost reduction? Oh, I forgot. That’s additional research work.
What work functions will get to say, “Hello” to smart software? The report summary identifies seven job categories:
- Software engineering
- Customer service
- Human resources
- Marketing
- Data entry
- Sales
- Finance
For parents with a 22- to 40-year-old working in one of these jobs, my suggestion is to get that spare bedroom ready. The progeny may return to the nest.
Stephen E Arnold, June 2, 2023
Does Jugalbandi Mean De-casting?
June 1, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read “Microsoft Launches Jugalbandi: An AI Powered Platform and Chatbot to Bridge Information Gap in India.” India connotes for me spicy food and the caste system. My understanding of this term comes from Wikipedia which says:
The caste system in India is the paradigmatic ethnographic instance of social classification based on castes. It has its origins in ancient India, and was transformed by various ruling elites in medieval, early-modern, and modern India, especially the Mughal Empire and the British Raj.
Like me, the Wikipedia can be incorrect, one-sided, and PR-ish.
The Jugalbandi write up contains some interesting statements which I interpret against my understanding of the Wikipedia article about castes in India. Here’s one example:
Microsoft, a pioneer in the artificial intelligence (AI) field, has made significant strides with its latest venture, Jugalbandi. This generative AI-driven platform and chatbot aim to revolutionize access to information about government initiatives and public programs in India. With nearly 22 official languages and considerable linguistic variations in the country, Jugalbandi seeks to address the challenges in disseminating information effectively.
I wonder if Microsoft’s pioneering smart software (based largely upon the less than open and often confused OpenAI technology) will do much to “address the challenges in disseminating information effectively.”
Wikipedia points out:
In 1948, negative discrimination on the basis of caste was banned by law and further enshrined in the Indian constitution in 1950; however, the system continues to be practiced in parts of India. There are 3,000 castes and 25,000 sub-castes in India, each related to a specific occupation.
If law and every day behavior have not mitigated castes and how these form fences in India and India outposts in London and Silicon Valley, exactly what will Microsoft (the pioneer in AI) accomplish?
My hunch is the write up enshrines:
- The image of Microsoft as the champion of knocking down barriers and allowing communication to flow. (Why does smart Bing block certain queries?)
- Microsoft’s self-professed role as a “pioneer” in smart software. I think a pioneer in clever Davos messaging is closer to the truth.
- OnMSFT.com’s word salad about something that may be quite difficult to accomplish in many social, business, and cultural settings.
Who created the concept of untouchables?
Stephen E Arnold, June 1, 2023
MBAs and Advisors, Is Your Nuclear Winter Looming?
May 31, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Big time, blue chip consulting firms are quite competent in three areas: [1] Sparking divorces because those who want money marry the firm, [2] Ingesting legions of MBAs to advise clients who are well compensated but insecure, and [3] Finding ways to cut costs and pay the highly productive partners more money. I assume some will disagree, but that’s what kills horses at the Kentucky Derby.
I read but did not think twice about believing every single word in “Amid Mass Layoff, Accenture Identifies 300+ Generative AI Use Cases.” My first mental reaction was this question, “Just 300?”
The write up points out:
Accenture has identified five broad areas where generative AI can be implemented – advising, creating, automation, software creation and protection. The company is also working with a multinational bank to use generative AI to route large numbers of post-trade processing emails and draft responses with recommended actions to reduce manual effort and risk.
With fast food joints replacing humans with robots, what’s an MBA to do? The article does not identify employment opportunities for those who will be replaced with zeros and ones. As a former blue chip worker bee, I would suggest to anyone laboring in the intellectual vineyards to consider a career as an influencer.
Who will get hired and make big bucks at the Bains, the BCGs, the Boozers, and the McKinseys, et al? Here’s my short list:
- MBAs or people admitted to a fancy university with super connections. If one’s mom or dad is an ambassador or frequents parties drooled upon by Town & Country Magazine, you may be in the game.
- Individuals who can sell big buck projects, even if they worked at low rent used car lots. Their future at the blue chips is bright indeed.
- Individuals who are pals with highly regarded partners.
What about the quality of the work produced by the smart software? That is a good question. The idea is to make the client happy and sell follow-on work. The initial work product may be reviewed by a partner, or maybe not. The proof of the pudding is in the revenue, cost, and profit figures.
That influencer opportunity looks pretty good, doesn’t it? I think snow is falling. Grab a Ralph Lauren Purple Label before you fire up that video camera.
Stephen E Arnold, May 31, 2023
Finally, an Amusing Analysis of AI
May 31, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Intentionally amusing or not, I found “ChatGPT Is Basically a Gen X’er Who Stopped Reading in 12th Grade” a hoot. The write up develops its thesis this way:
Turns out our soon-to-be AI Overlord, ChatGPT, has a worldview based in the 19th-century canon, Gen X sci-fi favorites, and the social dynamics at Hogwart’s School For Lil Magicians.
The essay then cites the estimable Business Insider (noted for its subscribe-to-read-this-most-okay-article approach to raising money) and its report about a data scientist who figured out what books ChatGPT has ingested. The list is interesting because it includes texts which most of today’s online users would find quaint, racist, irrelevant, or mildly titillating. Who doesn’t need to know about sensitive vampires?
So what’s funny?
First, the write up is similar to outputs from smart software: Recycled information and generic comments.
Second, the reading material was fed into ChatGPT by more unnamed smart software experts.
I wonder if the Sundar & Prabhakar Comedy Act will integrate this type of material into their explanation about the great things which will emerge from the Google.
Stephen E Arnold, May 31, 2023
Stop Smart Software! A Petition to Save the World! Signed by 350 Humans!
May 30, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
A “real” journalist (Kevin Roose), who was told to divorce his significant other for a chat bot, published the calming, measured, non-clickbait story “AI Poses Risk of Extinction, Industry Leaders Warn.” What’s ahead for the forest fire of smart software activity? The headline explains a “risk of extinction.” What, no screenshot of a Terminator robot saying:
The strength of the human heart. The difference between us and machines. [Uplifting music]
Sadly, no.
The write up reports:
Eventually, some believe, A.I. could become powerful enough that it could create societal-scale disruptions within a few years if nothing is done to slow it down, though researchers sometimes stop short of explaining how that would happen. These fears are shared by numerous industry leaders, putting them in the unusual position of arguing that a technology they are building — and, in many cases, are furiously racing to build faster than their competitors — poses grave risks and should be regulated more tightly.
Isn’t the Gray Lady amplifying fear, uncertainty, and doubt? Didn’t IBM pay sales engineers to spread the FUD?
Enough. AI is bad. Stop those who refined the math and numerical recipes. Pass laws to regulate the AI technology. Act now. Save humanity. Several observations:
- Technologists who “develop” functions and then beg for rules are disingenuous. The idea is to practice self-control and judgment before inviting Mr. Hyde to brunch.
- With smart software chock full of “unknown unknowns”, how exactly are elected officials supposed to regulate a diffusing and enabling technology? Appealing to US and EU officials omits common sense in my opinion.
- The “fix” for the AI craziness may be emulating the Chinese approach: Do what the CCP wants or be reeducated. What a nation state can do with smart software is indeed something to consider. But China has taken action and will move forward with militarization no matter what the US and EU do.
Silicon Valley type innovation has created a “myth of excellence.” One need only look at social media to see the consequences of high school science club decision making. Now a handful of individuals with the Silicon Valley DNA want external forces to rein in their money making experiments and personal theme parks. Sorry, folks. Internal control, ethical behavior, and integrity provide that restraint for mature individuals.
A sheet of paper with “rules” and “regulations” is a bit late to the Silicon Valley game. And the Gray Lady? Chasing clicks in my opinion.
Stephen E Arnold, May 30, 2023