AI Tools That Make Cheating…Err… Research Easier
June 22, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Homework has been the bane of students since the inception of school. Students have dreamt about ways to make homework easier, either with the intervention of divine beings or a homework-finishing robot. While the gods of various religions have never concerned themselves with homework, ingenious minds have tackled the robot idea with artificial intelligence. While AI cannot succinctly write a decent essay, Euro News shares the next generation of tools that will make homework easier: “The Best AI Tools To Power Your Academic Research.”
This young lady is not cheating. She is using her mobile phone to look up facts using Bard and ChatGPT. With the information in hand, she will interact with each system to obtain the required 500 words for her US history essay about ethics and Spiro Agnew. She is not cheating. She is researching. The image emerged from the highly original MidJourney system, which never cheats its users. But what does it do with those inputs?
OpenAI’s ChatGPT tool, a generative AI that creates and writes text, has thrown academia for a loop. ChatGPT is the first AI that can “write” a cohesive essay and can answer simple questions better than a search engine. Academics are worried it will ruin the integrity of education, but others believe ChatGPT and other AI tools will democratize information.
Postdoctoral researcher Mushtaq Bilal, based at the University of Southern Denmark, believes ChatGPT is a wonderful invention. He explains that ChatGPT cannot produce a full journal article that is truthful, peer reviewed, and well cited. With incremental prompting, Bilal says the AI tool can generate ideas that resemble a conversation with an Ivy League professor. Bilal proposes using ChatGPT as a brainstorming tool. For example, he used it to create an article outline and then fact checked the information.
Bilal recommends scholars use other AI tools, such as Consensus. Consensus is an AI-driven search engine that answers questions and provides citations. Elicit.org is similar, except it is an AI research assistant and its database is based purely on research. Scite.ai provides fact-based citations based on search queries. Research Rabbit fast-tracks research similar to how Spotify recommends music: it learns researchers’ interests and recommends new information based on them. ChatPDF allows users to upload papers and then ask the AI questions or have it summarize the information.
Homework has not seen a revolution this huge since the implementation of the Internet.
“The development of AI will be as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone,” wrote Bill Gates in the latest post on his personal blog, titled “The Age of AI Has Begun.” “Computers haven’t had the effect on education that many of us in the industry have hoped,” he wrote. “But I think in the next five to 10 years, AI-driven software will finally deliver on the promise of revolutionizing the way people teach and learn.”
In other words, homework will be much easier to complete and these new tools will make learning better. Students will also cleverly discover new ways to manipulate the tools to cheat, just as they have for centuries.
Whitney Grace, June 22, 2023
Many Regulators, Many Countries Cannot Figure Out How to Regulate AI
June 21, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
American and European technology and trade leaders met in Sweden for the Trade and Tech Council (TTC) summit. They met at the beginning of June to discuss their sector’s future. One of the main talking points was how to control AI. The one thing all the leaders agreed on was that they could not agree on anything. Politico tells more about the story in: “The Struggle To Control AI.”
The main AI topic international leaders discussed was generative AI, such as Google’s Bard and ChatGPT from OpenAI, and its influence on humanity. The potential for generative AI is limitless, but there are worries that it poses threats to global security and could ruin the job market. The leaders want to prove to the world that democratic governments can advance as quickly as technology does.
A group of regulators discuss regulating AI. The regulators are enjoying a largely unregulated lunch of fast food stuffed with chemicals. Some of these have interesting consequences. One regulator says, “Pass the salt.” Another says, “What about AI and ML?” A third says, “Are those toppings?” The scene was generated by the copyright maven MidJourney.
Leaders from Europe and the United States are anxious to make laws that regulate how AI works in conjunction with society. The TTC’s goal is to develop non-binding standards about AI transparency, risk audits, and technical details. The non-binding standards would police AI so it does not destroy humanity and the planet. The plan is to present the standards at the G7 in Fall 2023.
Europe and the United States need to agree on the standards, except they do not, and that leaves room for China to promote its authoritarian version of AI. The European Union has written the majority of the digital rulebook that Western societies follow. The US has other ideas:
“The U.S., on the other hand, prefers a more hands-off approach, relying on industry to come up with its own safeguards. Ongoing political divisions within Congress make it unlikely any AI-specific legislation will be passed before next year’s U.S. election. The Biden administration has made international collaboration on AI a policy priority, especially because a majority of the leading AI companies like Google, Microsoft and OpenAI, are headquartered in the U.S. For Washington, helping these companies compete against China’s rivals is also a national security priority.”
The European Union wants to do things one way, the United States has other ideas. It is all about talking heads speaking legalese mixed with ethics, while China is pushing its own agenda.
Whitney Grace, June 21, 2023
The Famous Google Paper about Attention, a Code Word for Transformer Methods
June 20, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Wow, many people are excited about a Bloomberg article called “The AI Boom Has Silicon Valley on Another Manic Quest to Change the World: A Guide to the New AI Technologies, Evangelists, Skeptics and Everyone Else Caught Up in the Flood of Cash and Enthusiasm Reshaping the Industry.”
In the tweets and LinkedIn posts, one small factoid is omitted from the secondhand content. If you want to read the famous paper from the Google Brain team (the same Brain folks since folded into Google DeepMind and left to watch their future from the cheap seats), you can find “Attention Is All You Need”, branded with the imprimatur of the Neural Information Processing Systems Conference held in 2017. Here’s the link to the paper.
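For readers who have not opened the paper, here is a minimal sketch of the scaled dot-product attention the title refers to. This is a plain NumPy illustration for orientation only, not the paper’s full multi-head implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention as described in "Attention Is All You Need" (2017).

    Q, K: (sequence_length, d_k) arrays; V: (sequence_length, d_v) array.
    Each output row is a weighted mix of the rows of V, where the weights say
    how much one position "attends" to every other position.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # blend the values

# Toy self-attention over a four-token sequence with 8-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)    # (4, 8)
```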
For those who read the paper, I would like to suggest several questions to consider:
- What economic gain does Google derive from proliferation of its transformer system and method; for example, the open sourcing of the code?
- What does “attention” mean for [a] the cost of training and [b] the ability to steer the system and method? (Please, consider the question from the point of view of the user’s attention, the system and method’s attention, and a third-party meta-monitoring system such as advertising.)
- What other tasks of humans, software, and systems can benefit from the use of the Transformer system and methods?
I am okay with excitement for a 2017 paper, but including a link to the foundation document might be helpful to some, not many, but some.
Net net: Think about Google’s use of the words “trust” and “responsibility” when you answer the three suggested questions.
Stephen E Arnold, June 20, 2023
Google: Smart Software Confusion
June 19, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I cannot understand. Not only am I old; I am a dinobaby. Furthermore, I am like one of William James’s straw men: Easy to knock down or set on fire. Bear with me this morning.
I read “Google Skeptical of AI: Google Doesn’t Trust Its Own AI Chatbots, Asks Employees Not to Use Bard.” The write up asserts as “real” information:
It seems that Google doesn’t trust any AI chatbot, including its own Bard AI bot. In an update to its security measures, Alphabet Inc., Google’s parent company has asked its employees to keep sensitive data away from public AI chatbots, including their own Bard AI.
The go-to word for the Google in the last few weeks is “trust.” The quote points out that Google doesn’t “trust” its own smart software. Does this mean that Google does not “trust” that which it created and is making available to its “users”?
MidJourney, an interesting but possibly insecure and secret-filled smart software system, generated this image of Googzilla as a gatekeeper. Are gatekeepers in place to make money, control who does what, and record the comings and goings of people, data, and content objects?
As I said, I am a dinobaby, and I think I am dumb. I don’t follow the circular reasoning; for example:
Google is worried that human reviewers may have access to the chat logs that these chatbots generate. AI developers often use this data to train their LLMs more, which poses a risk of data leaks.
Now the ante has gone up. The issue is one of Google protecting itself from its own software. Furthermore, if the statement is accurate, I take the words to mean that Google’s Mandiant-infused, super duper, security trooper cannot protect Google from itself.
Can my interpretation be correct? I hope not.
Then I read “This Google Leader Says ML Infrastructure Is Conduit to Company’s AI Success.” The “this” refers to an entity called Nadav Eiron, a Stanford PhD and Googley wizard. The use of the word “conduit” baffles me because I thought “conduit” was a noun, not a verb. That goes to support my contention that I am a dumb humanoid.
Now let’s look at the text of this write up about Google’s smart software. I noted this passage:
The journey from a great idea to a great product is very, very long and complicated. It’s especially complicated and expensive when it’s not one product but like 25, or however many were announced that Google I/O. And with the complexity that comes with doing all that in a way that’s scalable, responsible, sustainable and maintainable.
I recall someone telling me when I worked at a Fancy Dan blue chip consulting firm, “Stephen, two objectives are zero objectives.” Obviously Google is orders of magnitude more capable than the bozos at the consulting company. Google can do 25 objectives. Impressive.
I noted this statement:
we created the OpenXLA [an open-source ML compiler ecosystem co-developed by AI/ML industry leaders to compile and optimize models from all leading ML frameworks] because the interface into the compiler in the middle is something that would benefit everybody if it’s commoditized and standardized.
I think this means that Google wants to be the gatekeeper or man in the middle.
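To make the “compiler in the middle” idea concrete, here is a small sketch using JAX, which traces an ordinary Python function and hands it to XLA, the compiler stack that OpenXLA standardizes. This is an illustration of the concept under my own assumptions, not Google’s internal tooling:

```python
import jax
import jax.numpy as jnp

# A tiny model-like function; any framework could express something similar.
def predict(params, x):
    w, b = params
    return jnp.tanh(x @ w + b)

# jax.jit hands the traced computation to XLA, the "compiler in the middle."
# The same compiled program can then target CPUs, GPUs, or TPUs.
compiled_predict = jax.jit(predict)

params = (jnp.ones((3, 2)), jnp.zeros(2))
x = jnp.arange(6.0).reshape(2, 3)
print(compiled_predict(params, x))   # runs the XLA-compiled version
```

Whoever owns that middle layer sits between every framework above it and every chip below it, which is the gatekeeper position described above.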
Now let’s consider the first article cited. Google does not want its employees to use smart software because it cannot be trusted.
Is it logical to conclude that Google and its partners should use software which is not trusted? Should Google and its partners not use smart software because it is not secure? Given these constraints, how does Google make advances in smart software?
My perception is:
- Google is not sure what to do
- Google wants to position its untrusted and insecure software as the industry standard
- Google wants to preserve its position in a workflow to maximize its profit and influence in markets.
You may not agree. But when articles present messages which are alarming and clearly focused on market control, I turn my skeptic control knob. By the way, the headline should be “Google’s Nadav Eiron Says Machine Learning Infrastructure Is a Conduit to Facilitate Google’s Control of Smart Software.”
Stephen E Arnold, June 19, 2023
The Value of AI and the De-Valuing of Humanoids: Long Lines for Food Stamps Ahead?
June 16, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
AI, AI, AI-Yai-Ai. That could be a country western lyric. Maybe it is? I am not a fan of Grand Old Opry-type entertainment. I do enjoy what I call “Dark AI humor.” If the flow of amusing crAIziness continues, could it become a staple of comedy shows on Tubi or Pluto?
How many people live (theoretically) in the United States? The answer, according to an unimpeachable source, is 336,713,783. I love the precision of smart search software.
Consider the factoid in “300 Million Jobs Will Be Replaced, Diminished by Artificial Intelligence, Report Warns.” If we assume the population of the US is 337 million (sorry You.com), this works out to a trivial 37 million people who will have been promoted by smart software to the “Get Paycheck” social class. I may be overstating the “Paycheck Class,” but this is AI land, so numbers are fuzzified because you know… probability.
The write up points out:
Using data on occupational tasks in both the US and Europe, we find that roughly two-thirds of current jobs are exposed to some degree of AI automation, and that generative AI could substitute up to one-fourth of current work.
Disruption rocks on.
Now consider the information in “People Who Work with AI Are More Likely to Be Lonely, Suffer from Insomnia and Booze after Work, Study Finds.” The write up asserts:
Analysis revealed employees who interacted more frequently with AI systems were more likely to experience loneliness, insomnia and increased after-work alcohol consumption. But they also found these employees were more likely to offer to help their coworkers – a response that may be triggered by the need for social contact, the team said. Other experiments in the US, Indonesia and Malaysia, involving property management companies and a tech company, yielded similar results.
Let’s assume both articles contain actual factual information. Imagine managing a group of individuals in the top tier. Now think about those who are in the lower tier. Any fancy management ideas? I have none.
Exciting for sure.
Stephen E Arnold, June 16, 2023
AI and Non-State Actors
June 16, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“AI Weapons Need a Safe Back Door for Human Control” contains a couple of interesting statements.
The first is a quote from Hugh Durrant-Whyte, director of the Centre for Translational Data Science at the University of Sydney. He allegedly said:
China is investing arguably twice as much as everyone else put together. We need to recognize that it genuinely has gone to town. If you look at the payments, if you look at the number of publications, if you look at the companies that are involved, it is quite significant. And yet, it’s important to point out that the US is still dominant in this area.
For me, the important point is the investment gap. Perhaps the US should be more aggressive in identifying and funding promising smart software companies?
The second statement which caught my attention was:
James Black, assistant director of defense and security research group RAND Europe, warned that non-state actors could lead in the proliferation of AI-enhanced weapons systems. “A lot of stuff is very much going to be difficult to control from a non-proliferation perspective, due to its inherent software-based nature. A lot of our export controls and non-proliferation regimes that exist are very much focused on old-school traditional hardware…
Several observations:
- Smart software ups the ante in modern warfare, intelligence, and law enforcement activities
- The smart software technology has been released into the wild. As a result, bad actors have access to advanced tools
- The investment gap is important but the need for skilled smart software engineers, mathematicians, and support personnel is critical in the US. University research departments are, in my opinion, less and less productive. The concentration of research in the hands of a few large publicly traded companies suggests that military, intelligence, and law enforcement priorities will be ignored.
Net net: China, personnel, and institution biases require attention by senior officials. These issues are not fooling around with Twitter scale. More is at stake. Urgent action is needed, which may be uncomfortable for fans of TikTok and expensive dinners in Washington, DC.
Stephen E Arnold, June 16, 2023
Is Smart Software Above Navel Gazing: Nope, and It Does Not Care
June 15, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Synthetic data. Statistical smoothing. Recursive methods. When we presented our lecture “OSINT Blindspots” at the 2023 National Cyber Crime Conference, the audience perked up. The terms might have been familiar, but our framing caught the more than 100 investigators’ attention. The problem my son (Erik) and I described was butt simple: Faked data will derail a prosecution if an expert witness explains that machine-generated output may be wrong.
We provided some examples. One was a respected executive who obfuscates his “real” business behind a red-herring business. We profiled how information about a fervid Christian’s adherence to God’s precepts overshadowed a Ponzi scheme. We explained how an American living in Eastern Europe openly flouts social norms in order to distract authorities from an encrypted email business set up to allow easy, seamless communication for interesting people. And we included more examples.
An executive at a big time artificial intelligence firm looks over his domain and asks himself, “How long will it take for the boobs and boobettes to figure out that our smart software is wonky?” The illustration was spit out by the clever bits and bytes at MidJourney.
What’s the point in this blog post? Who cares besides analysts, lawyers, and investigators who have to winnow facts which are verifiable from shadow or ghost information activities?
It turns out that a handful of academics seem to have an interest in information manipulation. Their angle of vision is broader than my team’s. We focus on enforcement; the academics focus on tenure or getting grants. That’s okay. Different points of view lead to interesting conclusions.
Consider this academic and probably tough-to-figure-out illustration from “The Curse of Recursion: Training on Generated Data Makes Models Forget”:
A less turgid summary of the researchers’ findings appears at this location.
The main idea is that gee-whiz methods like Snorkel and small language models have an interesting “feature.” They forget; that is, as these models ingest fake data, they drift, get lost, or go off the rails. A synthetic shirt, unlike one made of natural cotton, looks like the real thing. But on a hot day, those super duper modern fabrics can cause a person to perspire and probably emit unusual odors.
The authors introduce and explain “model collapse.” I am no academic. My interpretation of the glorious academic prose is that the numerical recipes, systems, and methods don’t work like the nifty demonstrations. In fact, over time, the models degrade. The hapless humanoids who depend on these models lack the means to figure out what’s on point and what’s incorrect. The danger, obviously, is that clueless and lazy users of smart software make more mistakes in judgment than they otherwise would.
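To make the “collapse” idea concrete, here is a toy sketch of my own, not the paper’s actual experiment: a simple Gaussian “model” is re-fit, generation after generation, only on samples drawn from the previous generation.

```python
import numpy as np

# Toy "model collapse": each generation of a Gaussian model is fit only to
# samples produced by the previous generation, never to the original data.
rng = np.random.default_rng(42)
real_data = rng.normal(loc=0.0, scale=1.0, size=50)       # human-made data
mu, sigma = real_data.mean(), real_data.std()
print(f"generation  0: mean={mu:+.3f}, std={sigma:.3f}")

for generation in range(1, 31):
    synthetic = rng.normal(loc=mu, scale=sigma, size=50)   # model output only
    mu, sigma = synthetic.mean(), synthetic.std()          # re-fit on fakes
    print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Run it a few times and the estimates wander away from the original data; the paper reports the analogous degradation for much larger generative models.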
The paper includes fancy mathematics and more charts which do not exactly deliver on the promise that a picture is worth a thousand words. Let me highlight one statement from the journal article:
Our evaluation suggests a “first mover advantage” when it comes to training models such as LLMs. In our work we demonstrate that training on samples from another generative model can induce a distribution shift, which over time causes Model Collapse. This in turn causes the model to mis-perceive the underlying learning task. To make sure that learning is sustained over a long time period, one needs to make sure that access to the original data source is preserved and that additional data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions around the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. One option is community-wide coordination to ensure that different parties involved in LLM creation and deployment share the information needed to resolve questions of provenance. Otherwise, it may become increasingly difficult to train newer versions of LLMs without access to data that was crawled from the Internet prior to the mass adoption of the technology, or direct access to data generated by humans at scale.
Bang on.
What the academics do not point out are some “real world” business issues:
- Solving this problem costs money; the point of synthetic and machine-generated data is to reduce costs. Cost reduction wins.
- Furthermore, fixing up models takes time. In order to keep indexes fresh, delays are not part of the game plan for companies eager to dominate a market which Accenture pegs as worth trillions of dollars. (See this wild and crazy number.)
- Fiddling around to improve existing models is secondary to capturing the hearts and minds of those eager to worship a few big outfits’ approach to smart software. No one wants to see the problem because that takes mental effort. Those inside one of firms vying to own information framing don’t want to be the nail that sticks up. Not only do the nails get pounded down, they are forced to leave the platform. I call this the Dr. Timnit Gebru effect.
Net net: Good paper. Nothing substantive will change in the short or near term.
Stephen E Arnold, June 15, 2023
Can You Create Better with AI? Sure, Even If You Are Picasso or a TikTok Star
June 15, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Do we need to worry about how generative AI will change the world? Yes, but no more than we had to fear automation, the printing press, horseless carriages, and the Internet. The current technology revolution is analogous to the Industrial Revolutions and technology advancements of past centuries. University of Chicago history professor Ada Palmer is aware of humanity’s cyclical relationship with technology and she discusses it in her Microsoft Unlocked piece: “We Are An Information Revolution Species.”
Palmer explains that the human species has been living in an information revolution for twenty generations. She provides historical examples of how people bemoan the changes. The changes arguably remove the “art” from tasks. These tasks, however, are simplified, which allows humans to create more. That also frees up humanity’s time to conquer harder problems. Changes in technology spur a democratization of information. They also mean that jobs change, so humans need to adapt their skills for continual survival.
Palmer says that AI is just another tool as humanity progresses. She asserts that the bigger problems are outdated systems that no longer serve the current society. While technology has evolved so has humanity:
“This revolution will be faster, but we have something the Gutenberg generations lacked: we understand social safety nets. We know we need them, how to make them. We have centuries of examples of how to handle information revolutions well or badly. We know the cup is already leaking, the actor and the artist already struggling as the megacorp grows rich. Policy is everything. We know we can do this well or badly. The only sure road to real life dystopia is if we convince ourselves dystopia is unavoidable, and fail to try for something better.”
AI does need a social safety net so it does not transform into a sentient computer hell-bent on world domination. Palmer should point out that humans learn from their imaginations too. Star Trek or 2001: A Space Odyssey anyone?
A digital Sistine Chapel from a savant in Cairo, Illinois. Oh, right, Cairo, Illinois, is gone. But nevertheless…
Whitney Grace, June 15, 2023
Smart Software: The Dream of Big Money Raining for Decades
June 14, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
The illustration — from the crafty zeros and ones at MidJourney — depicts a young computer scientist reveling in the cash generated from his AI-infused innovation.
For a budding wizard, the idea of cash falling around the humanoid is invigorating. It is called a “coder’s high” or Silicon Valley fever. There is no known cure, even when FTX-type implosions doom a fellow traveler to months of litigation and some hard time among individuals typically not in an advanced math program.
Where does the cyclone of cash originate?
I would submit that articles like “Generative AI Revenue Is Set to Reach US$1.3 Trillion in 2032” are like catnip to a typical feline living amidst the cubes at a Google-type company or in the apartment of a significant other adjacent to a blue-chip university in the US.
Here’s the chart that makes it easy to see the slope of the growth:
I want to point out that this confection is the result of the mid tier outfit IDC and the fascinating Bloomberg terminal. Therefore, I assume that it is rock solid, based on in-depth primary research, and deep analysis by third-party consultants. I do, however, reserve the right to think that the chart could have been produced by an intern eager to hit the gym and grabbing a sushi special before the good stuff was gone.
Will generative AI hit the $1.3 trillion target in nine years? In the hospital for recovering victims of spreadsheet fever, the coder’s high might slow recovery. But many believe it will, and they fervently hope to experience the realities of William James’s mystics in his Varieties of Religious Experience.
My goodness, the vision of money from Generative AI is infectious. So regulate mysticism? Erect guard rails to prevent those with a coder’s high from driving off the Information Superhighway?
Get real.
Stephen E Arnold, June 14, 2023
Can One Be Accurate, Responsible, and Trusted If One Plagiarizes?
June 14, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Now that AI is such a hot topic, tech companies cannot afford to hold back due to small flaws. Like a tendency to spit out incorrect information, for example. One behemoth seems to have found a quick fix for that particular wrinkle: simple plagiarism. Eager to incorporate AI into its flagship Search platform, Google recently released a beta version to select users. Forbes contributor Matt Novak was among the lucky few and shares his observations in, “Google’s New AI-Powered Search Is a Beautiful Plagiarism Machine.”
The author takes us through his query and results on storing live oysters in the fridge, complete with screenshots of the Googlebot’s response. (Short answer: you can for a few days if you cover them with a damp towel.) He highlights passages that were lifted from websites, some with and some without tiny tweaks. To be fair, Google does link to its source pages alongside the pilfered passages. But why click through when you’ve already gotten what you came for? Novak writes:
“There are positive and negative things about this new Google Search experience. If you followed Google’s advice, you’d probably be just fine storing your oysters in the fridge, which is to say you won’t get sick. But, again, the reason Google’s advice is accurate brings us immediately to the negative: It’s just copying from websites and giving people no incentive to actually visit those websites.
Why does any of this matter? Because Google Search is easily the biggest driver of traffic for the vast majority of online publishers, whether it’s major newspapers or small independent blogs. And this change to Google’s most important product has the potential to devastate their already dwindling coffers. … Online publishers rely on people clicking on their stories. It’s how they generate revenue, whether that’s in the sale of subscriptions or the sale of those eyeballs to advertisers. But it’s not clear that this new form of Google Search will drive the same kind of traffic that it did over the past two decades.”
Ironically, Google’s AI may shoot itself in the foot by reducing traffic to informative websites: it needs their content to answer queries. Quite the conundrum it has made for itself.
Cynthia Murrell, June 14, 2023