A Discernment Challenge for Those Who Are Dull Normal
June 24, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
Techradar, an online information service, published “Ahead of GPT-5 Launch, Another Test Shows That People Cannot Distinguish ChatGPT from a Human in a Conversation Test — Is It a Watershed Moment for AI?” The headline implies “change everything” rhetoric, but that is routine AI jargon-hype.
Once again, academics who are unable to land a job in a “real” smart software company studied the work of their former colleagues who make a lot more money than those teaching do. Well, what do academic researchers do when they are not sitting in the student union or the snack area in the lab whilst waiting for a graduate student to finish a task? In my experience, some think about their CVs or résumés. Others ponder the flaws in a commercial or allegedly commercial product or service.
A young shopper explains that the outputs of egg laying chickens share a similarity. Insightful observation from a dumb carp. Thanks, MSFT Copilot. How’s that Recall project coming along?
The write up reports:
The Department of Cognitive Science at UC San Diego decided to see how modern AI systems fared and evaluated ELIZA (a simple rules-based chatbot from the 1960’s included as a baseline in the experiment), GPT-3.5, and GPT-4 in a controlled Turing Test. Participants had a five-minute conversation with either a human or an AI and then had to decide whether their conversation partner was human.
Here’s the research set up:
In the study, 500 participants were assigned to one of five groups. They engaged in a conversation with either a human or one of the three AI systems. The game interface resembled a typical messaging app. After five minutes, participants judged whether they believed their conversation partner was human or AI and provided reasons for their decisions.
And what did the intrepid academics find? Factoids that will get them a job at a Perplexity-type of company? Information that will put smart software into focus for the elected officials writing draft rules and laws to prevent AI from making The Terminator come true?
The results were interesting. GPT-4 was identified as human 54% of the time, ahead of GPT-3.5 (50%), with both significantly outperforming ELIZA (22%) but lagging behind actual humans (67%). Participants were no better than chance at identifying GPT-4 as AI, indicating that current AI systems can deceive people into believing they are human.
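The claim that judges were "no better than chance" on GPT-4 is easy to sanity-check. Below is a minimal sketch, assuming (hypothetically) 100 judges per condition (500 participants split across five groups) and an exact two-sided binomial test against the 50% chance rate; the study's actual group sizes and statistics may differ.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than the observed count k."""
    pk = comb(n, k) * p**k * (1 - p) ** (n - k)
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n + 1)
        if comb(n, i) * p**i * (1 - p) ** (n - i) <= pk + 1e-12
    )

# Hypothetical per-group counts scaled to 100 judges per condition.
for label, human_votes in [("GPT-4", 54), ("GPT-3.5", 50), ("ELIZA", 22), ("Human", 67)]:
    p_val = binom_two_sided_p(human_votes, 100)
    print(f"{label}: judged human {human_votes}/100, p vs. chance = {p_val:.4f}")
```

Under these assumed counts, GPT-4's 54% is statistically indistinguishable from a coin flip, while ELIZA's 22% and the humans' 67% are far from chance, which is consistent with the study's framing.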
What does this mean for those labeled dull normal, a nifty term applied to some lucky people taking IQ tests? I wanted to be a dull normal, but I was able to score in the lowest possible quartile. I think it was called dumb carp. Yes!
Several observations to disrupt your clear thinking about smart software and research into how the hot dogs are made:
- The smart software seems to have stalled. In our tests of You.com, which allows one to select which model parrots information, it is tough to differentiate the outputs. Cut from the same transformer cloth maybe?
- Those judging, differentiating, and testing smart software outputs can discern differences only if they are way above dull normal (or my classification, dumb carp). This means that indexing systems, people, and “new” models will be bamboozled into thinking what’s incorrect is a-okay. So much for the informed citizen.
- Will the next innovation in smart software revolutionize something? Yep, some lucky investors.
Net net: Confusion ahead for those like me: Dumb carp. Dull normals may be flummoxed. But those super-brainy folks have a chance to rule the world. Bust out the party hats and little horns.
Stephen E Arnold, June 24, 2024
Ad Hominem Attack: A Revived Rhetorical Form
June 24, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
I remember my high school debate coach telling my partner Nick G. (I have forgotten the budding prosecutor’s name, sorry) you should not attack the character of our opponents. Nick G. had interacted with Bill W. on the basketball court in an end-of-year regional game. Nick G., as I recall, got a bloody nose, and Bill W. was thrown out of the basketball game. When fisticuffs ensued, I thanked my lucky stars I was a hopeless athlete. Give me the library, a debate topic, a pile of notecards, and I was good to go. Nick G. included in his rebuttal statement comments about the character of Bill W. When the judge rendered a result and his comments, Nick G. was singled out as being wildly inappropriate. After the humiliating defeat, the coach explained that an ad hominem argument is not appropriate for 15-year-olds. Nick G.’s attitude was, “I told the truth.” As Nick G. learned, the truth is not what wins debate tournaments or life in some cases.
I thought about ad hominem arguments as I read “Silicon Valley’s False Prophet.” This essay reminded me of the essay by the same author titled “The Man Who Killed Google Search.” I must admit the rhetorical trope is repeatable. Furthermore, it can be applied to an individual who may be clueless about how selling advertising nuked relevance (or what was left of it) at the Google and to the deal making of a person whom I call Sam AI-Man. Who knows? Maybe other authors will emulate these two essays, and a new Silicon Valley genre may emerge ready for the real wordsmiths and pooh-bahs of Silicon Valley to crank out a hit piece every couple of days.
To the essay at hand: The false prophet is the former partner of Elon Musk and the on-again-off-again-on-again Big Dog at OpenAI. That’s an outfit where “open” means closed, and closed means open to the likes of Apple. The main idea, I think, is that AI sucks and Sam AI-Man continues to beat the drum for a technology that is likely to be headed for a correction. In Silicon Valley speak, the bubble will burst. It is, I surmise, Mr. AI-Man’s fault.
The essay explains:
Sam Altman, however, exists in a category of his own. There are many, many, many examples of him saying that OpenAI — or AI more broadly — will do something it can’t and likely won’t, and it being meekly accepted by the Fourth Estate without any real pushback. There are more still of him framing the limits of the present reality as a positive — like when, in a fireside sitdown with ~~1980s used car salesman~~ Salesforce CEO Marc Benioff, Altman proclaimed that AI hallucinations (when an LLM asserts something untrue as fact, because AI doesn’t know anything) are a feature, not a bug, and rather than being treated as some kind of fundamental limitation, should be regarded as a form of creative expression.
I understand. Salesperson. Quite a unicorn in Silicon Valley. I mean when I worked there I would encounter hyperbole artists every few minutes. Yeah, Silicon Valley. Anchored in reality, minimum viable products, and lots of hanky-panky.
The essay provides a bit of information about the background of Mr. AI-Man:
When you strip away his ability to convince people that he’s smart, Altman had actually done very little — he was a college dropout with a failing-then-failed startup, one where employees tried to get him fired twice.
If true, that takes some doing. Employees tried to get the false prophet fired twice. In olden times, burning at the stake might have been an option. Now it is just move on to another venture. Progress.
The essay does provide some insight into Sam AI-Man’s core competency:
Altman is adept at using connections to make new connections, in finding ways to make others owe him favors, in saying the right thing at the right time when he knew that nobody would think about it too hard. Altman was early on Stripe, and Reddit, and Airbnb — all seemingly-brilliant moments in the life of a man who had many things handed to him, who knew how to look and sound to get put in the room and to get the capital to make his next move. It’s easy to conflate investment returns with intellectual capital, even though the truth is that people liked Altman enough to give him the opportunity to be rich, and he took it.
I cannot figure out if the author envies Sam AI-Man, reviles him for being clever (a key attribute in some high-technology outfits), or genuinely perceives Mr. AI-Man as the first cousin to Beelzebub. Whatever the motivation, I find the phoenix-like rising of the ad hominem attack a refreshing change from the entitled pooh-bahism of some folks writing about technology.
The only problem: I think it is unlikely that the author will be hired by OpenAI. Chance blown.
Stephen E Arnold, June 24, 2024
Chasing a Folly: Identifying AI Content
June 24, 2024
As are other academic publishers, Springer Nature Group is plagued by fake papers. Now the company announces, “Springer Nature Unveils Two New AI Tools to Protect Research Integrity.” How effective the tools are remains to be proven, but at least the company is making an effort. The press release describes text-checker Geppetto and image-analysis tool SnappShot. We learn:
“Geppetto works by dividing the paper up into sections and uses its own algorithms to check the consistency of the text in each section. The sections are then given a score based on the probability that the text in them has been AI generated. The higher the score, the greater the probability of there being problems, initiating a human check by Springer Nature staff. Geppetto is already responsible for identifying hundreds of fake papers soon after submission, preventing them from being published – and from taking up editors’ and peer reviewers’ valuable time.
SnappShot, also developed in-house, is an AI-assisted image integrity analysis tool. Currently used to analyze PDF files containing gel and blot images and look for duplications in those image types – another known integrity problem within the industry – this will be expanded to cover additional image types and integrity problems and speed up checks on papers.”
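The press release describes a simple two-stage workflow: split the paper into sections, score each section for AI-generation probability, and escalate high scores to a human check. The sketch below is a hypothetical illustration of that pipeline, not Springer Nature's code; the scorer, names, and threshold are stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SectionReport:
    name: str
    score: float        # probability-like score that the text is AI generated
    needs_review: bool  # True -> route to a human checker

def screen_paper(
    sections: Dict[str, str],
    scorer: Callable[[str], float],
    threshold: float = 0.7,
) -> List[SectionReport]:
    """Score each section independently; scores at or above the threshold
    trigger a human check, mirroring the machine-flag-then-human-review
    workflow the press release describes."""
    reports = []
    for name, text in sections.items():
        score = scorer(text)
        reports.append(SectionReport(name, score, score >= threshold))
    return reports

# Stand-in scorer: a real system would use a trained classifier.
def toy_scorer(text: str) -> float:
    telltales = ("as an ai language model", "regenerate response")
    return 0.9 if any(t in text.lower() for t in telltales) else 0.1

reports = screen_paper(
    {"methods": "We measured gel band intensity across replicates...",
     "discussion": "As an AI language model, I cannot verify these claims..."},
    toy_scorer,
)
```

The design choice worth noting is that the machine only ranks and routes; the publish/reject decision stays with human staff, which is what the release says Geppetto does.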
Springer Nature’s Chris Graf emphasizes the importance of research integrity and vows to continue developing and improving in-house tools. To that end, we learn, the company is still growing its fraud-detection team. The post points out Springer Nature is a contributing member of the STM Integrity Hub.
Based in Berlin, Springer Nature was formed in 2015 through the combination of Nature Publishing Group, Macmillan Education, and Springer Science+Business Media. A few of its noteworthy publications include Scientific American, Nature, and this collection of Biology, Clinical Medicine, and Health journals.
Cynthia Murrell, June 24, 2024
Thomson Reuters: A Trust Report about Trust from an Outfit with Trust Principles
June 21, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
Thomson Reuters is into trust. The company has a Web page called “Trust Principles.” Here’s a snippet:
The Trust Principles were created in 1941, in the midst of World War II, in agreement with The Newspaper Proprietors Association Limited and The Press Association Limited (being the Reuters shareholders at that time). The Trust Principles imposed obligations on Reuters and its employees to act at all times with integrity, independence, and freedom from bias. Reuters Directors and shareholders were determined to protect and preserve the Trust Principles when Reuters became a publicly traded company on the London Stock Exchange and Nasdaq. A unique structure was put in place to achieve this. A new company was formed and given the name ‘Reuters Founders Share Company Limited’, its purpose being to hold a ‘Founders Share’ in Reuters.
Trust nestles in some legalese and a bit of business history. The only reason I mention this anchoring in trust is that Thomson Reuters reported quarterly revenue of $1.88 billion in May 2024, up from $1.74 billion in May 2023. The financial crowd had expected $1.85 billion in the quarter, and Thomson Reuters beat that. Surplus funds make it possible to fund many important tasks; for example, a study of trust.
The ouroboros, according to some big thinkers, symbolizes the entity’s journey and the unity of all things; for example, defining trust, studying trust, and writing about trust as embodied in the symbol.
My conclusion is that trust as a marketing and business principle seems to be good for business. Therefore, I trust, and I am confident in, the information in “Global Audiences Suspicious of AI-Powered Newsrooms, Report Finds.” The subject of the trusted news story is the Reuters Institute for the Study of Journalism. The Thomson Reuters reporter presents in a trusted way this statement:
According to the survey, 52% of U.S. respondents and 63% of UK respondents said they would be uncomfortable with news produced mostly with AI. The report surveyed 2,000 people in each country, noting that respondents were more comfortable with behind-the-scenes uses of AI to make journalists’ work more efficient.
To make the point a person working for the trusted outfit’s trusted report says in what strikes me as a trustworthy way:
“It was surprising to see the level of suspicion,” said Nic Newman, senior research associate at the Reuters Institute and lead author of the Digital News Report. “People broadly had fears about what might happen to content reliability and trust.”
In case you have lost the thread, let me summarize. The trusted outfit Thomson Reuters funded a study about trust. The research was conducted by the trusted outfit’s own Reuters Institute for the Study of Journalism. The conclusion of the report, as presented by the trusted outfit, is that people want news they can trust. I think I have covered the post card with enough trust stickers.
I know I can trust the information. Here’s a factoid from the “real” news report:
Vitus “V” Spehar, a TikTok creator with 3.1 million followers, was one news personality cited by some of the survey respondents. Spehar has become known for their unique style of delivering the top headlines of the day while laying on the floor under their desk, which they previously told Reuters is intended to offer a more gentle perspective on current events and contrast with a traditional news anchor who sits at a desk.
How can one not trust a report that includes a need met by a TikTok creator? Would a Thomson Reuters’ professional write a news story from under his or her desk or cube or home office kitchen table?
I think self-funded research finds that the funding entity’s approach to trust is exactly what those in search of “real” news need. Wikipedia includes some interesting information about Thomson Reuters in its discussion of the company in the section titled “Involvement in Surveillance.” Wikipedia alleges that Thomson Reuters licenses data to Palantir Technologies, an assertion which if accurate I find orthogonal to my interpretation of the word “trust.” But Wikipedia is not Thomson Reuters.
I will not ask questions about the methodology of the study. I trust the Thomson Reuters’ professionals. I will not ask questions about the link between revenue and digital information. I have the trust principles to assuage any doubt. I will not comment on the wonderful ouroboros-like quality of an enterprise embodying trust, funding a study of trust, and converting those data into a news story about itself. The symmetry is delicious and, of course, trustworthy. For information about Thomson Reuters’s trusted use of artificial intelligence see this Web page.
Stephen E Arnold, June 21, 2024
The Key to Success at McKinsey & Company: The 2024 Truth Is Out!
June 21, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
When I was working at a “real” company, I wanted to labor in the vineyards of a big-time, blue-chip consulting firm. I achieved that goal and, after a suitable period of time in the penal colony, I escaped to a client. I made it out, unscathed, and entered a more interesting, less nutso working life. When the “truth” about big-time, blue-chip consulting firms appears in public sources, I scan the information. Most of it is baloney; for example, the yip yap about McKinsey and its advice pertaining to addictive synthetics. Hey, stuff happens when one is objective. “McKinsey Exec Tells Summer Interns That Learning to Ask AI the Right Questions Is the Key to Success” contains some information which I find quite surprising. First, I don’t know if the factoids in the write up are accurate or if they are the off-the-cuff baloney recruiters regularly present to potential 60-hour-a-week knowledge worker serfs or if the person has a streaming video connection to the McKinsey managing partner’s work-from-the-resort office.
Let’s assume the information is correct and consider some of its implications. An intern is a no-pay or low-pay job for students from the right institutions, the right background, or the right connections. The idea is that associates (one step above the no-pay serf) and partners (the set for life if you don’t die of heart failure crowd) can observe, mentor, and judge these field laborers. The write up states:
Standing out in a summer internship these days boils down to one thing — learning to talk to AI. At least, that’s the advice McKinsey’s chief client officer, Liz Hilton Segel, gave one eager intern at the firm. “My advice to her was to be an outstanding prompt engineer,” Hilton Segel told The Wall Street Journal.
But what about grades? What about my family’s connections to industry, elected officials, and a supreme court judge? What about my background scented with old money, sheepskin from prestigious universities, and a Nobel Prize awarded a relative 50 years ago? These questions, it seems, may no longer be relevant. AI is coming to the blue-chip consulting game, and the old-school markers of building big revenues may no longer matter.
AI matters. After an 11-month effort, McKinsey has produced Lilli. The smart system, despite fits and starts, has delivered results; that is, a payoff, cash money, engagement opportunities. The write up says:
Lilli’s purpose is to aggregate the firm’s knowledge and capabilities so that employees can spend more time engaging with clients, Erik Roth, a senior partner at McKinsey who oversaw Lilli’s development, said last year in a press release announcing the tool.
And the proof? I learned:
“We’ve [McKinsey humanoids] answered over 3 million prompts and add about 120,000 prompts per week,” he [Erik Roth] said. “We are saving on average up to 30% of a consultants’ time that they can reallocate to spend more time with their clients instead of spending more time analyzing things.”
Thus, the future of success is to learn to use Lilli. I am surprised that McKinsey does not sell internships, possibly using a Ticketmaster-type system.
Several observations:
- As Lilli gets better or is replaced by a more cost efficient system, interns and newly hired professionals will be replaced by smart software.
- McKinsey and other blue-chip outfits will embrace smart software because it can sell what the firm learns to its clients. AI becomes a Petri dish for finding marketable information.
- The hallucinative functions of smart software just create an opportunity for McKinsey and other blue-chip firms to sell their surviving professionals at a more inflated fee. Why fail and lose money? Just pay the consulting firm, sidestep the stupidity tax, and crush those competitors to whom the consulting firms sell the cookie cutter knowledge.
Net net: Blue-chip firms survived the threat from gig consultants and the Gerson Lehrman-type challenge. Now McKinsey is positioning itself to create a no-expectation environment for new hires, cut costs, and increase billing rates for the consultants at the top of the pyramid. Forget opioids. Go AI.
Stephen E Arnold, June 21, 2024
Meta Case Against Intelware Vendor Voyager Labs to Go Forward
June 21, 2024
Another clever intelware play gets trapped and now moves to litigation. Meta asserts that when Voyager Labs scraped data on over 600,000 Facebook users, it violated its contract. Furthermore, it charges, the scraping violated anti-hacking laws. While Voyager insists the case should be summarily dismissed, U.S. District Court Judge Araceli Martinez-Olguin disagrees. MediaDailyNews reports, “Meta Can Proceed With Claims that Voyager Labs Scraped Users’ Data.” Writer Wendy Davis explains:
“Voyager argued the complaint should be dismissed at an early stage for several reasons. Among others, Voyager said the allegations regarding Facebook’s terms of service were too vague. Meta’s complaint ‘refers to a catchall category of contracts … but then says nothing more about those alleged contracts, their terms, when they are supposed to have been executed, or why they allegedly bind Voyager UK today,’ Voyager argued to Martinez-Olguin in a motion filed in February. The company also said California courts lacked jurisdiction to decide whether the company violated federal or state anti-hacking laws. Martinez-Olguin rejected all of Voyager’s arguments on Thursday. She wrote that while Meta’s complaint could have set out the company’s terms of service ‘with more clarity,’ the allegations sufficiently informed Voyager of the basis for Meta’s claim.”
This battle began in January 2023 when Meta first filed the complaint. Now it can move forward. How long before the languid wheels of justice turn out a final ruling? A long time, we wager.
Cynthia Murrell, June 21, 2024
What Is That Wapo Wapo Wapo Sound?
June 20, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
Do you hear that thumping wapo wapo wapo sound? I do. It reminds me of an old school pickup truck with a flat tire on a hot summer’s day. Yep, wapo wapo wapo. That’s it!
“Jeff Bezos Has Worst Response Ever to Washington Post Turmoil” emitted this sound when I read the essay in New Republic. The newspaper for Washington, DC and its environs is the Post. When I lived in Washington, DC, the newspaper was a must read. Before I trundled off to the cheerful workplace of Halliburton Nuclear and later to the incredibly sensitive and human blue chip consulting firm known affectionately as the Boozer, I would read the WaPo. I had to be prepared. If I were working with a Congress person like Admiral Craig Hosmer, USN Retired, I had to know what Miss Manners had to say that day. A faux pas could be fatal.
The old pickup truck has a problem because one of the tires went wapo wapo wapo and then the truck stopped. Thanks, MSFT Copilot. Good enough.
The WaPo is now a Jeff Bezos property. I have forgotten how the financial deal was structured, but he has a home in DC and every person who is in contention as one of the richest men on earth needs a newspaper. The write up explains:
In a memo to the paper’s top personnel on Tuesday, the billionaire technocrat backed the new CEO Will Lewis, a former lieutenant to right-wing media mogul Rupert Murdoch, whose controversial appointment at the Post has made waves across the industry in the wake of reporting on his shady journalistic practices.
That’s inspiring for a newspaper: A political angle and “shady journalistic practices.” What happened to that old “every day is Day One” and “the customer is important”? I suppose a PR person could trot those out. But the big story seems to be the newspaper is losing readers and money. Don’t people in DC read? Oh, silly question. No, now the up-and-coming movers and shakers doom scroll and watch YouTube. The cited article includes a snippet from the Bezos bulldozer, it appears. That item states:
…the journalistic standards and ethics at The Post will not change… You have my full commitment to maintaining the quality, ethics, and standards we all believe in.
Two ethics in one short item. Will those add up this way: ethics plus ethics equals trust? Sure. I believe everything one of the richest people in the world says. It seems that one of the new hires brought in to drive the newspaper world’s version of Jack Benny’s wheezing Maxwell was involved in some hanky-panky involving private telephone conversations.
Several observations:
- “Real” newspapers seem to be facing some challenges. These range from money to money to money. Did I mention money?
- The newspaper owner and the management team have to overcome the money hurdle. How does one do that? Maybe smart software from an outfit like AWS and the Sagemaker product line? The AI can output good enough content at a lower cost and without grousing humans, vacations, health care, and annoying reporters poking into the lifestyle of the rich, powerful, famous, and rich. Did I mention “rich” twice? But if Mr. Bezos can work two ethics into one short memo, I can fit two into a longer blog post.
- The readers and journalists are likely to lose. I think readers will just suck down content from their mobile devices and the journalists will have to find their futures elsewhere like certain lawyers, many customer service personnel, and gig workers who do “art” for publishers, among others.
Net net: Do you hear the wapo wapo wapo? How long will the Bezos pickup truck roll along?
Stephen E Arnold, June 20, 2024
There Must Be a Fix? Sorry. Nope.
June 20, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
I enjoy stories like “Microsoft Chose Profit Over Security and Left U.S. Government Vulnerable to Russian Hack, Whistleblower Says.” It combines a number of fascinating elements; for example, corporate greed, Russia, a whistleblower, and the security of the United States. Figuring out who did what to whom when and under what circumstances is not something a dinobaby at my pay grade of zero can do. However, I can highlight some of the moving parts asserted in the write up and pose a handful of questions. Will these make you feel warm and fuzzy? I hope not. I get a thrill capturing the ideas as they manifest in my very aged brain.
The capture officer proudly explains to the giant corporation, “You have won the money?” Can money buy security happiness? Answer: Nope. Thanks, MSFT Copilot. Good enough, the new standard of excellence.
First, what is the primum movens for this exposé? I think that for this story, one candidate is Microsoft. The company has to decide to do what slays the evil competitors, remains the leader in all things smart, and generates what Wall Street and most stakeholders crave: Money. Security is neither sexy nor a massive revenue producer when measured in terms of fixing up the vulnerabilities in legacy code, the previous fixes, and the new vulnerabilities cranked out with gay abandon. Recall any recent MSFT service which may create a small security risk or two? Despite this somewhat questionable approach to security, Microsoft has convinced the US government that core software like PowerPoint definitely requires the full panoply of MSFT software, services, features, and apps. Unfortunately, articles like “Microsoft Chose Profit Over Security” convert the drudgery of cyber security into a snazzy story. A hard worker finds the MSFT flaw, reports it, and departs for a more salubrious work life. The write up says:
U.S. officials confirmed reports that a state-sponsored team of Russian hackers had carried out SolarWinds, one of the largest cyberattacks in U.S. history. They used the flaw Harris had identified to vacuum up sensitive data from a number of federal agencies, including, ProPublica has learned, the National Nuclear Security Administration, which maintains the United States’ nuclear weapons stockpile, and the National Institutes of Health, which at the time was engaged in COVID-19 research and vaccine distribution. The Russians also used the weakness to compromise dozens of email accounts in the Treasury Department, including those of its highest-ranking officials. One federal official described the breach as “an espionage campaign designed for long-term intelligence collection.”
Cute. SolarWinds, big-money deals, and hand-waving about security. What has changed? Nothing. A report criticized MSFT; the company issued appropriate slick-talking, lawyer-vetted, PR-crafted assurances that security is Job One. What has changed? Nothing.
The write up asserts about MSFT’s priorities:
the race to dominate the market for new and high-growth areas like the cloud drove the decisions of Microsoft’s product teams. “That is always like, ‘Do whatever it frickin’ takes to win because you have to win.’ Because if you don’t win, it’s much harder to win it back in the future. Customers tend to buy that product forever.”
I understand. I am not sure corporations and government agencies do. That PowerPoint software is the go-to tool for many agencies. One high-ranking military professional told me: “The PowerPoints have to be slick.” Yep, slick. But reports are written in PowerPoints. Congress is briefed with PowerPoints. Secret operations are mapped out in PowerPoints. Therefore, buy whatever it takes to make, save, and distribute the PowerPoints.
The appropriate response is, “Yes, sir.”
So what’s the fix? There is no fix. The Microsoft legacy security, cloud, AI “conglomeration” is entrenched. The Certified Partners will do patch ups. The whistleblowers will toot, but their tune will be drowned out at the post-contract-capture party at the Old Ebbitt Grill.
Observations:
- Third-party solutions are going to have to step up. Microsoft does not fix; it creates.
- More serious breaches are coming. Too many nation-states view the US as a problem and want to take it down and put it out.
- Existing staff in the government and at third-party specialist firms are in “knee jerk mode.” The idea of pro-actively getting ahead of the numerous bad actors is an interesting thought experiment. But like most thought experiments, it can morph into becoming a BFF of Don Quixote and going after those windmills.
Net net: Folks, we have some cyber challenges on our hands, in our systems, and in the cloud. I wish reality were different, but it is what it is. (Didn’t President Clinton define “is”?)
Stephen E Arnold, June 20, 2024
Can Anthropic Break Into the AI Black Box?
June 20, 2024
The inner workings of large language models have famously been a mystery, even to their creators. That is a problem for those who would like transparency around pivotal AI systems. Now, however, Anthropic may have found the solution. Time reports, “No One Truly Knows How AI Systems Work. A New Discovery Could Change That.” If the method pans out, this will be perfect for congressional hearings and antitrust testimony. Reporter Billy Perrigo writes:
“Researchers developed a technique for essentially scanning the ‘brain’ of an AI model, allowing them to identify collections of neurons—called ‘features’—corresponding to different concepts. And for the first time, they successfully used this technique on a frontier large language model, Anthropic’s Claude Sonnet, the lab’s second-most powerful system. In one example, Anthropic researchers discovered a feature inside Claude representing the concept of ‘unsafe code.’ By stimulating those neurons, they could get Claude to generate code containing a bug that could be exploited to create a security vulnerability. But by suppressing the neurons, the researchers found, Claude would generate harmless code. The findings could have big implications for the safety of both present and future AI systems. The researchers found millions of features inside Claude, including some representing bias, fraudulent activity, toxic speech, and manipulative behavior. And they discovered that by suppressing each of these collections of neurons, they could alter the model’s behavior. As well as helping to address current risks, the technique could also help with more speculative ones.”
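The "stimulate or suppress a feature" idea in the quoted passage amounts to adjusting a model's hidden activations along a learned direction. Here is a toy numpy sketch of that steering step; the feature vector is random stand-in data, since Anthropic's actual features come from dictionary learning on Claude's real activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical "feature" direction; in the real work these directions
# are discovered, not chosen, and live in a much larger space.
unsafe_code_feature = rng.standard_normal(d_model)
unsafe_code_feature /= np.linalg.norm(unsafe_code_feature)

def steer(activation: np.ndarray, feature: np.ndarray, strength: float) -> np.ndarray:
    """Clamp the activation's component along `feature` to `strength`:
    a positive strength stimulates the concept, zero suppresses it,
    and the component orthogonal to the feature is left untouched."""
    current = activation @ feature
    return activation + (strength - current) * feature

act = rng.standard_normal(d_model)
stimulated = steer(act, unsafe_code_feature, strength=5.0)
suppressed = steer(act, unsafe_code_feature, strength=0.0)
```

Because only the component along the feature direction changes, the rest of the representation is preserved, which is why this kind of intervention can flip one behavior (say, emitting buggy code) without degrading everything else.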
The researchers hope their method will replace “red-teaming,” where developers chat with AI systems in order to uncover toxic or dangerous traits. On the as-yet theoretical chance an AI gains the capacity to deceive its creators, the more direct method would be preferred.
A happy side effect of the method could be better security. Anthropic states that being able to manipulate AI features directly may allow developers to head off AI jailbreaks. The research is still in its early stages, but Anthropic is singing an optimistic tune.
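The feature-steering idea Time describes can be sketched in a few lines, assuming (purely for illustration) that a “feature” corresponds to a direction in a model’s activation vector, which can be scaled up or zeroed out. The function, vectors, and values below are hypothetical toys, not Anthropic’s actual code:

```python
import numpy as np

def steer(activations, feature_direction, strength):
    """Rescale the component of `activations` along a feature direction.

    strength = 0 suppresses the feature entirely; strength > 1 amplifies it.
    This is a toy sketch of the suppress/stimulate idea, not real model code.
    """
    direction = feature_direction / np.linalg.norm(feature_direction)
    component = activations @ direction  # how strongly the feature fires
    return activations + (strength - 1.0) * component * direction

# Hypothetical 4-dimensional "hidden state" and feature direction.
hidden = np.array([1.0, 2.0, 0.5, -1.0])
feature = np.array([0.0, 1.0, 0.0, 0.0])

suppressed = steer(hidden, feature, strength=0.0)  # feature zeroed out
amplified = steer(hidden, feature, strength=5.0)   # feature boosted
```

In a real system the activation vector would come from inside a transformer layer and the feature direction from a learned dictionary of features, but the arithmetic of suppressing or stimulating a concept is this simple.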
Cynthia Murrell, June 20, 2024
Great Moments in Smart Software: IBM Watson Gets to Find Its Future Elsewhere Again
June 19, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
The smart software game is a tough one. Whip up some compute, download the models, and go go go. Unfortunately, artificial intelligence is artificial and often not actually intelligent. I read an interesting article in Time Magazine (who knew it was still in business?). The story has a clickable title: “McDonald’s Ends Its Test Run of AI Drive-Throughs With IBM.” The juicy word AI, the big brand McDonald’s, and the pickle on top: IBM.
A college student tells the smart software system at a local restaurant that his order was misinterpreted. Thanks, MSFT Copilot. How’s your “recall” today? What about system security? Oh, that’s too bad.
The write up reports with the glee of a kid getting a happy meal:
McDonald’s automated order taker with IBM received scores of complaints in recent years, for example — with many taking to social media to document the chatbot misunderstanding their orders.
Consequently, the IBM fast food service has been terminated.
Time’s write up included a statement from Big Blue too:
In an initial statement, IBM said that “this technology is proven to have some of the most comprehensive capabilities in the industry, fast and accurate in some of the most demanding conditions,” but did not immediately respond to a request for further comment about specifics of potential challenges.
IBM suggested its technology could help fight cancer in Houston a few years ago. How did that work out? That smart software worker has once again had an opportunity to find its future elsewhere. The career trajectory, at first glance, runs from medicine to grilling burgers. The path seems to be heading down to Sleepy Town.
What’s the future of the IBM smart software test? The write up points out:
Both IBM and McDonald’s maintained that, while their AI drive-throughs partnership was ending, the two would continue their relationship on other projects. McDonald’s said that it still plans to use many of IBM’s products across its global system.
But Ronald McDonald has to be practical. The article adds:
In December, McDonald’s launched a multi-year partnership with Google Cloud. In addition to moving restaurant computations from servers into the cloud, the partnership is also set to apply generative AI “across a number of key business priorities” in restaurants around the world.
Google’s smart software has been snagged in some food controversies too. The firm’s smart system advised some users to use glue to make the cheese topping stick better. Yum.
Several observations seem to be warranted:
- Practical and money-saving applications of IBM’s smart software do not have the snap, crackle, and pop of OpenAI’s PR coup with Microsoft in January 2023. Time is writing about IBM, but the case example is not one that makes me crave this particular application. Customers want a sandwich, not something they did not order.
- Reliable smart software applications that must react spontaneously to people ordering food or asking basic questions are difficult to find. Very narrow applications of smart software do produce positive case examples. In some law enforcement software (what I call policeware), the automatic processes of some vendors’ solutions work well; automatic report generation in the Shadowdragon Horizon system is one instance.
- Big companies spend money, catch attention, and then have to spend more money to remediate and clean up the negative publicity.
Net net: More small-scale testing and less publicity chasing seem to be two items to add to the menu. And, Watson, keep on trying. Google is.
Stephen E Arnold, June 19, 2024