AI Summaries Get News Wrong
February 28, 2025
With big news stories emerging at a frantic pace, one might turn to AI to consolidate the key points. If so, one might become woefully ill-informed. “AI Chatbots Unable to Accurately Summarise News, BBC Finds.” The BBC tested the biggest AIs on content from its own site: OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Perplexity AI all sat for the exam. None of them passed it, though ChatGPT and Perplexity were less bad than Copilot and Gemini. Tech reporter Imran Rahman-Jones tells us:
“In the study, the BBC asked ChatGPT, Copilot, Gemini and Perplexity to summarise 100 news stories and rated each answer. It got journalists who were relevant experts in the subject of the article to rate the quality of answers from the AI assistants. It found 51% of all AI answers to questions about the news were judged to have significant issues of some form. Additionally, 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.”
But it was not just about mixing up, or inventing, facts. The chatbots also struggled with the concept of context and the distinction between facts and opinions. We learn:
“The report said that as well as containing factual inaccuracies, the chatbots ‘struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context’.”
To illustrate the findings, the article gives us a few examples:
- “Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking.
- ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left.
- Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed ‘restraint’ and described Israel’s actions as ‘aggressive’.”
So, dear readers, we suggest you take the time to read the news for yourselves. Or, at the very least, get your recaps from another human.
Cynthia Murrell, February 28, 2025
Curricula Ideas That Will Go Nowhere Fast
February 28, 2025
No smart software. Just a dinobaby doing his thing.
I read “Stuff You Should Have Been Taught in College But Weren’t.” The essay reveals a young person who has some dinobaby notions. Good for Casey Handmer, PhD. Despite his brush with Hyperloop, he has retained an ability to think clearly about education. Caltech and the JPL have shielded him from some intellectual cubby holes.
So why am I mentioning the “Stuff You Should Have…” essay and the author? I found the write up in line with thoughts my colleagues and I have shared. Let me highlight a few of Dr. Handmer’s “Should haves” despite my dislike for “woulda coulda shoulda” as a mental bookshelf.
The write up says:
in the sorts of jobs you want to have, no-one should have to spell anything out for you.
I want to point out that the essay may not be appropriate for a person who seeks a job washing dishes at the El Nopal restaurant on Goose Creek Road. The observation strikes me as appropriate for an individual who seeks employment at a high-performing organization or an aspiring “performant” outfit. (I love the coinage “performant”; it is very with it.)
What are other dinobaby-in-the-making observations in the write up? I have rephrased some of the comments, and I urge you to read the original essay. Here goes:
- Do something tangible to demonstrate your competence. Doom scrolling and watching TikTok-type videos may not do the job.
- Offer proof you deliver value in whatever you do. I am referring to “good” actors, not “bad” actors selling Telegram and WhatsApp hacking services on the Dark Web. “Proof” is verifiable facts, a reference from an individual of repute, or demonstrating a bit of software posted on GitHub or licensed from you.
- Watch, learn, and act in a way that benefits the organization, your colleagues, and your manager.
- Change jobs to grow and demonstrate your capabilities.
- Suck it up, buttercup. Life is a series of challenges. Meet them. Deliver value.
I want to acknowledge that not all dinobabies exhibit these traits as they toddle toward the holding tank for the soon-to-be-dead. However, for an individual who wants to contribute and grow, the ideas in this essay are good ones to consider and then implement.
I do have several observations:
- The percentage of a cohort who can consistently do and deliver is very small. Excellence is not for everyone. This has significant career implications unless you have a lot of money, family connections, or a Hollywood glow.
- Most of the young people with whom I interact say they have these or similar qualities. Then their own actions prove they don’t. Here’s an example: I met a business school dean. I offered to share some ideas relevant to the job market. I gave him my card because he forgot his cards. He never emailed me. I contacted him and said politely, “What’s up?” He double talked and wanted to meet up in the spring. What’s that tell me about this person’s work ethic? Answer: Loser.
- Universities and other formal training programs struggle even when the course material and teacher are on point. The “problem” begins before the student shows up in class. The impact of family stress on a person creates a hot house of sorts. What grows in the hortorium? Species with an inability to concentrate, a pollen that cannot connect with an ovule, and a baked-in confusion of “I will do it” and “doing it.”
Net net: This dinobaby is happy to say that Dr. Handmer will make a very good dinobaby some day.
Stephen E Arnold, February 28, 2025
Has Amazon Hit the Same Big Pothole As Apple?
February 27, 2025
This blog post is the work of a real-live dinobaby. No smart software involved.
Apple has experienced some growing pains with its Apple Intelligence. Incorrect news and assorted Siri weirdness indicated that designing a rectangle and laptop requires different skills from delivering a high impact, mass market smart software “solution.”
I know Apple is working overtime to come up with the next big thing. Will it be another me-too product? Probably. I liked the M1 chip, but subsequent generations have not done much to change my work flow or my happiness with my laptops and Mac Minis. I am okay with a cheap smart watch. I am okay with an old iPhone. I am okay with providing those who do work for me with a Mac laptop. Apple, however, is not a big player in smart software. In China, the company is embracing Chinese smart software. Hey, Apple wants to sell iPhones. Do what’s necessary is the basic approach to innovation in my opinion.
Has Amazon hit the same pothole as Apple? Surely the Bezos bulldozer can move forward with its powerful innovation machine. I am not so sure. I remember four years ago a project requiring my team to look at Amazon’s Sagemaker. That was an initiative to provide off-the-shelf technology and data sets to Amazon cloud customers who wanted smart software. Have you perceived Sagemaker as the big dog in AI? I don’t.
I read “Looks Like the Next-Gen Alexa’s Release Is Hitting Another Speed Bump.” The write up suggests that the expensive kitchen timer and weather update device is not getting much smarter quickly. The article reports:
According to a tip from an unnamed Amazon employee, shared by the Washington Post (via Android Authority), the smarter Alexa update won’t be released until March 31. The holdup was apparently due to the upgraded assistant tripping over itself in testing, struggling to nail accurate answers. So, it seems like Amazon is taking extra time to fine-tune Alexa’s brain before letting it loose.
I am not too surprised. Amazon fiddles with the Kindle and the software for that device does not meet the needs of people who read numerous books. (Don’t you love those Amazon Kindle email addresses and the software that makes it a challenge to figure out which books are on the device, which are for sale, and which are in the Amazon cloud? Wonderful software for someone who does not read, just buys books.) The cloud AI initiative has not come close to the Chinese technological “strike” with the Deepseek system. Now the kitchen timer is delayed just like useful Apple Intelligence.
Let me share my hypotheses about why Amazon (and I suppose I can include Apple in this mental human hallucination) has hit the pothole:
- Neither company has a next big thing. Both companies are in a me-too, me-too loop. That’s a common situation in a firm which gets big, has money, and loses its genius for everything except making as much money as possible. Innovation atrophy is my phrase for this characteristic of some companies.
- Throwing money at a problem does not create sparks of insight. The novel ideas are smothered under the flow of money that must be spent. This is a middle manager’s problem; specifically, effort is directed to spending the money, not coming up with a big idea that solves a problem and delights those people. Do you know what’s different about a new iPhone? Do you know which Amazon products are actually of good quality? I sure don’t. I ordered an AMD Ryzen CPU. Amazon shipped me red panties. My old iPhone asks me to log in every time I look at Telegram’s messages on the device. Really, panties and persistent log ins?
- General strategic drift. I am not sure what business Apple is in. Is it services like selling music? Is it hardware which is mostly indistinguishable from the hardware just replaced? Is Amazon a cloud computing outfit with leaky S3 storage constructs? Is it a seller of Temu-type products? Is it a delivery business unable to keep its delivery partners happy? The purpose of these firms is to acquire money. Period. The original Jobs and Bezos “razzmatazz” is gone.
Will the companies remediate the fundamental innovation issue? Nope. But both will make a lot of money. Beavers do what beavers do. No matter what. But beavers might be able to get Alexa to spin money, games to mostly work, and Twitch to make creators happy, not grumpy.
Stephen E Arnold, February 27, 2025
Yikes! Existing AI is Fundamentally Flawed
February 27, 2025
AI applications are barreling full steam ahead into all corners of our lives. Yet there are serious concerns about the very structure of how LLMs work. The BCS Chartered Institute for IT asks, "Does Current AI Represent a Dead End?" Cybersecurity professor Eerke Boiten writes:
"From the perspective of software engineering, current AI systems are unmanageable, and as a consequence their use in serious contexts is irresponsible. For foundational reasons (rather than any temporary technology deficit), the tools we have to manage complexity and scale are just not applicable. By ‘software engineering’, I mean developing software to align with the principle that impactful software systems need to be trustworthy, which implies their development needs to be managed, transparent and accountable … When I last gave talks about AI ethics, around 2018, my sense was that AI development was taking place alongside the abandonment of responsibility in two dimensions. Firstly, and following on from what was already happening in ‘big data’, the world stopped caring about where AI got its data — fitting in nicely with ‘surveillance capitalism’. And secondly, contrary to what professional organisations like BCS and ACM had been preaching for years, the outcomes of AI algorithms were no longer viewed as the responsibility of their designers — or anybody, really."
Yes, that is the reality we are careening into. But for big tech, that may be a feature, not a bug. Those firms clearly want today’s AI to be THE one true AI. A high profit to responsibility ratio suits them just fine.
Boiten describes, in a nutshell, how neural networks function. He emphasizes the disturbing lack of human guidance. And understanding. Since engineers cannot know just how an algorithm comes to its conclusions, it is impossible to ensure they are operating to specifications. These problems cannot be resolved with hard work and insights; they are baked in. See the write-up for more details.
If engineers are willing to progress beyond today’s LLMs, Boiten suggests, they could develop something actually reliable. It could even be built on existing AI tech, so all that work (and funding) need not go out the window. They just have to look past the dollar signs in their eyes and press ahead to a safer and more reliable product. The post warns:
"In my mind, all this puts even state-of-the-art current AI systems in a position where professional responsibility dictates the avoidance of them in any serious application. When all its techniques are based on testing, AI safety is an intellectually dishonest enterprise."
Now all we need is for big tech to do the right thing.
Cynthia Murrell, February 27, 2025
A Handy Resource: 100 AI Tools in 10 Categories
February 27, 2025
We hear a lot about the most prominent AI tools like ChatGPT, Dall-E, and Grammarly. But there are many more options designed for a wide range of tasks. Inspiration blogger Ayo-Ibidapo has rounded up "100 AI Tools for Every Need: The Ultimate List." He succinctly introduces his list by observing:
"AI is revolutionizing industries, making tasks easier, faster, and more efficient. Whether you need AI for writing, design, marketing, coding, or personal productivity, there’s a tool for you. Here’s a list of 100 AI tools categorized by their purpose."
The 10 categories include those above and more, including my favorite, "Miscellaneous and Fun." As a life-long gamer, I am drawn to AI Dungeon. I am not so sure about the face-swapping tool, Reface AI. Seems a bit creepy. I am curious whether any of the investing tools, like Alpaca, Kavout, or Trade Ideas could actually boost one’s portfolio. And I am pleased to see the esteemed Wolfram Alpha made the list in the education and research section. As for the ten entries under healthcare and wellness, I wonder: are we resigned to sharing our most intimate details with bots? Ginger AI, for mental health support, sounds non-threatening, but are there any data-grubbing details buried in its terms of service agreement?
See the post for all 100 tools. If that is not enough, check out the discussion at Battle Station, "Uncover 30,000+ AI Apps Using AITrendyTools." There’s an idea: what better way to pick an AI tool than an AI tool?
Cynthia Murrell, February 27, 2025
Meta and Torrents: True, False, or Rationalization?
February 26, 2025
AIs gobble datasets for training. It is also a fact that many LLMs and datasets contain biased information, are incomplete, or plain stink. One ethical but cumbersome way to train algorithms would be to notify people that their data, creative content, or other information will be used to train AI. Offering to pay for the right to use the data would be a useful step, some argue.
Will this happen? Obviously not.
Why?
Because it’s sometimes easier to take instead of asking. According to Tom’s Hardware, “Meta Staff Torrented Nearly 82TB Of Pirated Books For AI Training-Court Records Reveal Copyright Violations.” The article explains that Meta pirated 81.7 TB of books from the shadow libraries Anna’s Archive, Z-Library, and LibGen. These books were then used to train AI models. Meta is now facing a class action lawsuit about using content from the shadow libraries.
The allegations arise from Meta employees’ written communications. Some of these messages provide insight into employees’ concern about tapping pirated materials. The employees were getting frown lines, but then some staffers’ views rotated when they concluded smart software helped people access information.
Here’s a passage from the cited article I found interesting:
“Then, in January 2023, Mark Zuckerberg himself attended a meeting where he said, “We need to move this stuff forward… we need to find a way to unblock all this.” Some three months later, a Meta employee sent a message to another one saying they were concerned about Meta IP addresses being used “to load through pirate content.” They also added, “torrenting from a corporate laptop doesn’t feel right,” followed by laughing out loud emoji. Aside from those messages, documents also revealed that the company took steps so that its infrastructure wasn’t used in these downloading and seeding operations so that the activity wouldn’t be traced back to Meta. The court documents say that this constitutes evidence of Meta’s unlawful activity, which seems like it’s taking deliberate steps to circumvent copyright laws.”
If true, the approach smacks of that suave Silicon Valley style. If false, my faith in a yacht owner with gold chains might be restored.
Whitney Grace, February 26, 2025
Innovation: It Ebbs, It Flows, It Fizzles
February 26, 2025
Many would argue humanity is nothing if not creative. If not, we would be living the way we were thousands of years ago. But, asks the Financial Times, "Is Innovation Slowing Down? With Matt Clancy." Nah—Look how innovative iPhones and Windows upgrades are.
The post presents the audio of an interview between journalist John Burn-Murdoch and economist Matt Clancy. (The transcript can be found here.) The page introduces the interview:
"Productivity growth in the developed world has been on a downward trend since the 1960s. Meanwhile, gains in life expectancy have also slowed. And yet the number of dollars and researchers dedicated to R&D grows every year. In today’s episode, the FT’s Chief Data Reporter, John Burn-Murdoch, asks whether western culture has lost its previous focus on human progress and become too risk-averse, or whether the problem is simply that the low-hanging fruit of scientific research has already been plucked. He does so in conversation with innovation economist Matt Clancy, who is the author of the New Things Under the Sun blog, and a research fellow at Open Philanthropy, a non-profit foundation based in San Francisco that provides research grants."
The pair begin by recalling a theory of economic historian Joel Mokyr, who believes a growing belief in human progress and experimentation led to the Industrial Revolution. The perspective, believes Clancy, is supported by a 2023 study that examined thousands of political and scientific books from the 1500s–1700s. That research shows a growing interest in progress during that period. Sounds plausible.
But now, we learn, innovation appears to be in decline. Research output per scientist has decreased since 1960, despite increased funding. Productivity growth and technological output are also slowing. Is this because our culture has grown less interested in invention? To hear Clancy tell it, probably not. A more likely suspect is what economist Ben Jones dubbed the Burden of Knowledge. Basically, as humanity makes discoveries that build on each other, each human scientist has more to learn before they can contribute new ideas. This also means more individual specialization and more teamwork. Of course, adding meetings to the mix slows everything down.
The economist has suggestions, like funding models that reward risk-taking. He also believes artificial intelligence will significantly speed things up. Probably—but will it send us careening down the wrong paths? AI will have to get far better at not making mistakes, or making stuff up, before we should trust it at the helm of human progress.
Cynthia Murrell, February 26, 2025
AI Research Tool from Perplexity Is Priced to Undercut the Competition
February 26, 2025
Are prices for AI-generated research too darn high? One firm thinks so. In a Temu-type bid to take over the market, reports VentureBeat, "Perplexity Just Made AI Research Crazy Cheap—What that Means for the Industry." CEO Aravind Srinivas credits open source software for making the move possible, opining that "knowledge should be universally accessible." Knowledge, yes. AI research? We are not so sure. Nevertheless, here we are. The write-up describes the difference in pricing:
"While Anthropic and OpenAI charge thousands monthly for their services, Perplexity offers five free queries daily to all users. Pro subscribers pay $20 monthly for 500 daily queries and faster processing — a price point that could force larger AI companies to explain why their services cost up to 100 times more."
Not only is Perplexity’s Deep Research cheaper than the competition, crows the post, its accuracy rivals theirs. We are told:
"[Deep Research] scored 93.9% accuracy on the SimpleQA benchmark and reached 20.5% on Humanity’s Last Exam, outperforming Google’s Gemini Thinking and other leading models. OpenAI’s Deep Research still leads with 26.6% on the same exam, but OpenAI charges $200 per month for that service. Perplexity’s ability to deliver near-enterprise level performance at consumer prices raises important questions about the AI industry’s pricing structure."
Well, okay. Not to stray too far from the point, but is a 20.5% or a 26.6% on Humanity’s Last Exam really something to brag about? Last we checked, those were failing grades. By far. Isn’t it a bit too soon to be outsourcing research to any LLM? But I digress.
We are told the low, low cost Deep Research is bringing AI to the micro-budget masses. And, soon, to the Windows-less—Perplexity is working on versions for iOS, Android, and Mac. Will this spell disaster for the competition?
Cynthia Murrell, February 26, 2025
Researchers Raise Deepseek Security Concerns
February 25, 2025
What a shock. It seems there are some privacy concerns around Deepseek. We learn from the Boston Herald, “Researchers Link Deepseek’s Blockbuster Chatbot to Chinese Telecom Banned from Doing Business in US.” Former Wall Street Journal and now AP professional Byron Tau writes:
“The website of the Chinese artificial intelligence company Deepseek, whose chatbot became the most downloaded app in the United States, has computer code that could send some user login information to a Chinese state-owned telecommunications company that has been barred from operating in the United States, security researchers say. The web login page of Deepseek’s chatbot contains heavily obfuscated computer script that when deciphered shows connections to computer infrastructure owned by China Mobile, a state-owned telecommunications company.”
If this is giving you déjà vu, dear reader, you are not alone. This scenario seems much like the uproar around TikTok and its Chinese parent company ByteDance. But it is actually worse. ByteDance’s direct connection to the Chinese government is, as of yet, merely hypothetical. China Mobile, on the other hand, is known to have direct ties to the Chinese military. We learn:
“The U.S. Federal Communications Commission unanimously denied China Mobile authority to operate in the United States in 2019, citing ‘substantial’ national security concerns about links between the company and the Chinese state. In 2021, the Biden administration also issued sanctions limiting the ability of Americans to invest in China Mobile after the Pentagon linked it to the Chinese military.”
It was Canadian cybersecurity firm Feroot Security that discovered the code. The AP then had the findings verified by two academic cybersecurity experts. Might similar code be found within TikTok? Possibly. But, as the article notes, the information users feed into Deepseek is a bit different from the data TikTok collects:
“Users are increasingly putting sensitive data into generative AI systems — everything from confidential business information to highly personal details about themselves. People are using generative AI systems for spell-checking, research and even highly personal queries and conversations. The data security risks of such technology are magnified when the platform is owned by a geopolitical adversary and could represent an intelligence goldmine for a country, experts warn.”
Interesting. But what about CapCut, the ByteDance video thing?
Cynthia Murrell, February 25, 2025
Musings on AI UI Design
February 25, 2025
The advent of AI has sent UI designers back to the drawing tablet. Tech product designer and blogger Patrick Morgan considers "8 Design Breakthroughs Defining AI’s Future." As when touch-based devices became common, he asserts, design choices made now will shape the ways we interact with technology for years to come. Morgan writes:
"For the first time in over a decade, we’re facing a truly greenfield space in user experience design. There’s no playbook, no established patterns to fall back on. Even the frontier AI labs are learning through experimentation, watching to see what resonates as they introduce new ways to interact. … It’s fascinating to watch these design choices ripple across the ecosystem in real-time. When something works, competitors rush to adopt it — not out of laziness, but because we’re all collectively discovering what makes sense in this new paradigm. In this wild-west moment, new dominant patterns are emerging. Today, I want to highlight the breakthroughs that have captured my imagination the most — the design choices shaping our collective understanding of AI interaction."
The roundup includes obvious choices: conversational paradigms like ChatGPT’s interface and voice input systems in general. Morgan also admires integration a la Cursor IDE and Claude Artifacts, and he appreciates the helpful Grok button alongside content on X. He gives kudos for transparency, like Perplexity’s real-time citations and Deepseek’s process descriptions. Morgan even gives credit to MidJourney for refusing to build its own UI until it had refined its core technology. He reflects:
"These eight breakthroughs aren’t just clever UI decisions — they’re the first chapters in a new story about how humans and machines work together. Each represents a moment when someone dared to experiment, to try something unproven, and found a pattern that resonated."
Yes. And also: Ultimately, AI will be invisible—embedded and out of sight, outputting information. Interfaces undergo constant change by people with time on their hands. UI changes should not distract from the actual trajectory of smart and smarter software. Where do we stand on bias, hallucinations, privacy, and accountability? Those, we believe, are the more pertinent questions. But, sure, UI choices are nifty to observe.
Cynthia Murrell, February 25, 2025