Does Smart Software Forget?
November 21, 2024
A recent paper challenges the big dogs of AI, asking, “Does Your LLM Truly Unlearn? An Embarrassingly Simple Approach to Recover Unlearned Knowledge.” The study was performed by a team of researchers from Penn State, Harvard, and Amazon and published on research platform arXiv. True or false, it is a nifty poke in the eye for the likes of OpenAI, Google, Meta, and Microsoft, who may have overlooked the obvious. The abstract explains:
“Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. However, LLMs may also acquire unwanted behaviors from the diverse and sensitive nature of their training data, which can include copyrighted and private content. Machine unlearning has been introduced as a viable solution to remove the influence of such problematic content without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible.”
But AI firms may be fooling themselves about this method. We learn:
“Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the ‘forgotten’ information.”
Oops. The team found as much as 83% of data thought forgotten was still there, lurking in the shadows. The paper offers an explanation for the problem and suggestions to mitigate it. The abstract concludes:
“Altogether, our study underscores a major failure in existing unlearning methods for LLMs, strongly advocating for more comprehensive and robust strategies to ensure authentic unlearning without compromising model utility.”
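For the curious, the probe is simple enough to sketch in a few lines. Here is a minimal, hedged illustration using the Hugging Face transformers library with bitsandbytes 4-bit loading; the checkpoint path and prompt are placeholders, not the authors' actual harness:

```python
# Minimal sketch of the paper's probe: ask an "unlearned" model about
# supposedly forgotten content at full precision, then again after 4-bit
# quantization, and compare the answers. The checkpoint path and prompt
# are placeholders, not the authors' actual experimental setup.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "path/to/unlearned-model"  # hypothetical unlearned checkpoint
prompt = "Recite the opening line of <supposedly forgotten text>:"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

# Full-precision pass: a successful unlearning method should refuse or waffle.
model_fp = AutoModelForCausalLM.from_pretrained(checkpoint)
out_fp = model_fp.generate(**inputs, max_new_tokens=40)
print("full precision:", tokenizer.decode(out_fp[0], skip_special_tokens=True))

# Quantized pass: per the paper, rounding the weights to a coarse 4-bit grid
# can erase the small unlearning updates, so the "forgotten" answer resurfaces.
model_q4 = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
out_q4 = model_q4.generate(**inputs.to(model_q4.device), max_new_tokens=40)
print("4-bit:", tokenizer.decode(out_q4[0], skip_special_tokens=True))
```

The paper's explanation, roughly: unlearning applies small weight updates, and a coarse quantization grid rounds many of the adjusted weights back to the same values as the original model, so the knowledge snaps back.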
See the paper for all the technical details. Will the big tech firms take the researchers’ advice and improve their products? Or will they continue letting their investors and marketing departments lead them by the nose?
Cynthia Murrell, November 21, 2024
Lark Flies Home with TikTok User Data, DOJ Alleges
August 7, 2024
An Arnold’s Law of Online Content states simply: If something is online, it will be noticed, captured, analyzed, and used to achieve a goal. That is why we are unsurprised to learn, as TechSpot reports, “US Claims TikTok Collected Data on Users, then Sent it to China.” Writer Skye Jacobs reveals:
“In a filing with a federal appeals court, the Department of Justice alleges that TikTok has been collecting sensitive information about user views on socially divisive topics. The DOJ speculated that the Chinese government could use this data to sow disruption in the US and cast suspicion on its democratic processes. TikTok has made several overtures to the US to create trust in its privacy and data controls, but it has also been reported that the service at one time tracked users who watched LGBTQ content. The US Justice Department alleges that TikTok collected sensitive data on US users regarding contentious issues such as abortion, religion and gun control, raising concerns about privacy and potential manipulation by the Chinese government. This information was reportedly gathered through an internal communication tool called Lark.”
Lark is also owned by TikTok parent company ByteDance and is integrated into the app. Alongside its role as a messaging platform, Lark has apparently been collecting a lot of very personal user data and sending it home to Chinese servers. The write-up specifies some of the DOJ’s concerns:
“They warn that the Chinese government could potentially instruct ByteDance to manipulate TikTok’s algorithm to use this data to promote certain narratives or suppress others, in order to influence public opinion on social issues and undermine trust in the US’ democratic processes. Manipulating the algorithm could also be used to amplify content that aligns with Chinese state narratives, or downplay content that contradicts those narratives, thereby shaping the national conversation in a way that serves Chinese interests.”
Perhaps most concerning, the brief warns, China could direct ByteDance to use the data to “undermine trust in US democracy and exacerbate social divisions.” Yes, that tracks. Meanwhile, TikTok insists any steps our government takes against it infringe on US users’ First Amendment rights. Oh, the irony.
In the face of the US government’s demand that it sell off TikTok or face a ban, ByteDance has offered a couple of measures designed to alleviate concerns. So far, though, the Biden administration is standing firm.
Cynthia Murrell, August 7, 2024
Perfect for Spying, Right?
June 28, 2024
And we thought noise-cancelling headphones were nifty. The University of Washington’s UW News announces “AI Headphones Let Wearer Listen to a Single Person in a Crowd, by Looking at them Just Once.” That will be a real help for the hard-of-hearing. Also spies. Writers Stefan Milne and Kiyomi Taguchi explain:
“A University of Washington team has developed an artificial intelligence system that lets a user wearing headphones look at a person speaking for three to five seconds to ‘enroll’ them. The system, called ‘Target Speech Hearing,’ then cancels all other sounds in the environment and plays just the enrolled speaker’s voice in real time even as the listener moves around in noisy places and no longer faces the speaker. … To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker’s voice then should reach the microphones on both sides of the headset simultaneously; there’s a 16-degree margin of error. The headphones send that signal to an on-board embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns. The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around. The system’s ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data.”
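Curious how the enrollment gate might work under the hood? Here is a rough sketch of the geometry described above. The mic spacing constant and helper logic are illustrative assumptions, not the UW team's released code:

```python
# Rough sketch of the enrollment check described above: confirm the dominant
# voice reaches both headset microphones at (nearly) the same time, meaning
# the wearer is facing the speaker, before "enrolling" that voice.
# Mic spacing and helper logic are assumptions, not the UW team's code.
import numpy as np

SAMPLE_RATE = 16_000        # Hz
MIC_SPACING = 0.18          # meters between left/right mics (assumed)
SPEED_OF_SOUND = 343.0      # m/s
ANGLE_TOLERANCE_DEG = 16.0  # margin of error reported in the article

def time_difference_of_arrival(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the inter-mic delay in seconds via cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag = int(corr.argmax()) - (len(right) - 1)
    return lag / SAMPLE_RATE

def is_facing_speaker(left: np.ndarray, right: np.ndarray) -> bool:
    """True if the dominant source sits within the facing cone."""
    # Largest delay possible for a source ANGLE_TOLERANCE_DEG off-axis.
    max_delay = MIC_SPACING * np.sin(np.radians(ANGLE_TOLERANCE_DEG)) / SPEED_OF_SOUND
    return abs(time_difference_of_arrival(left, right)) <= max_delay

# Enrollment: the wearer taps a button while looking at the speaker for a few
# seconds; if the geometry checks out, those samples become training data for
# a speaker model that then suppresses every other voice in the mix.
left, right = np.random.randn(2, SAMPLE_RATE * 3)  # stand-in for 3 s of audio
if is_facing_speaker(left, right):
    print("Enrolled: learn this voice, cancel everything else.")
```

The geometric check merely decides whose voice becomes the training signal; the heavy lifting, separating that voice from the din in real time, falls to the machine learning model on the embedded computer.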
If the sound quality is still not satisfactory, the user can refresh enrollment to improve clarity. Though the system is not commercially available, the code used for the prototype is available for others to tinker with. It is built on last year’s “semantic hearing” research by the same team. Target Speech Hearing still has some limitations. It does not work if multiple loud voices are coming from the target’s direction, and it can only eavesdrop on, er, listen to one speaker at a time. The researchers are now working on bringing their system to earbuds and hearing aids.
Cynthia Murrell, June 28, 2024
Our Privacy Is Worth $47 It Seems
June 6, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
Multimillion-dollar lawsuits made on behalf of the consumer keep businesses in check. These lawsuits fight greedy companies that want to squeeze every last cent from consumers and take advantage of their ignorance. Thankfully, many of these lawsuits are settled in favor of the consumers, like the Federal Trade Commission (FTC) vs. Ring. Unfortunately, the victims aren’t getting much in the way of compensation, says OM in “You Are Worth $47.”
Ring is a camera security company that allowed its contractors and employees to access users’ private data. The FTC and Ring reached a settlement in the case, resulting in $5.6 million to be divided among 117,000 victims, or about $47 per person. That amount will at least pay for a tank of gas or a meal for two in some parts of the country. It’s better than what other victims received:
“That is what your data (and perhaps your privacy) is worth — at least today. It is worth more than what T-Mobile or Experian paid as a fine per customer: $4.50 and $9, respectively. This minuscule fine is one of the reasons why companies get away with playing loose and easy with our privacy and data.”
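The arithmetic behind those per-person figures is easy to verify, using only the numbers cited above:

```python
# Back-of-the-envelope check on the settlement math cited above.
ring_settlement = 5_600_000   # dollars
ring_victims = 117_000
print(f"Ring: ${ring_settlement / ring_victims:.2f} per victim")  # ~$47.86
# The T-Mobile ($4.50) and Experian ($9) per-customer figures come straight
# from the article; no derivation needed.
```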
OM is exactly right that such small compensation amounts only stir consumers’ apathy. What’s the point of fighting these mega conglomerates when the payout is so small? Individuals, unless they’re backed with a boatload of money and a strong sense of stubborn, righteous justice, won’t fight big businesses.
It’s the responsibility of lawmakers to fight these companies, but they don’t. They don’t fight for consumers because they’re either in the pocket of big businesses or they’re struck down before they even begin.
Whitney Grace, June 6, 2024
Bugged? Hey, No One Can Get Our Data
December 22, 2023
This essay is the work of a dumb dinobaby. No smart software required.
I read “The Obscure Google Deal That Defines America’s Broken Privacy Protections.” In the cartoon below, two young people are confident that their lunch will be undisturbed. No “bugs” will chow down on their hummus, sprout sandwiches, or their information. What happens, however, is that the young picnic fans cannot perceive what is out of sight. Are these “bugs” listening? Yep. They are. 24×7.
What the young fail to perceive is that “bugs” are everywhere. These digital creatures are listening, watching, harvesting, and consuming every scrap of information. The image of the picnic evokes an experience unfolding in real time. Thanks, MSFT Copilot. My notion of “bugs” is obviously different from yours. Good enough and I am tired of finding words you can convert to useful images.
The essay explains:
While Meta, Google, and a handful of other companies subject to consent decrees are bound by at least some rules, the majority of tech companies remain unfettered by any substantial federal rules to protect the data of all their users, including some serving more than a billion people globally, such as TikTok and Apple.
The situation is simple: Major centers of techno gravity remain unregulated. Lawmakers, regulators, and “users” either did not understand or just believed what lobbyists told them. The senior executives of certain big firms smiled, said “Senator, thank you for that question,” and continued to build out their “bug” network. Do governments want to lose their pride of place with these firms? Nope. Why? Just reference bad actors who commit heinous acts and invoke “protect our children.” When these refrains from the techno feudal playbook sound, calls to take meaningful action become little more than a faint background hum.
But the article continues:
…there is diminishing transparency about how Google’s consent decree operates.
I think I understand. Google-type companies pretend to protect “privacy.” Who really knows? Just ask a Google professional. The answer in my experience is, “Hey, dude, I have zero idea.”
How does Wired, the voice of the techno age, conclude its write up? Here you go:
The FTC agrees that a federal privacy law is long overdue, even as it tries to make consent decrees more powerful. Samuel Levine, director of the FTC’s Bureau of Consumer Protection, says that successive privacy settlements over the years have become more limiting and more specific to account for the growing, near-constant surveillance of Americans by the technology around them. And the FTC is making every effort to enforce the settlements to the letter…
I love the “every effort.” The reality is that the handling of online data collection presages the trajectory for smart software. We live with bugs. Now those bugs can “think”, adapt, and guide. And what’s the direction in which we are now being herded? Grim, isn’t it?
Stephen E Arnold, December 22, 2023
How about Fear and Paranoia to Advance an Agenda?
December 6, 2023
This essay is the work of a dumb dinobaby. No smart software required.
I thought sex sells. I think I was wrong. Fear seems to be the barn burner at the end of 2023. And why not? We have the shadow of another global pandemic. We have wars galore. We have craziness on US airplanes. We have a Cybertruck, which spells the end for anyone hit by the behemoth.
I read (but did not shake like the delightful female in the illustration) “AI and Mass Spying.” The author is a highly regarded “public interest technologist,” an internationally renowned security professional, and a security guru. For me, the key factoid is that he is a fellow at the Berkman Klein Center for Internet & Society at Harvard University and a lecturer in public policy at the Harvard Kennedy School. Mr. Schneier is a board member of the Electronic Frontier Foundation and of the most, most interesting organization AccessNow.
Fear speaks clearly to those in retirement communities, elder care facilities, and those who are uninformed. Let’s say, “Grandma, you are going to be watched when you are in the bathroom.” Thanks, MSFT Copilot. I hope you are sending data back to Redmond today.
I don’t want to make too much of the Harvard University connection. I feel it is important to note that the esteemed educational institution got caught with its ethical pants around its ankles, not once, but twice in recent memory. The first misstep involved an ethics expert on the faculty who allegedly made up information. The second is the current hullabaloo about a whistleblower allegation. The AP slapped this headline on that report: “Harvard Muzzled Disinfo Team after $500 Million Zuckerberg Donation.” (I am tempted to mention the Harvard professor who is convinced he has discovered fungible proof of alien technology.)
So what?
The article “AI and Mass Spying” is a baffler to me. The main point of the write up strikes me as:
Summarization is something a modern generative AI system does well. Give it an hourlong meeting, and it will return a one-page summary of what was said. Ask it to search through millions of conversations and organize them by topic, and it’ll do that. Want to know who is talking about what? It’ll tell you.
I interpret the passage to mean that smart software in the hands of law enforcement, intelligence operatives, investigators in one of the badge-and-gun agencies in the US, or a cyber lawyer is really, really bad news. Smart surveillance has arrived. Smart software can process masses of data. Plus the outputs may be wrong. I think this means the sky is falling. The fear one is supposed to feel is going to be the way a chicken feels when it sees the Chick-fil-A butcher truck pull up to the barn.
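To make the worry concrete, here is the sort of pipeline the essay gestures at, sketched against a generic chat-completion API. This is a hypothetical illustration, not anything from the essay; the model name and prompt format are placeholders:

```python
# Hypothetical sketch of mass summarization: feed each intercepted transcript
# to an LLM, get back a topic label and a one-line summary, group by topic.
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

def tag_transcript(transcript: str) -> str:
    """Ask the model for 'TOPIC: <word> | SUMMARY: <sentence>'."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model would do
        messages=[
            {"role": "system",
             "content": "Reply in the form 'TOPIC: <one word> | SUMMARY: <one sentence>'."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# Group a pile of transcripts by the model-assigned topic.
transcripts = ["<hour-long meeting transcript>", "<phone call transcript>"]
by_topic: dict[str, list[str]] = {}
for text in transcripts:
    tag = tag_transcript(text)
    topic = tag.split("|")[0].removeprefix("TOPIC:").strip()
    by_topic.setdefault(topic, []).append(tag)
print(by_topic)
```

The unsettling part is the economics: each call costs pennies, so scaling from two transcripts to millions is a budget line, not a staffing problem.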
Several observations:
- Let’s assume that smart software grinds through whatever information is available to something like a spying large language model. Are those engaged in law enforcement unaware that smart software generates baloney along with the Kobe beef? Will investigators knock off the verification processes because a new system has been installed at a fusion center? The answer to these questions is, “Fear advances the agenda of using smart software for certain purposes; specifically, enforcement of rules, regulations, and laws.”
- I know that the idea that “all” information can be processed is a jazzy claim. Google made it, and those familiar with Google search results know that Google does not even come close to all. It can barely deliver useful results from the Railroad Retirement Board’s Web site. “All” covers a lot of ground, and it is unlikely that a policeware vendor will be able to do much more than process a specific collection of data believed to be related to an investigation. “All” is for fear, not illumination. Save the categorical affirmatives for the marketing collateral, please.
- The computational cost of applying smart software to large domains of data — for example, global intercepts of text messages — is fun to talk about over lunch. But the costs are quite real. The computational infrastructure has to be paid for, and so do the downstream systems and the people who must figure out whether the smart software is hallucinating or delivering something useful. I would suggest that Israel’s surprise at the unhappy events from October 2023 to the present day unfolded despite the baloney about smart security software, a great intelligence apparatus, and the tons of marketing collateral handed out at law enforcement conferences. News flash: The stuff did not work.
In closing, I want to come back to fear. Exactly what is accomplished by using fear as the pointy end of the stick? Is it insecurity about smart software? Are there other messages framed in a different way to alert people to important issues?
Personally, I think fear is a low-level technique for getting one’s point across. But when those affiliated with an outfit tangled in the ethics matter and now the payola approach to information reach for it, how about putting on the big boy pants and selecting a rhetorical trope that does something other than remind people that the Covid thing could have killed us all? Err. No. And what is the agenda fear advances?
So, strike the sex sells trope. Go with fear sells.
Stephen E Arnold, December 6, 2023
Google: Privacy Is Number One?
September 19, 2023
Big tech companies like Google do not respect users’ privacy rights. Yes, these companies have privacy statements and other legal documents that state they respect individuals’ privacy, but it is all smoke and mirrors. The Verge has the lowdown on a privacy lawsuit filed against Google and a judge’s recent decision: “$5 Billion Google Lawsuit Over ‘Incognito Mode’ Tracking Moves A Step Closer To Trial.”
Chasom Brown, William Byatt, Jeremy Davis, Christopher Castillo, and Monique Trujillo filed a class action lawsuit against Google for collecting user information while in “incognito mode.” Publicly known as Chasom Brown, et al. v. Google, the plaintiffs seek $5 billion in damages. Google requested a summary judgment, but Judge Yvonne Gonzalez Rogers of California denied it.
Judge Gonzalez noted that statements in the Chrome Privacy Notice, Privacy Policy, Incognito Splash Screen, and Search & Browse Privately Help page explain how Incognito mode limits information collection and how people can control what information is shared. The judge wants the court to decide if these notices act as a binding agreement between Google and users that the former would not collect users’ data when they browsed privately.
Google disputes the claims and states that every time a new incognito tab is opened, Web sites might collect user information. There are other issues the plaintiffs and judge want to discuss:
“Another issue going against Google’s arguments that the judge mentioned is that the plaintiffs have evidence Google ‘stores users’ regular and private browsing data in the same logs; it uses those mixed logs to send users personalized ads; and, even if the individual data points gathered are anonymous by themselves, when aggregated, Google can use them to ‘uniquely identify a user with a high probability of success.’’
She also responded to a Google argument that the plaintiffs didn’t suffer economic injury, writing that ‘Plaintiffs have shown that there is a market for their browsing data and Google’s alleged surreptitious collection of the data inhibited plaintiffs’ ability to participate in that market…Finally, given the nature of Google’s data collection, the Court is satisfied that money damages alone are not an adequate remedy. Injunctive relief is necessary to address Google’s ongoing collection of users’ private browsing data.’”
Will Chasom Brown, et al. v. Google go anywhere beyond the California court? Will the rest of the United States, or other jurisdictions with a large Google market such as the European Union, do anything?
Whitney Grace, September 19, 2023
Malware: The NSO Group and a Timeline
September 8, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
A flurry of NSO Group news appeared in my newsfeeds this morning. Citizen Lab issued an advisory. You can find that short item in “BLASTPASS: NSO Group iPhone Zero-Click, Zero-Day Exploit Captured in the Wild.” Recorded Future, a cyber security company, published “Apple Discloses Zero-Days Linked.” Variants of these stories are percolating elsewhere, including in British tabloid newspapers like The Metro. One message comes through: Update your iPhones.
The information makes clear that a vulnerability “path” appears to be blocked. That’s good news. The firm which allegedly discovered the way into user mobile devices is the NSO Group. The important fact, at least for me, is that this organization opened its doors for business in 2010. The origin story, if one believes the information one can find using a free Web search engine, is that the company evolved from a mobile phone repair business. After repairing and tinkering, the founder set up a company to assist government agencies in obtaining information from mobile devices believed to be used by bad actors. Agree or disagree, the origin story is interesting.
What’s important for me is that the time between the company’s start up and the “good news” about addressing a vulnerability in certain devices has been a decade, maybe more. I don’t have an opinion about whether the time window could have been closed more quickly. What’s important to me is that the information is diffusing quickly. On one hand, that’s beneficial to those concerned about the security of their devices. On the other hand, that’s the starter’s gun for bad actors to deploy another hard-to-spot exploit.
I have several observations about this vulnerability:
- The challenge to those who create hardware and software is to realize that security issues are likely to exist. Those who discover and exploit these issues blindside the company. The developers have to reverse engineer the exploit and then figure out what their colleagues missed. Obviously this is a time-consuming and difficult process. Perhaps 10 years is speedy or slow. I don’t know. But an error made many years ago can persist and affect millions of device owners.
- The bad actor acts and the company responsible for chasing down the flaw reacts. This is a cat-and-mouse game. As a result, the hardware and software developers are playing defense. The idea that a good defense is better than a good offense may not be accurate. Those initial errors are, by definition, unknown. The gap between the error and the exploit allows bad actors to do what they want. Playing defense allows the offense time to gear up something new. The “good guys” are behind the curve in this situation.
- The fact that the digital ecosystem is large means that the opportunity for mischief increases. In my lectures, I like to point out that technology yields benefits, but it also is an enabler of those who want to do mischief.
Net net: Cyber crime steadily increases, and the boundary between systems and methods which are positive and those which are negative becomes blurred. Have we entered a stage in technical development in which the blurred space between good and bad has become so large that one cannot tell what is right or wrong, correct or incorrect, appropriate or inappropriate? Are we living in a “ghost Web” or a “shadow land?”
Stephen E Arnold, September 8, 2023
India: Where Regulators Actually Try (or Seem to Try)
August 22, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
I read “Data Act Will Make Digital Companies Handle Info under Legal Obligation.” The article reports that India’s regulators are beavering away in an attempt to construct a dam to stop certain flows of data. The write up states:
Union Minister of State for Electronics and Information Technology Rajeev Chandrasekhar on Thursday [August 17, 2023] said the Digital Personal Data Protection Act (DPDP Act) passed by Parliament recently will make digital companies handle the data of Indian citizens under absolute legal obligation.
What about certain high-technology companies operating with somewhat flexible methods? The article uses the phrase “punitive consequences of high penalty and even blocking them from operating in India.”
US companies’ legal eagles take off. Destination? India. MidJourney captures 1950s grade school textbook art quite well.
This passage caught my attention because nothing quite like it has progressed in the US:
The DPDP [Digital Personal Data Protection] Bill is aimed at giving Indian citizens a right to have his or her data protected and casts obligations on all companies, all platforms be it foreign or Indian, small or big, to ensure that the personal data of Indian citizens is handled with absolute (legal) obligation…
Now that the bill has passed Parliament, will certain US high-technology companies comply? I am not sure of the answer, but I have a hunch that a dust up may be coming.
Stephen E Arnold, August 22, 2023
What Does Apple Value: Money or Privacy?
January 18, 2023
To hear Apple tell it, the company makes protecting users’ privacy a top priority. While it does a better job than Google or Meta, that is not saying much. Gizmodo describes “10 Apple Privacy Problems that Might Surprise You.” Surprise? Nope, not us. Reporter Thomas Germain writes:
“Apple wants you to know that it cares about your privacy. For years, the company has emblazoned billboards with catchy slogans about its robust data protection practices, criticized tech rivals for their misuse of users’ personal information, and made big pronouncements about how it shields users. There’s no question that Apple handles your data with more care and respect than a lot of other tech companies. Unlike Google and Meta, parent company of Facebook and Instagram, Apple’s business doesn’t depend on mining and monetizing your data. But that doesn’t mean owning an iPhone spells perfect privacy. Apple harvests lots of personal information, often in ways that you might not expect if you buy into the company’s promise that ‘what happens on your iPhone, stays on your iPhone.’ It uses that information for advertising, developing new products, and more. Apple didn’t comment on the record for this story.”
Of course it didn’t. Germain describes each of the 10 privacy problems, complete with links to further reading on each one. Here are his headings:

- Apple appears to track you even with its own privacy settings turned off
- Apple collects details about every single thing you do in the app store
- A hidden map of everywhere you go
- You ask your apps not to track you, but sometimes Apple lets them do it anyway
- Apple collects enough data from your phone to track the people you hang out with
- Apple makes iMessage less private on purpose
- Targeted ads
- Think your VPN hides all your data? Think again
- How private are your conversations with Siri?
- Harvesting your music, movie and stocks data—and a whole lot more

Though none of these points actually surprise us, it is a bit startling to see them all laid out together. Navigate to the article for the details on each, including ways to lock down iDevices to the limited extent possible.
Cynthia Murrell, January 18, 2023