Meta and Torrents: True, False, or Rationalization?
February 26, 2025
AIs gobble datasets for training. It is also a fact that many LLMs and datasets contain biased information, are incomplete, or just plain stink. One ethical but cumbersome way to train algorithms would be to notify people that their data, creative content, or other information will be used to train AI. Offering to pay for the right to use the data would be a useful step, some argue.
Will this happen? Obviously not.
Why?
Because it’s sometimes easier to take than to ask. According to Tom’s Hardware, “Meta Staff Torrented Nearly 82TB Of Pirated Books For AI Training-Court Records Reveal Copyright Violations.” The article explains that Meta pirated 81.7 TB of books from the shadow libraries Anna’s Archive, Z-Library, and LibGen. These books were then used to train AI models. Meta is now facing a class action lawsuit about using content from the shadow libraries.
The allegations arise from Meta employees’ written communications. Some of these messages provide insight into employees’ concern about tapping pirated materials. The employees were getting frown lines, but then some staffers’ views rotated when they concluded smart software helped people access information.
Here’s a passage from the cited article I found interesting:
“Then, in January 2023, Mark Zuckerberg himself attended a meeting where he said, “We need to move this stuff forward… we need to find a way to unblock all this.” Some three months later, a Meta employee sent a message to another one saying they were concerned about Meta IP addresses being used “to load through pirate content.” They also added, “torrenting from a corporate laptop doesn’t feel right,” followed by laughing out loud emoji. Aside from those messages, documents also revealed that the company took steps so that its infrastructure wasn’t used in these downloading and seeding operations so that the activity wouldn’t be traced back to Meta. The court documents say that this constitutes evidence of Meta’s unlawful activity, which seems like it’s taking deliberate steps to circumvent copyright laws.”
If true, the approach smacks of that suave Silicon Valley style. If false, my faith in a yacht owner with gold chains might be restored.
Whitney Grace, February 26, 2025
AI Research Tool from Perplexity Is Priced to Undercut the Competition
February 26, 2025
Are prices for AI-generated research too darn high? One firm thinks so. In a Temu-type bid to take over the market, reports VentureBeat, "Perplexity Just Made AI Research Crazy Cheap—What that Means for the Industry." CEO Aravind Srinivas credits open source software for making the move possible, opining that "knowledge should be universally accessible." Knowledge, yes. AI research? We are not so sure. Nevertheless, here we are. The write-up describes the difference in pricing:
"While Anthropic and OpenAI charge thousands monthly for their services, Perplexity offers five free queries daily to all users. Pro subscribers pay $20 monthly for 500 daily queries and faster processing — a price point that could force larger AI companies to explain why their services cost up to 100 times more."
Not only is Perplexity’s Deep Research cheaper than the competition, crows the post, its accuracy rivals theirs. We are told:
"[Deep Research] scored 93.9% accuracy on the SimpleQA benchmark and reached 20.5% on Humanity’s Last Exam, outperforming Google’s Gemini Thinking and other leading models. OpenAI’s Deep Research still leads with 26.6% on the same exam, but OpenAI charges $200 per month for that service. Perplexity’s ability to deliver near-enterprise level performance at consumer prices raises important questions about the AI industry’s pricing structure."
Well, okay. Not to stray too far from the point, but is a 20.5% or a 26.6% on Humanity’s Last Exam really something to brag about? Last we checked, those were failing grades. By far. Isn’t it a bit too soon to be outsourcing research to any LLM? But I digress.
We are told the low, low cost Deep Research is bringing AI to the micro-budget masses. And, soon, to the Windows-less—Perplexity is working on versions for iOS, Android, and Mac. Will this spell disaster for the competition?
Cynthia Murrell, February 26, 2025
Researchers Raise Deepseek Security Concerns
February 25, 2025
What a shock. It seems there are some privacy concerns around Deepseek. We learn from the Boston Herald, “Researchers Link Deepseek’s Blockbuster Chatbot to Chinese Telecom Banned from Doing Business in US.” Former Wall Street Journal and now AP professional Byron Tau writes:
“The website of the Chinese artificial intelligence company Deepseek, whose chatbot became the most downloaded app in the United States, has computer code that could send some user login information to a Chinese state-owned telecommunications company that has been barred from operating in the United States, security researchers say. The web login page of Deepseek’s chatbot contains heavily obfuscated computer script that when deciphered shows connections to computer infrastructure owned by China Mobile, a state-owned telecommunications company.”
If this is giving you déjà vu, dear reader, you are not alone. This scenario seems much like the uproar around TikTok and its Chinese parent company ByteDance. But it is actually worse. ByteDance’s direct connection to the Chinese government is, as of yet, merely hypothetical. China Mobile, on the other hand, is known to have direct ties to the Chinese military. We learn:
“The U.S. Federal Communications Commission unanimously denied China Mobile authority to operate in the United States in 2019, citing ‘substantial’ national security concerns about links between the company and the Chinese state. In 2021, the Biden administration also issued sanctions limiting the ability of Americans to invest in China Mobile after the Pentagon linked it to the Chinese military.”
It was Canadian cybersecurity firm Feroot Security that discovered the code. The AP then had the findings verified by two academic cybersecurity experts. Might similar code be found within TikTok? Possibly. But, as the article notes, the information users feed into Deepseek is a bit different from the data TikTok collects:
“Users are increasingly putting sensitive data into generative AI systems — everything from confidential business information to highly personal details about themselves. People are using generative AI systems for spell-checking, research and even highly personal queries and conversations. The data security risks of such technology are magnified when the platform is owned by a geopolitical adversary and could represent an intelligence goldmine for a country, experts warn.”
Interesting. But what about CapCut, the ByteDance video thing?
Cynthia Murrell, February 25, 2025
Content Injection Can Have Unanticipated Consequences
February 24, 2025
The work of a real, live dinobaby. Sorry, no smart software involved. Whuff, whuff. That’s the sound of my swishing dino tail. Whuff.
Years ago I gave a lecture to a group of Swedish government specialists affiliated with the Forestry Unit. My topic was the procedure for causing certain common algorithms used for text processing to increase the noise in their procedures. The idea was to input certain types of text and numeric data in a specific way. (No, I will not disclose the methods in this free blog post, but if you have a certain profile, perhaps something can be arranged by writing benkent2020 at yahoo dot com. If not, well, that’s life.)
We focused on a handful of methods widely used in what now is called “artificial intelligence.” Keep in mind that most of the procedures are not new. There are some flips and fancy dancing introduced by individual teams, but the math is not invented by TikTok teens.
In my lecture, the forestry professionals wondered if these methods could be used to achieve specific objectives or “ends.” The answer was and remains, “Yes.” The idea is simple. Once methods are put in place, the algorithms chug along; some are brute force and others are probabilistic. Either way, content and data injections can be shaped, just like the gizmos required to make kinetic events occur.
The point of this forestry excursion is to make clear that a group of people, operating in a loosely coordinated manner, can create data or content. Those data or content can be weaponized. When ingested by or injected into a content processing flow, the outputs of the larger system can be fiddled: More emphasis here, a little less accuracy there, and an erosion of whatever “accuracy” calculations are used to keep the system within the engineers’ and designers’ parameters. A plebeian way to describe the goal: Disinformation or accuracy erosion.
I read “Meet the Journalists Training AI Models for Meta and OpenAI.” The write up explains that journalists without jobs or in search of extra income are creating “content” for smart software companies. The idea is that if one just does the Silicon Valley thing and sucks down any and all content, lawyers might come calling. Therefore, paying for “real” information is a better path.
Please, read the original article to get a sense of who is doing the writing and what baggage or mindset these people might bring to their work.
If the content is distorted, either intentionally or unintentionally, the impact of these content objects on the larger smart software system might have some interesting consequences. I just wanted to point out that weaponized information can have an impact. Those running smart software and buying content on the assumption that it is just fine might find some interesting consequences in the outputs.
Stephen E Arnold, February 24, 2025
AI Worriers, Play Some Bing Crosby Music
February 24, 2025
This blog post is the work of a real-live dinobaby. No smart software involved.
The Guardian newspaper ran an interesting write up about smart software and the inevitability of complaining to stop it in its tracks. “I Met the Godfathers of AI in Paris – Here’s What They Told Me to Really Worry About.” I am not sure what’s being taught in British schools, but the headline features the author, a split infinitive, and the infamous “ending a sentence with a preposition” fillip. Very sporty.
The write up includes quotes from the godfathers:
“It’s not today’s AI we need to worry about, it’s next year’s,” Tegmark told me. “It’s like if you were interviewing me in 1942, and you asked me: ‘Why aren’t people worried about a nuclear arms race?’ Except they think they are in an arms race, but it’s actually a suicide race.”
I am not sure what psychologists call worrying about the future. Bing Crosby took a different approach. He sang, “Don’t worry about tomorrow” and offered:
Why should we cling to some old faded thing
That used to be
Bing looked beyond the present but did not seem unduly worried. The Guardian is a bit more uptight.
The write up says:
The idea that we, on Earth, might lose control of an AGI that then turns on us sounds like science fiction – but is it really so far-fetched considering the exponential growth of AI development? As Bengio [an AI godfather, according to the Guardian] pointed out, some of the most advanced AI models have already attempted to deceive human programmers during testing, both in pursuit of their designated objectives and to escape being deleted or replaced with an update.
I circled this passage:
It seems as if we have a shrinking opportunity to lay down the incentives for companies to create the kind of AI that actually benefits our individual and collective lives: sustainable, inclusive, democracy-compatible, controlled. And beyond regulation, “to make sure there is a culture of participation embedded in AI development in general”, as Eloïse Gabadou, a consultant to the OECD on technology and democracy, put it.
Okay, so what’s the fix? Who implements the fix? Will the fix stop British universities in Manchester, Cambridge, and Oxford among others from teaching about AI or stop researchers from fiddling with snappier methods? Will the Mayor of London shut down the DeepMind outfit?
Nope. I am delighted that some people are talking about smart software. However, in the high tech world in which we live, I want to remind the Guardian that the last train for Chippenham has left the station. Too late, old chap. Learn to play Bing’s song. Chill.
Stephen E Arnold, February 24, 2025
Advice for Programmers: AI-Proof Your Career
February 24, 2025
Software engineer and blogger Sean Goedecke has some career advice for those who, like himself, are at risk of losing their programming jobs to AI. He counsels, "To Avoid Being Replaced by LLMs, Do What They Can’t." Logical enough. But what will these tools be able to do, and when will they be able to do it? That is the $25 million question. Goedecke has suggestions for the medium term, and the long term.
Right now, he advises, engineers should do three things: First, use the tools. They can help you gain an advantage in the field. And also, know-thine-enemy, perhaps? Next, learn how LLMs work so you can transition to the growing field of AI work. If you can’t beat them, join them, we suppose. Finally, climb the ranks posthaste, for those in junior roles will be the first to go. Ah yes, the weak get eaten. It is a multipronged approach.
For the medium term, Goedecke predicts which skills LLMs are likely to master first. Get good at the opposite of that. For example, ill-defined or poorly-scoped problems, solutions that are hard to verify, and projects with huge volumes of code are all very difficult for algorithms. For now.
In the long term, work yourself into a position of responsibility. There are few of those to go around. So, as noted above, start vigorously climbing over your colleagues now. Why? Because executives will always need at least one good human engineer they can trust. The post observes:
"A LLM strong enough to take responsibility – that is, to make commitments and be trusted by management – would have to be much, much more powerful than a strong engineer. Why? Because a LLM has no skin in the game, which means the normal mechanisms of trust can’t apply. Executives trust engineers because they know those engineers will experience unpleasant consequences if they get it wrong. Because the engineer is putting something on the line (e.g. their next bonus, or promotion, or in the extreme case being fired), the executive can believe in the strength of their commitment. A LLM has nothing to put on the line, so trust has to be built purely on their track record, which is harder and takes more time. In the long run, when almost every engineer has been replaced by LLMs, all companies will still have at least one engineer around to babysit the LLMs and to launder their promises and plans into human-legible commitments. Perhaps that engineer will eventually be replaced, if the LLMs are good enough. But they’ll be the last to go."
If you are lucky, it will be time to retire by then. For those young enough that this is unlikely, or for those who do not excel at the rat race, perhaps a career change is in order. What jobs are safe? Sadly, this dino-baby writer does not have the answer to that question.
Cynthia Murrell, February 24, 2025
OpenAI Furthers Great Research
February 21, 2025
Unsatisfied with existing AI cheating solutions? If so, Gizmodo has good news for you: “OpenAI’s ‘Deep Research’ Gives Students a Whole New Way to Cheat on Papers.” Writer Kyle Barr explains:
“OpenAI’s new ‘Deep Research’ tool seems perfectly designed to help students fake their way through a term paper unless asked to cite sources that don’t include Wikipedia. OpenAI’s new feature, built on top of its upcoming o3 model and released on Sunday, resembles one Google introduced late last year with Gemini 2.0. Google’s ‘Deep Research’ is supposed to generate long-form reports over the course of 30 minutes or more, depending on the depth of the requested topic. Boiled down, Google’s and OpenAI’s tools are AI agents capable of performing multiple internet searches while reasoning about the next step to generate a report.”
Deep Research even functions in a side panel, providing updates on its direction and progress. So helpful! However, the tool is not for those looking to score an A. Like a student rushing to finish a paper the old-fashioned way, Barr notes, it relies heavily on Wikipedia. An example report did include a few trusted sites, like Pew Research, but such reliable sources were in the minority. Besides, the write-up emphasizes:
“Remember, this is just a bot scraping the internet, so it won’t be accessing any non-digitized books or—ostensibly—any content locked behind a paywall. … Because it’s essentially an auto-Googling machine, the AI likely won’t have access to the most up-to-date and large-scale surveys from major analysis firms. … That’s not to say the information was inaccurate, but anybody who generates a report is at the mercy of suspect data and the AI’s interpretation of that data.”
Meh, we suppose that is okay if one just needs a C to get by. But is it worth the $200 per month subscription? We suppose that depends on the student, and the parents’ willingness to sign up for services that will make gentle Ben and charming Chrissie smarter. Besides, we are sure more refined versions are in our future.
Cynthia Murrell, February 21, 2025
Gemini, the Couch Potato, Watches YouTube
February 21, 2025
Have you ever told yourself that you have too many YouTube videos to watch? Now you can save time by using Gemini AI to watch them for you. What can Gemini AI do? According to Make Use Of, “Gemini Can Now Watch YouTube Videos And Save Hours Of Time.”
Google recently rolled out an update to its Gemini AI that allows users to catch up on YouTube videos without having to actually watch them. The new feature is a marvelous advancement! The new addition to Gemini 2.0 Flash watches the video and can then answer questions about it or provide a summary. Google users can access Gemini through the Gemini site or the smartphone app. It’s also available for free without the Gemini Advanced subscription.
To access the video-watching feature, users must select the “2.0 Flash Thinking Experimental with apps” model from the sidebar.
Here’s how the cited article’s author used Gemini:
“… I came across a YouTube video about eight travel tips for Las Vegas. Instead of watching the entire video, I simply asked Gemini, “What are the eight travel tips in this video?” Gemini then processed the video and provided a concise summary of the travel tips. I also had Gemini summarize a video on changing a windshield wiper on a Honda CR-V, a chore I needed to complete. The results were simple and easy to understand, allowing me to glance at my iPhone screen instead of constantly stopping and starting the video during the process. The easiest way to grab a YouTube link is through your web browser or the Share button under the video.”
YouTube videos can be long and boring. Gemini condenses the information into digestible, quick-to-read bits. It’s an awesome tool, but if Gemini watches a video, does it count as a view for advertising? Will Gemini put on a few pounds snacking on Pringles?
Whitney Grace, February 21, 2025
What Do Gamers Know about AI? Nothing, Nothing at All
February 20, 2025
Take-Two Games CEO says, "There’s no such thing" as AI.
Is the head of a major gaming publisher using semantics to downplay the role of generative AI in his industry? PC Gamer reports, "Take-Two CEO Strauss Zelnick Takes a Moment to Remind Us Once Again that ‘There’s No Such Thing’ as Artificial Intelligence." Writer Andy Chalk quotes Zelnick from a recent GamesIndustry interview:
"Artificial intelligence is an oxymoron, there’s no such thing. Machine learning, machines don’t learn. Those are convenient ways to explain to human beings what looks like magic. The bottom line is that these are digital tools and we’ve used digital tools forever. I have no doubt that what is considered AI today will help make our business more efficient and help us do better work, but it won’t reduce employment. To the contrary, the history of digital technology is that technology increases employment, increases productivity, increases GDP and I think that’s what’s going to happen with AI. I think the videogame business will probably be on the leading, if not bleeding, edge of using AI."
So AI, which does not exist, will actually create jobs instead of eliminating them? The write-up correctly notes the evidence points to the contrary. On the other hand, Strauss seems clear-eyed on the topic of copyright violations. AI-on-AI violations, anyway. We learn:
"That’s a mess Zelnick seems eager to avoid. ‘In terms of [AI] guardrails, if you mean not infringing on other people’s intellectual property by poaching their LLMs, yeah, we’re not going to do that,’ he said. ‘Moreover, if we did, we couldn’t protect that, we wouldn’t be able to protect our own IP. So of course, we’re mindful of what technology we use to make sure that it respects others’ intellectual property and allows us to protect our own.’"
Perhaps Strauss is on to something. It is true that generative AI is just another digital tool—albeit one that tends to put humans out of work. But as we know, hype is more important than reality for those chasing instant fame and riches.
Cynthia Murrell, February 20, 2025
Smart Software and Law Firms: Realities Collide
February 19, 2025
This blog post is the work of a real-live dinobaby. No smart software involved.
TechCrunch published “Legal Tech Startup Luminance, Backed by the Late Mike Lynch, Raises $75 Million.” Good news for Luminance. Now the company just needs to ring the bell for those putting up the money. The write up says:
Claiming to be capable of highly accurate interrogation of legal issues and contracts, Luminance has raised $75 million in a Series C funding round led by Point72 Private Investments. The round is notable because it’s one of the largest capital raises by a pure-play legal AI company in the U.K. and Europe. The company says it has raised over $115 million in the last 12 months, and $165 million in total. Luminance was originally developed by Cambridge-based academics Adam Guthrie (founder and chief technical architect) and Dr. Graham Sills (founder and director of AI).
Why is Luminance different? The method is similar to that used by Deepseek. With concerns about the cost of AI, a method which might be less expensive to get up and keep running seems like a good bet.
However, Eudia has raised $105 million with backing from people familiar with Relativity’s legal business. Law dot com suggests that Eudia will streamline legal business processes.
The article “Massive Law Firm Gets Caught Hallucinating Cases” offers an interesting anecdote about a large law firm facing sanctions. What did the big boys and girls at the law firm do? Those hard working Type A professionals cited nine cases to support an argument. There is just one trivial issue perplexing the senior partners. Eight of those cases were “nonexistent.” That means made up, invented, and spit out by a nifty black box of probabilities and their methods.
I am no lawyer. I did work as an expert witness and picked up some insight about the thought processes of big time lawyers. My observations may not apply to the esteemed organizations to which I linked in this short essay, but I will assume that I am close enough for horseshoes.
- Partners want big pay and juicy bonuses. If AI can help reduce costs and add protein powder to the compensation package, AI is definitely a go-to technology to use.
- Lawyers who are very busy all of the billable time and then some want to be more efficient. The hyperbole swirling around AI makes it clear that using an AI is a productivity booster. Do lawyers have time to check what the AI system did? Nope. Therefore, hallucination is going to be part of the transformer-based methodologies until something better becomes feasible. (Did someone say, “Quantum computers?”)
- The marketers (both directly compensated and the social media remoras) identify a positive. Then that upside is gilded like Tzar Nicholas’ powder room and repeated until it sure seems true.
The reality for the investors is that AI could be a winner. Go for it. The reality for the lawyers is that time to figure out what’s in bounds and what’s out of bounds is unlikely to be available. Other professionals will discover what the cancer docs did when using the late, great IBM Watson. AI can do some things reasonably well. Other things can have severe consequences.
Stephen E Arnold, February 19, 2025