AI Crawlers Are Bullying Open Source: Stop Grousing and Go Away

April 25, 2025

AI algorithms are built on open source technology. Unfortunately, generative AI is harming its mother code, explains TechDirt in “AI Crawlers Are Harming Wikimedia, Bringing Open Source Sites To Their Knees, And Putting The Open Web At Risk.” To make generative AI work you need a lot of computing power, smart coding, and mounds of training data. Money can buy coding and power, but (quality) training data is incredibly difficult to obtain.

AI crawlers were unleashed on the Internet to scrape information for training models. The biggest information providers for crawlers are Wikimedia projects, and that is a big problem. Wikimedia, which claims to be “the largest collection of open knowledge in the world,” says crawler traffic is surging and driving up its costs:

“Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models. Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.”

This is bad because it strains Wikimedia’s data centers and budget. Wikimedia isn’t the only information source feeling the burn from AI crawlers. News sites and more are being wrung dry by crawlers for every last bit of information:

“It’s increasingly clear that the reckless and selfish way in which AI crawlers are being deployed by companies eager to tap into today’s AI hype is bringing many sites around the Internet to their knees. As a result, AI crawlers are beginning to threaten the open Web itself, and thus the frictionless access to knowledge that it has provided to general users for the last 30 years.”

Silicon Valley might have good intentions but dollars are more important. (Oh, I am not sure about the “good intentions.”)

Whitney Grace, April 25, 2025

Zuckerberg Wants WhatsApp To Compete With Telegram

April 24, 2025

After 13 years of just borrowing Telegram’s innovations, the Zucker wants to compete with Telegram. (Wasn’t Pavel Durov arrested?)

Mark Zuckerberg is ready to bring WhatsApp to the messaging race and he plans to give Telegram and Signal a run for their money. Lifehacker posted a press release about the updates to the messaging app: “WhatsApp Just Announced a Dozen New Features.”

Group chats are getting a major overhaul. There will be an indicator that shows who has WhatsApp open in real time. This will allow users to see how many people are active on a thread. There will also be a “Notify for” section in group chat settings for managing thread notifications and there will be a “Highlights” option to limit what alerts users. The option to create events will be extended to one-on-one chats. Apple iPhone users get the exclusive update of a built-in document scanner and WhatsApp can now be set as the default messaging app.

Calls have been updated too:

“You’ll notice three new features when placing calls. On iOS, you can pinch to zoom when on a video call. This works on both your video feed, as well as the feed of the person you’re talking to…You can now add a friend to a one-on-one call by swiping over to their chat, tapping the call button, and choose ‘Add to call.’…Finally, WhatsApp says they’ve upgraded their video call tech, optimizing the routing system and boosting bandwidth detection.”

The Updates tab has some important changes too:

“There are also three changes to the Updates tab: Channel admins can record and post videos to their followers directly from the app (though these videos need to be 60 seconds or less). You can also see a transcription of voice messages updates in channels, and channel admins can share QR codes to link to the channel.”

Why not implement the live video, the crypto wallet, and the bots? Oh, right. Those are harder to emulate.

Whitney Grace, April 24, 2025

Microsoft and Its Modern Management Method: Waffling

April 23, 2025

No AI, just the dinobaby himself.

The Harvard Business School (which I assume will remain open for “business”) has not directed its case writers to focus on Microsoft’s modern management method. To me, changing direction is not a pivot; it is a variant of waffling. “Waffling” means saying one thing like “We love OpenAI.” Then hiring people who don’t love OpenAI and cutting deals with other AI outfits. The whipped cream on the waffle is killing off investments in data centers.

If you are not following this, think of the old song “The first time is the last time,” and you might get a sense of the confusion that results from changes in strategic and tactical direction. You may find this GenX, Y and Z approach just fine. I think it is a hoot.

PC Gamer, definitely not the Harvard Business Review, tackles one example of Microsoft’s waffling in “Microsoft Pulls Out of Two Big Data Centre Deals Because It Reportedly Doesn’t Want to Support More OpenAI Training Workloads.”

The write up says:

Microsoft has pulled out of deals to lease its data centres for additional training of OpenAI’s language model ChatGPT. This news seems surprising given the perceived popularity of the model, but the field of AI technology is a contentious one, for a lot of good reasons. The combination of high running cost, relatively low returns, and increasing competition—plus working on it’s own sickening AI-made Quake 2 demo—have proven enough reason for Microsoft to bow out of two gigawatt worth of projects across the US and Europe.

I love the scholarly “sickening.” Listen up, HBR editors. That’s a management term for 2025.

The article adds:

Microsoft, as well as its investors, have witnessed this relatively slow payoff alongside the rise of competitor models such as China’s Deepseek.

Yep, “payoff.” The Harvard Business School’s professors are probably not familiar with the concept of a payoff.

The news report points out that Microsoft is definitely, 100 percent going to spend $80 billion on infrastructure in 2025. With eight months left in the year, the Softies have to get in gear. The Google is spending as well. The other big time high tech AI juggernauts are also spending.

Will these investments pay off? Sure. Accountants and chief financial officers learn how to perform number magic. Guess where? Schools like the HBS. Don’t waffle. Go to class. Learn and then implement big time waffling.

Stephen E Arnold, April 23, 2025

ArXiv: Will Other Smart Software Systems Get “Free” Access? Yeah, Sure

April 21, 2025

Believe it or not, no smart software. Just a dumb and skeptical dinobaby.

Before commenting on Cornell University’s apparent shift of the ArXiv service to the Google Cloud, let me point you to this page:

image

The page was updated 15 years ago. Now check out the access to

NCSTRL, the Networked Computer Science Technical Reference Library.

CoRR, the Computing Research Repository.

The Open Archives Initiative.

ETRDL, the ERCIM Technical Reference Digital Library.

Cornell University Library Historical Math Book Collection

Cornell University Library Making of America Collection

Hein online Retrospective Law Journals

Yep, 404s, some content behind paywalls, and other data just disappeared because Bing, Google, and Yandex don’t index certain information no matter what people believe or the marketers say.

This orphaned Cornell University Dienst service has “gorged out”; that is, jumped off a bridge to the rocks below. The act is something students know about but the admissions department seems to not be aware of the bound phrase.

I read “Careers at ArXiv.” The post seems to say to me, “We are moving the ArXiv ‘gray’ papers to Google Cloud.” Here’s a snippet of the “career” advertisement / news announcement:

We are already underway on the arXiv CE ("Cloud Edition") project. This is a project to re-home all arXiv services from VMs at Cornell to a cloud provider (Google Cloud). There are a number of reasons for this transition, including improving arXiv’s scalability while modernizing our infrastructure. This will not be a simple port of the existing arXiv code base because this project will:

  • replace the portion of our backends still written in perl and PHP
  • re-architect our article processing to be fully asynchronous, and provide better insight into the processing workflows
  • containerize all, or nearly all arXiv services so we can deploy via Kubernetes or services like Google Cloud Run
  • improve our monitoring and logging facilities so we can more quickly identify and manage production issues with arxiv.org
  • create a robust CI/CD pipeline to give us more confidence that changes we deploy will not cause services to regress

The cloud transition is a pre-requisite to modernizing arXiv as a service. The modernization will enable:

  • arXiv to expand the subject areas that we cover
  • improve the metadata we collect and make available for articles, adding fields that the research community has requested such as funder identification
  • deal with the problem of ambiguous author identities
  • improve accessibility to support users with impairments, particularly visual impairments
  • improve usability for the entire arXiv community.

I know Google is into “free.” The company is giving college students its quantumly supreme smart software for absolutely nothing. Maybe a Google account will be required? Maybe the Chrome browser will be needed to give those knowledge hungry college students the best experience possible? Maybe Google’s beacons, bugs, and cookies will be the students’ constant companions? Yeah, maybe.

But will ArXiv exist in the future? Will Google’s hungry knowledge munchers chew through the data and then pull a Dienst maneuver?

As a dinobaby, I liked the ArXiv service, but I also liked the Dienst math repository before it became unfindable.

It seems to me that Cornell University is:

  1. Saving money at the library and maybe the Theory Center
  2. Avoiding future legal dust ups about access to content which to some government professionals may reveal information to America’s adversaries
  3. Intentionally or inadvertently giving the Google control over knowledge flow related to matters of technical and competitive interest to everyone’s favorite online advertising company
  4. Running a variation of its Dienst game plan.

But I am a dinobaby, and I know zero about Cornell other than the “gorging out” approach to termination. I know even less about the blue chip consulting type thinking in which the Google engages. I don’t even know if I agree that Google’s recent court loss is really a “win” for the Google.

But the future of the ArXiv? Hey, where is that bridge? Do some students jump, fall, or get pushed to their death on the rocks below?

PS. In case your German is rusty, “dienst” means duty and possibly “a position of authority” like a leader at Google.

Stephen E Arnold, April 21, 2025

YouTube Click Count Floors Creators

April 18, 2025

Content creators are not thrilled about a change in how YouTube counts views for short-form videos. The Google-owned site now tallies a view any time the short starts, regardless of how long it plays before the user scrolls on past. Digiday reports, “YouTube Shorts View Count Update Wins Over Brands—But Creators Aren’t Sold.” Though view counts have spiked since the change, that number has nothing to do with creators’ compensation. Any bragging rights from high view counts will surely be negated as word spreads on how their calculation changed. Besides, say seasoned creators, there could be a real downside for newbies. Reporter Ivy Liu writes:

“Other creators said that they were worried the change could encourage YouTubers to focus on the inflated view metric displayed beneath Shorts, rather than the engaged view metric that contributes more meaningfully to creators’ income. For example, the creator BnG Refining — who goes by the name ‘Scrooge’ to his audience and asked not to be quoted by his real name — said that he was afraid less experienced creators might ‘flood the platform with content that they think is wanted, and not until hours, days, weeks later realizing that those were only fake views.’”

We are sure Google does not mind, though. Creators were not the real audience for the change. We learn:

“Brands and marketers are far more welcoming of the update, saying it brings order to the chaos of influencer marketing. Now, YouTube Shorts, TikTok videos and Instagram Reels all measure their views in the same way, making it easier for marketers to compare creators’ and videos’ performance across platforms. ‘It makes it easier, if you’re a brand, to say, “here’s how performance is across the board,” vs. looking at impressions and then trying to judge an impression as a view,’ said Krishna Subramanian, CEO of the influencer marketing company Captiv8.”

Of course. Because it is all about making it easier for brands to calculate their ROI. Creators’ perspectives, information, and artistic expression are secondary. As usual, creators are at the mercy of Google. Google likes everyone to be at its mercy. No meaningful regulation is the best regulation. Self regulation works wonders in the financial services sector too.

Cynthia Murrell, April 18, 2025

Why Is Meta Experimenting With AI To Write Comments?

April 18, 2025

Who knows why Meta does anything original? Amazon uses AI to write snapshots of book series. Therefore, Meta is using AI to write comments. We were not surprised to read “Meta Is Experimenting With AI-Generated Comments, For Some Reason.”

Meta is using AI to write Instagram comments. It sounds like a very stupid idea, but Meta is doing it. Some Instagram accounts can see a new icon to the left of the text field after choosing to leave a comment. The icon is a pencil with a star. When the icon is tapped, a new Meta AI menu pops up and offers a selection of comment choices. These comments are presumably based on the content of the post being commented on.

It doesn’t take much effort to write a simple Instagram comment, but offloading the task appears to take more effort than completing the task yourself. Plus, Instagram is already plagued with chatbot comments. Does it need more? Nope.

Here’s what the author Jake Peterson requests of his readers:

“Writing comments isn’t hard, and yet, someone at Meta thought there was a usefulness—a market—for AI-generated comments. They probably want more training data for their AI machine, which tracks, considering companies are running out of internet for models to learn from. But that doesn’t mean we should be okay with outsourcing all human tasks to AI.”

Mr. Peterson suggests that what bugs him the most is users happily allowing hallucinating software to perform cognitive tasks and make decisions for people like me. Right on, Mr. Peterson.

Whitney Grace, April 18, 2025

Trust: Zuck, Meta, and Llama 4

April 17, 2025

Sorry, no AI used to create this item.

CNET published a very nice article that says to me: “Hey, we don’t trust you.” Navigate to “Meta Llama 4 Benchmarking Confusion: How Good Are the New AI Models?” The write up is like a wimpy version of the old PC Perspective podcast with Ryan Shrout. Before the embrace of Intel’s intellectual blanket, the podcast would raise questions about video card benchmarks. Most of the questions addressed: “Is this video card that fast?” In some cases, yes, the video card benchmarks were close to the real world. In other cases, video card manufacturers did what the butcher on Knoxville Avenue did in 1951. Mr. Wilson put his thumb on the scale. My grandmother closely watched friendly Mr. Wilson, who drove a new Buick in a very, very modest neighborhood. He did not smile as broadly when my grandmother and I would enter the store for a chicken.

image

Would someone subject an AI professional to this type of benchmark test? Of course not. But the idea has a certain charm. Plus, if the person dies, he was fooling. If the person survives, that individual is definitely a witch. This was a winning method to some enlightened leaders at one time.

The CNET story says about the Zuck’s most recent non-virtual reality investment:

Meta’s Llama 4 models Maverick and Scout are out now, but they might not be the best models on the market.

That’s a good way to say, “Liar, liar, pants on fire.”

The article adds:

the model that Meta actually submitted to the LMArena tests is not the model that is available for people to use now. The model submitted for testing is called “llama-4-maverick-03-26-experimental.” In a footnote on a chart on Llama’s website (not the announcement), in tiny font in the final bullet point, Meta clarifies that the model submitted to LMArena was “optimized for conversationality.”

Isn’t this a GenZ way to say, “You put your thumb on the scale, Mr. Wilson”?

Let’s review why one should think about the desire to make something better than it is:

  1. Meta’s decision is just marketing. Think about the self driving Teslas. Consequences? Not for fibbing.
  2. The Meta engineers have to deliver good news. Who wants to tell the Zuck that the Llama innovations are like making the VR thing a big winner? Answer: No one who wants to get a bonus and curry favor.
  3. Meta does not have the ability to distinguish good from bad. The model swap is what Meta is going to do anyway. So why not just use it? No big deal. Is this a moral and ethical dead zone?

What’s interesting is that from my point of view, Meta and the Zuck have a standard operating procedure. I am not sure that aligns with what some people expect. But as long as the revenue flows and meaningful regulation of social media remains a windmill for today’s Don Quixotes, Meta is the best — until another AI leader puts out a quantumly supreme news release.

Stephen E Arnold, April 17, 2025

Google AI: Invention Is the PR Game

April 17, 2025

Google was so excited to tout its AI’s great achievement: In under 48 hours, it solved a medical problem that vexed human researchers for a decade. Great! Just one hitch. As Pivot to AI tells us, "Google Co-Scientist AI Cracks Superbug Problem in Two Days!—Because It Had Been Fed the Team’s Previous Paper with the Answer In It." With that detail, the feat seems much less impressive. In fact, two days seems downright sluggish. Writer David Gerard reports:

"The hype cycle for Google’s fabulous new AI Co-Scientist tool, based on the Gemini LLM, includes a BBC headline about how José Penadés’ team at Imperial College asked the tool about a problem he’d been working on for years — and it solved it in less than 48 hours! [BBC; Google] Penadés works on the evolution of drug-resistant bacteria. Co-Scientist suggested the bacteria might be hijacking fragments of DNA from bacteriophages. The team said that if they’d had this hypothesis at the start, it would have saved years of work. Sounds almost too good to be true! Because it is. It turns out Co-Scientist had been fed a 2023 paper by Penadés’ team that included a version of the hypothesis. The BBC coverage failed to mention this bit. [New Scientist, archive]"

It seems this type of Googley AI over-brag is a pattern. Gerard notes the company claims Co-Scientist identified new drugs for liver fibrosis, but those drugs had already been studied for this use. By humans. He also reminds us of this bit of truth-stretching from 2023:

"Google loudly publicized how DeepMind had synthesized 43 ‘new materials’ — but studies in 2024 showed that none of the materials was actually new, and that only 3 of 58 syntheses were even successful. [APS; ChemrXiv]"

So the next time Google crows about an AI achievement, we have to keep in mind that AI often is a synonym for PR.

Cynthia Murrell, April 17, 2025

AI Impacts Jobs: But Just 40 Percent of Them

April 16, 2025

AI enthusiasts would have us believe workers have nothing to fear from the technology. In fact, they gush, AI will only make our jobs easier by taking over repetitive tasks and allowing time for our creative juices to flow. It is a nice vision. Far-fetched, but nice. Euronews reports, “AI Could Impact 40 Percent of Jobs Worldwide in the Next Decade, UN Agency Warns.” Writer Anna Desmarais cites a recent report as she tells us:

“Artificial intelligence (AI) may impact 40 per cent of jobs worldwide, which could mean overall productivity growth but many could lose their jobs, a new report from the United Nations Department of Trade and Development (UNCTAD) has found. The report … says that AI could impact jobs in four main ways: either by replacing or complementing human work, deepening automation, and possibly creating new jobs, such as in AI research or development.”

So it sounds like we could possibly reach a sort of net-zero on jobs. However, it will take deliberate action to get there. And we are not currently pointed in the right direction:

“A handful of companies that control the world’s advancement in AI ‘often favour capital over labour,’ the report continues, which means there is a risk that AI ‘reduces the competitive advantage’ of low-cost labour from developing countries. Rebeca Grynspan, UNCTAD’s Secretary-General, said in a statement that there needs to be stronger international cooperation to shift the focus away ‘from technology to people’.”

Oh, is that all? Easy peasy. The post notes it is not just information workers under threat—when combined with other systems, AI can also perform physical production jobs. Desmarais concludes:

“The impact that AI is going to have on the labour force depends on how automation, augmentation, and new positions interact. The UNCTAD said developing countries need to invest in reliable internet connections, making high-quality data sets available to train AI systems and building education systems that give them necessary digital skills, the report added. To do this, UNCTAD recommends building a shared global facility that would share AI tools and computing power equitably between nations.”

Will big tech and agencies around the world pull together to make it happen?

Cynthia Murrell, April 16, 2025

Google Wears a Necklace and Sneakers with Flashing Blue LEDs. Snazzy.

April 15, 2025

No AI. Just an old dinobaby pointing out some exciting developments in the world “beyond search.”

I can still see the flashing blue light in Aisle 7. Yes, there goes the siren. K-Mart in Central Illinois was running a big sale on underwear. My mother loved those “blue light specials.” She would tell me as I covered my eyes and ears, “I don’t want to miss out.” Into the scrum she would go, emerging with two packages of purple boxer shorts for my father. He sat in the car while my mother shopped. I accompanied her because that’s what sons in Central Illinois do. I wonder if procurement officials are familiar with blue light specials. The sirens in DC wail 24×7.

image

Thanks, OpenAI. You produced a good enough illustration. A first!

I thought about K-Mart when I read “Google Slashes Business Software Prices for US Federal Agencies.” I see that flickering blue light as I type this short blog post. The trusted “real” news source reports:

Google will offer steep discounts to U.S. federal agencies for its business apps package as the company looks to capitalize on the Trump administration’s cost-cutting push and chip away at Microsoft’s longstanding grip on the government software market.

Yep, discounts. Now Microsoft has some traction in the US government. I cannot imagine what life would be like for aides to a senior Pentagon official if he did not have nifty PowerPoint presentations. Perhaps offering a deal will get some Microsoft aficionados to learn to live without Excel and Word? I don’t know, but Google is giving the “discount” method a whirl.

What’s up with Google? I think someone told me that Gemini 2.5 was free. Now a discount on GSA listed services which could amount to $2 billion in savings … if — yes, that magic word — if the US government dumps the Softies’ outstanding products for the cloudy goodness of the Google’s way. Yep, “if.”

I have a cute anecdote about Google and the US government from the year 2000, but, alas, I cannot share it. Trust me. It is a knee slapper. And, no, it is not about Sergey wearing silver sparkle sneakers to meetings with US elected officials. Those were indeed eye catchers among shoes with toes that looked like potatoes.

Several observations:

  1. Google, like Amazon, is trying to obtain US government business. I think the flashing blue lights, if I were still working in the hallowed halls, would impair my vision. Price cutting seems to be the one true way right now.
  2. Will lower prices have an impact on US government procurement? I am not sure. The procurement process chugs along every day and in quite predictable ways. How long does it take to turn a battleship, assuming the captain can pull off the maneuver without striking a small fishing boat, of course?
  3. Google seems to think that slashing prices for its “products” will boost sales. My understanding is that Google’s sales to government agencies pivot on several characteristics; for example, [a] listening and understanding what government professionals say, [b] providing a modicum of customer support or at the very least answering a phone call from a government professional, and [c] delivering products that the aides, assistants, and contractors understand and can use to crank out documents with numbered lines, dense charts, and bullet points that mostly stay in place after a graphic is inserted.

To sum up, I find the idea of price cuts interesting. My initial reaction is that price cuts and procurement are not necessarily lined up procedurally. But I am a dinobaby. But after 50 years of “government” work I have a keen desire to see if the Google can shine enough blue lights to bedazzle people involved in purchasing software to keep the admirals happy. (I speak from a little experience working with the late Admiral Craig Hosmer, R-Calif. whom I thank for his service.)

Stephen E Arnold, April 15, 2025
