ArXiv: Will Other Smart Software Systems Get “Free” Access? Yeah, Sure

April 21, 2025

Believe it or not, no smart software. Just a dumb and skeptical dinobaby.

Before commenting on Cornell University’s apparent shift of the ArXiv service to the Google Cloud, let me point you to this page:

image

The page was updated 15 years ago. Now check out the access to:

NCSTRL, the Networked Computer Science Technical Reference Library.

CoRR, the Computing Research Repository.

The Open Archives Initiative.

ETRDL, the ERCIM Technical Reference Digital Library.

Cornell University Library Historical Math Book Collection

Cornell University Library Making of America Collection

Hein online Retrospective Law Journals

Yep, 404s, some content behind paywalls, and other data just disappeared because Bing, Google, and Yandex don’t index certain information no matter what people believe or the marketers say.

This orphaned Cornell University Dienst service has “gorged out”; that is, jumped off a bridge to the rocks below. Students know about the act, but the admissions department seems unaware of the bound phrase.

I read “Careers at ArXiv.” The post seems to say to me, “We are moving the ArXiv ‘gray’ papers to Google Cloud.” Here’s a snippet of the “career” advertisement / news announcement:

We are already underway on the arXiv CE ("Cloud Edition") project. This is a project to re-home all arXiv services from VMs at Cornell to a cloud provider (Google Cloud). There are a number of reasons for this transition, including improving arXiv’s scalability while modernizing our infrastructure. This will not be a simple port of the existing arXiv code base because this project will:

  • replace the portion of our backends still written in perl and PHP
  • re-architect our article processing to be fully asynchronous, and provide better insight into the processing workflows
  • containerize all, or nearly all arXiv services so we can deploy via Kubernetes or services like Google Cloud Run
  • improve our monitoring and logging facilities so we can more quickly identify and manage production issues with arxiv.org
  • create a robust CI/CD pipeline to give us more confidence that changes we deploy will not cause services to regress

The cloud transition is a pre-requisite to modernizing arXiv as a service. The modernization will enable:

  • arXiv to expand the subject areas that we cover
  • improve the metadata we collect and make available for articles, adding fields that the research community has requested such as funder identification
  • deal with the problem of ambiguous author identities
  • improve accessibility to support users with impairments, particularly visual impairments
  • improve usability for the entire arXiv community.

I know Google is into “free.” The company is giving college students its quantumly supreme smart software for absolutely nothing. Maybe a Google account will be required? Maybe the Chrome browser may be needed to give those knowledge hungry college students the best experience possible? Maybe Google’s beacons, bugs, and cookies will be the students’ constant companions? Yeah, maybe.

But will ArXiv exist in the future? Will Google’s hungry knowledge munchers chew through the data and then pull a Dienst maneuver?

As a dinobaby, I liked the ArXiv service, but I also liked the Dienst math repository before it became unfindable.

It seems to me that Cornell University is:

  1. Saving money at the library and maybe the Theory Center
  2. Avoiding future legal dust ups about access to content which to some government professionals may reveal information to America’s adversaries
  3. Intentionally or inadvertently giving the Google control over knowledge flow related to matters of technical and competitive interest to everyone’s favorite online advertising company
  4. Running a variation of its Dienst game plan.

But I am a dinobaby, and I know zero about Cornell other than the “gorging out” approach to termination. I know even less about the blue chip consulting type thinking in which the Google engages. I don’t even know if I agree that Google’s recent court loss is really a “win” for the Google.

But the future of the ArXiv? Hey, where is that bridge? Do some students jump, fall, or get pushed to their death on the rocks below?

PS. In case your German is rusty, “dienst” means duty and possibly “a position of authority” like a leader at Google.

Stephen E Arnold, April xx, 2025
