The First AI-Written Paper To Pass Peer Review

April 2, 2025

Cheating. I am not going to bring this topic up.

Humans have taken one small step towards obsolescence when it comes to writing papers. Sakana AI reports that "The AI Scientist Generates Its First Peer-Reviewed Scientific Publication." This is the first known fully AI-generated paper to pass the same review process that human scientists submit their papers to. Here’s how the paper was written:

“The paper was generated by an improved version of the original AI Scientist, called The AI Scientist-v2. We will be sharing the full details of The AI Scientist-v2 in an upcoming release. This paper was submitted to an ICLR 2025 workshop that agreed to work with our team to conduct an experiment to double-blind review AI-generated manuscripts. We selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning. The workshop is hosted at ICLR, one of three premier conferences in machine learning and artificial intelligence research, along with NeurIPS and ICML.”

The ICLR leadership and organizers were involved with the project. The paper was submitted blind to the ICLR review team, although the reviewers were told that they might be reviewing AI-generated papers.

The AI system was given a broad topic to research and write about. When the process was done, three papers were selected for submission so the review board wouldn’t be overburdened. Here are the results:

“We looked at the generated papers and submitted those we thought were the top 3 (factoring in diversity and quality—We conducted our own detailed analysis of the 3 papers, please read on in our analysis section). Of the 3 papers submitted, two papers did not meet the bar for acceptance. One paper received an average score of 6.33, ranking approximately 45% of all submissions. These scores are higher than many other accepted human-written papers at the workshop, placing the paper above the average acceptance threshold. Specifically, the scores were:

• Rating: 6: Marginally above acceptance threshold

• Rating: 7: Good paper, accept

• Rating: 6: Marginally above acceptance threshold”

The AI Scientist conducted the experiment out of pure scientific curiosity to measure how current AI algorithms compare to human intellect. No problem.

Whitney Grace, April 2, 2025
