Yet Another Way to Spot AI Generated Content

July 21, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The dramatic emergence of ChatGPT has people frantically searching for ways to distinguish AI-generated content from writing by actual humans. Naturally, many are turning to AI solutions to solve an AI problem. Some tools detect characteristics of dino-baby writing, like colloquialisms and emotional language. Unfortunately for the academic community, these methods work better on Reddit posts and Wikipedia pages than on academic writing. After all, research papers have employed a bone-dry writing style since long before the emergence of generative AI.


Which tea cup is worth thousands and which is a fabulous fake? Thanks, MidJourney. You know your cups or you are in them.

Cell Reports Physical Science details the development of a niche solution in the article, “Distinguishing Academic Science Writing from Humans or ChatGPT with Over 99% Accuracy Using Off-the-Shelf Machine Learning Tools.” We learn:

“In the work described herein, we sought to achieve two goals: the first is to answer the question about the extent to which a field-leading approach for distinguishing AI- from human-derived text works effectively at discriminating academic science writing as being human-derived or from ChatGPT, and the second goal is to attempt to develop a competitive alternative classification strategy. We focus on the highly accessible online adaptation of the RoBERTa model, GPT-2 Output Detector, offered by the developers of ChatGPT, for several reasons. It is a field-leading approach. Its online adaptation is easily accessible to the public. It has been well described in the literature. Finally, it was the winning detection strategy used in the two most similar prior studies. The second project goal, to build a competitive alternative strategy for discriminating scientific academic writing, has several additional criteria. We sought to develop an approach that relies on (1) a newly developed, relevant dataset for training, (2) a minimal set of human-identified features, and (3) a strategy that does not require deep learning for model training but instead focuses on identifying writing idiosyncrasies of this unique group of humans, academic scientists.”
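For readers who want to poke at that field-leading baseline themselves, the RoBERTa-based GPT-2 Output Detector the authors benchmark against is available as a public checkpoint. Below is a minimal sketch of querying it with the Hugging Face transformers pipeline; the model identifier and the “Real”/“Fake” label names are assumptions about that public checkpoint, not details taken from the paper.

```python
# Minimal sketch: score a passage with the publicly hosted RoBERTa-based
# GPT-2 Output Detector via the Hugging Face transformers pipeline.
# The model id and label names below are assumptions about the public
# checkpoint, not details drawn from the Cell Reports article.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="roberta-base-openai-detector",  # assumed public checkpoint id
)

abstract = (
    "We report a method for distinguishing machine-generated prose from "
    "human academic writing using a small set of stylistic features."
)

result = detector(abstract)[0]
print(f"{result['label']}: {result['score']:.3f}")  # e.g. "Real" vs. "Fake"
```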

One idiosyncrasy of academic scientists, for example, is a penchant for equivocal terms like “but,” “however,” and “although.” The developers used the open source XGBoost software library for this project. The write-up describes the tool’s development and results at length, so navigate there for those details. But what happens, one might ask, the next time ChatGPT levels up? And the next? And so on? We are assured the developers have accounted for this game of cat and mouse and will release updated tools quickly each time the chatbot evolves. What a winner. For the marketing team, that is.
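Here is a hedged sketch of that alternative strategy as described: count a handful of hand-picked stylistic signals per passage and feed them to an off-the-shelf XGBoost classifier, no deep learning required. The specific features, helper names, and toy training data below are illustrative assumptions, not the authors’ actual feature list or dataset.

```python
# Sketch of a small hand-picked-feature classifier in the spirit of the
# paper: a few stylistic counts per passage, then off-the-shelf XGBoost.
# Features, helpers, and toy data are illustrative assumptions only.
import re
import numpy as np
from xgboost import XGBClassifier

HEDGE_WORDS = ("but", "however", "although")  # equivocal terms noted above

def featurize(text: str) -> list[float]:
    """Turn one passage into a tiny vector of stylistic signals."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(words), 1)
    hedge_rate = sum(words.count(w) for w in HEDGE_WORDS) / n
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = n / max(len(sentences), 1)
    contains_numbers = float(bool(re.search(r"\d", text)))
    return [hedge_rate, avg_sentence_len, contains_numbers]

# Toy training data: label 1 = human academic writing, 0 = ChatGPT output.
texts = [
    "However, the effect was modest, although prior work suggested otherwise.",
    "The results clearly demonstrate the proposed method is highly effective.",
]
labels = [1, 0]

X = np.array([featurize(t) for t in texts])
y = np.array(labels)

model = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, y)
print(model.predict(X))
```

The design point, per the paper, is that the heavy lifting sits in choosing features that capture how academic scientists actually write, not in training a large model.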

Cynthia Murrell, July 21, 2023

