Identification of Machine Generated Text: Not There Yet
March 18, 2019
“A.I. Generated Text Is Supercharging Fake News. This Is How We Fight Back” provides a rundown of projects focused on figuring out whether a sentence was written by a human or by smart software. An IBM data scientist describes IBM’s “visual tool” this way:
“[Our current] visual tool might not be the solution to that, but it might help to create algorithms that work like spam detection algorithms,” he said. “Imagine getting emails or reading news, and a browser plug-in tells you for the current text how likely it was produced by model X or model Y.”
Okay, not there yet.
The article references XceptionNet but does not provide a link. If you want to know a bit about this approach, click this link. Interesting, but it was not designed for text.
Net net: There is no foolproof way to determine whether a chunk of content has been created:
- Entirely by a human writing to a template; for example, certain traditional news stories about a hearing or a sports score
- Entirely by software processing digital content streaming from a third party
- By a combination of a human and smart software.
As some individuals emerge from schools with little training in traditional research and source verification, distinguishing information written by a careless or stupid human from information assembled by a semi-smart software system is likely to be difficult for them.
Identification of text features is tricky. That means exciting opportunities for researchers; for example, should a search and retrieval system automatically NOT output machine generated text?
Stephen E Arnold, March 18, 2019