Synthetic Content: A Challenge with No Easy Answer
January 30, 2023
Open source intelligence is the go-to method for many crime analysts, investigators, and intelligence professionals. Whether the source is social media or third-party data from marketing companies, useful insights can be obtained. But the upside of OSINT leads many of its supporters to downplay or sidestep its downsides. I call these “OSINT blind spots,” and each day I see more evidence that they are becoming a challenge.
For example, “As Deepfakes Flourish, Countries Struggle with Response” is a useful summary of one problem posed by synthetic (fake) content. What looks “real” may not be. A careful analyst sifting through data treats information as suspect until it is verified. But synthetic content systems can output multiple instances of fake information and then populate other channels with “verification” of the initial item, so the corroboration itself is fabricated.
The article states:
Deepfake technology — software that allows people to swap faces, voices and other characteristics to create digital forgeries — has been used in recent years to make a synthetic substitute of Elon Musk that shilled a crypto currency scam, to digitally “undress” more than 100,000 women on Telegram and to steal millions of dollars from companies by mimicking their executives’ voices on the phone. In most of the world, authorities can’t do much about it. Even as the software grows more sophisticated and accessible, few laws exist to manage its spread.
As for government professionals, the article says:
problematic applications are also plentiful. Legal experts worry that deepfakes could be misused to erode trust in surveillance videos, body cameras and other evidence. (A doctored recording submitted in a British child custody case in 2019 appeared to show a parent making violent threats, according to the parent’s lawyer.) Digital forgeries could discredit or incite violence against police officers, or send them on wild goose chases. The Department of Homeland Security has also identified risks including cyber bullying, blackmail, stock manipulation and political instability.
The most interesting statement in the article, in my opinion, is this one:
Some experts predict that as much as 90 per cent of online content could be synthetically generated within a few years.
The number may overstate what will happen: no one knows the rate of uptake of smart software or the applications to which the technology will be put.
Thinking in terms of OSINT blind spots, there are some interesting angles to consider:
- Assume the write-up is correct and 90 percent of online content comes to be authored by smart software: how does a person or system determine accuracy? And what happens when a self-learning system learns from its own output? (A toy sketch of that feedback loop follows this list.)
- How does a human determine what is correct or incorrect? Education appears to be struggling to teach basic skills. What about journals filled with non-reproducible results, which can spawn volumes of synthetic information built on flawed research? Is a person, even one trained in a narrow discipline, able to determine “right” or “wrong” in a digital environment?
- Are institutions like libraries being further marginalized? Machine-generated content will exceed a library’s capacity to acquire and curate certain types of information. Does one acquire books which are “right” when machine-generated content produces information that shouts “wrong”?
- What happens to automated sense-making systems which have been engineered on the often flawed assumption that available data and information are correct?
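The self-learning question in the first bullet can be made concrete. When a generative system is retrained on its own output, estimation errors compound: rare patterns that happen to be under-sampled in one generation are emitted less often in the next, and once a pattern stops appearing it has no way back, because no fresh real-world data enters the loop. Below is a minimal toy sketch in Python with NumPy. The three-category distribution, the sample size, and the generation count are my own illustrative assumptions, not anything from the cited article; real generative models are vastly more complicated, but the feedback dynamic is the same.

```python
import numpy as np

# Toy self-training loop: a "model" (here, just category frequencies)
# is repeatedly re-fit to samples drawn from itself. Once the rare
# category draws zero samples, its probability hits zero and can never
# recover, because no fresh real-world data enters the loop.
# All parameters are illustrative assumptions, not from the article.

rng = np.random.default_rng(7)

probs = np.array([0.70, 0.25, 0.05])  # the "real" distribution; index 2 is rare
sample_size = 50                      # small training sets speed up the decay

for generation in range(1, 301):
    counts = rng.multinomial(sample_size, probs)  # "generate" synthetic data
    probs = counts / sample_size                  # re-train on that output only
    if probs[2] == 0.0:
        print(f"rare category lost for good at generation {generation}")
        break
else:
    print("rare category survived 300 generations; re-run to see it vanish")

print("final distribution:", probs)
```

The point of the sketch: once the system stops emitting a rare pattern, no amount of further self-training can restore it, because the only “evidence” available is the system’s own output. An OSINT analyst querying such a system would never know the missing pattern existed.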
Perhaps an OSINT blind spot is a precursor to going blind, unsighted, or dark?
Stephen E Arnold, January 30, 2023