Cognitive Blind Spot 1: Can You Identify Synthetic Data? Better Learn.
October 5, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
It has been a killer with the back-to-back trips to Europe and then to the intellectual hub of the old-fashioned America. In France, I visited a location allegedly the office of a company which “owns” the domain rrrrrrrrrrr.com. No luck. Fake address. I then visited a semi-sensitive area in Paris, walking around in the confused fog only a 78 year old can generate. My goal was to spot a special type of surveillance camera designed to provide data to a smart software system. The idea is that the images can be monitored through time so a vehicle making frequent passes of a structure can be flagged, its number tag read, and a bit of thought given to answer the question, “Why?” I visited with a friend and big brain who was one of the technical keystones of an advanced search system. He gave me his most recent book and I paid for my Orangina. Exciting.
One executive tells his boss, “Sir, our team of sophisticated experts reviewed these documents. The documents passed scrutiny.” One of the “smartest people in the room” asks, “Where are we going for lunch today?” Thanks, MidJourney. You do understand executive stereotypes, don’t you?
On the flights, I did some thinking about synthetic data. I am not sure that most people can provide a definition which will embrace the Google’s efforts in the money saving land of synthetic. I don’t think too many people know about Charlie Javice’s use of synthetic data to whip up JPMC’s enthusiasm for her company Frank Financial. I don’t think most people understand that when typing a phrase into the Twitch AI Jesus that software will output a video and mostly crazy talk along with some Christian lingo.
The purpose of this short blog post is to present an example of synthetic data and conclude by revisiting the question, “Can You Identify Synthetic Data?” The article I want to use as a hook for this essay is from Fortune Magazine. I love that name, and I think the wolves of Wall Street find it euphonious as well. Here’s the title: “Delta Is Fourth Major U.S. Airline to Find Fake Jet Aircraft Engine Parts with Forged Airworthiness Documents from U.K. Company.”
The write up states:
Delta Air Lines Inc. has discovered unapproved components in “a small number” of its jet aircraft engines, becoming the latest carrier and fourth major US airline to disclose the use of fake parts. The suspect components — which Delta declined to identify — were found on an unspecified number of its engines, a company spokesman said Monday. Those engines account for less than 1% of the more than 2,100 power plants on its mainline fleet, the spokesman said.
Okay, bad parts can fail. If the failure is in a critical component of a jet engine, the aircraft could — note that I am using the word could — experience a catastrophic failure. Translating catastrophic into more colloquial lingo, the sentence means catch fire and crash or something slightly less terrible; namely, catch fire, explode, eject metal shards into the tail assembly, or make a loud noise and emit smoke. Exciting, just not terminal.
I don’t want to get into how the synthetic or fake data made its way through the UK company, the UK bureaucracy, the Delta procurement process, and into the hands of the mechanics working in the US or offshore. The fake data did elude scrutiny for some reason. With money being of paramount importance, my hunch is that saving some money played a role.
If organizations cannot spot fake data when it relates to a physical and mission-critical component, how will organizations deal with fake data generated by smart software. The smart software can get it wrong because an engineer-programmer screwed up his or her math or the complex web of algorithms just generate unanticipated behaviors from dependencies no one knew to check and validate.
What happens when computers which many people are “always” more right than a human, says, “Here’s the answer.” Many humans will skip the hard work because they are in a hurry, have no appetite for grunt work, or are scheduled by a Microsoft calendar to do something else when the quality assurance testing is supposed to take place.
Let’s go back to the question in the title of the blog post, “Can You Identify Synthetic Data?”
I don’t want to forget this part of the title, “Better learn.”
JPMC paid out more than $100 million in November 2022 because some of the smartest guys in the room weren’t that smart. But get this. JPMC is a big, rich bank. People who could die because of synthetic data are a different kettle of fish. Yeah, that’s what I thought about as I flew Delta back to the US from Paris. At the time, I thought Delta had not fallen prey to the scam.
I was wrong. Hence, I “better learn” myself.
Stephen E Arnold, October 5, 2023