Has Microsoft Drilled into a Google Weak Point?

February 2, 2023

I want to point to a paper written by someone who is probably not on the short list to replace Jeff Dean or Prabhakar Raghavan at Google. The analysis of synthetic data and its role in smart software is titled “Machine Learning and the Politics of Synthetic Data.” The author is Benjamin N Jacobsen at Durham University. However, the first sentence of the paper invokes Microsoft’s AI Labs at Microsoft Cambridge. Clue? Maybe?

The paper does a good job of defining synthetic data. These are data generated by a smart algorithm. The fake data train other smart software. What could go wrong? The paper consumes 12 pages explaining that quite a bit can go off the rails; for example, the synthetic data can become disconnected from the real world, or the software trained on them can deliver incorrect outputs. No big deal.

For me the key statement in the paper is this one:

… as I have sought to show in this paper, the claims that synthetic data are ushering in a new era of generated inclusion and non-risk for machine learning algorithms is both misguided and dangerous. For it obfuscates how synthetic data are fundamentally a technology of risk, producing the parameters and conditions of what gets to count as risk in a certain context.

The idea of risk generated from synthetic data is an important one. I have been compiling examples of open source intelligence blind spots. How will a researcher know when an output is “real”? What if an output increases the risk of a particular outcome? Has the smart software begun to undermine human judgment and decision making? What happens if one approach emerges as the winner, for example, the SAIL, Snorkel, Google method? What if a dominant company puts its finger on the scale to cause certain decisions to fall out of the synthetic training set?

With many rushing into the field of AI windmills, what will Google’s Code Red actions spark? Perhaps more synthetic data to make training easier, cheaper, and faster? Notice I did not use the word better. Did the stochastic parrot utter something?

Stephen E Arnold, February 2, 2023

