Close Enough for Horse Shoes? Why Drifting Off Course Has Become a Standard Operating Procedure

July 14, 2020

One of the DarkCyber research team sent me a link to a post on Hacker News: “How Can I Quickly Trim My AWS Bill?” In the write up were some suggestions from a range of people, mostly anonymous. One suggestion caught my researcher’s attention, and I too found it suggestive.

Here’s the statement the DarkCyber team member flagged for me:

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.

Some context. Certain cloud functions are more “expensive” than others. Tips range from dumping GPUs for CPUs to “Buy some hardware and host it at home/office/etc.”
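For readers who want to see what the “sample it, change your batch sizes” advice looks like in practice, here is a minimal sketch. The file name, sample fraction, and batch size are mine, not the poster’s; the point is only that thinning the input and regrouping it into batches shrinks the compute bill.

```python
# A minimal sketch of "sample it, change your batch sizes."
# The path, fraction, and batch size below are illustrative assumptions.
import random

def sample_rows(path, fraction=0.10, seed=42):
    """Keep roughly `fraction` of the input rows to shrink the training job."""
    rng = random.Random(seed)
    with open(path) as handle:
        for line in handle:
            if rng.random() < fraction:
                yield line

def batches(rows, batch_size=512):
    """Group rows into batches so the training loop makes fewer, cheaper passes."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Usage: feed sampled, batched rows into an existing training loop
# instead of the full data set.
# for batch in batches(sample_rows("training_data.csv")):
#     model.partial_fit(batch)   # hypothetical trainer call
```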

I kept coming back to the suggestion “don’t retrain.”

One of the magical things about certain smart software is that the little code devils learn from what goes through the system. The training gets the little devils or daemons out of bed and into the smart software gym.

However, in many smart processes, the content objects processed include signals not present in the original training set. Off-the-shelf training sets are vulnerable, just like those cooked up by three people working from home with zero interest in validating the “training data” against the “real world data.”

What happens?

The indexing or metadata assignments “drift.” This means that the smart software devils index a content object in a way that differs from how that content object should be tagged.

Examples range from “this person matches that person” to “we indexed the food truck as a vehicle used in a robbery.” Other examples are even more colorful or tragic, depending on which smart software output one examines. Detroit facial recognition ring a bell?

Who cares?

I care. The person directly affected by shoddy thinking about training and retraining smart software, however, does not.

That’s what is troubling about this suggestion. Care and thought are mandatory for initial model training. Then as the model operates, informed humans have to monitor the smart software devils and retrain the system when the indexing goes off track.
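To make the “monitor and retrain” chore concrete, here is a small sketch of one common approach: comparing the distribution of the model’s live confidence scores against the distribution seen at training time. This is not the author’s method; the statistic, threshold, bin count, and alert hook are illustrative assumptions.

```python
# A sketch of a drift check that could prompt humans to review and retrain.
# Threshold, bins, and the alert hook are assumptions, not from the post.
import numpy as np

def population_stability_index(baseline, live, bins=10):
    """Compare the score distribution from training time with live scores.
    Larger values mean the live data has drifted away from the baseline."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Floor the proportions to avoid log(0) on empty bins.
    base_pct = np.clip(base_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

def needs_retraining(baseline_scores, live_scores, threshold=0.2):
    """Flag the model for human review when drift exceeds the threshold."""
    return population_stability_index(baseline_scores, live_scores) > threshold

# Usage sketch: run weekly against the classifier's confidence scores.
# if needs_retraining(training_scores, last_week_scores):
#     alert_reviewers()  # hypothetical: a human checks the tags before retraining
```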

The big (or maybe I should type BIG) problem today is that very few individuals want to do this, even if an enlightened superior says, “Do the retraining right.”

Ho ho ho.

The enlightened boss is not going to do much checking, and the outputs of a smart system just keep getting farther off track.

In some contexts, like Google advertising, getting rid of inventory is more important than digging into the characteristics of Oingo (later Applied Semantics) methods. Getting rid of the inventory is job one.

For other model developers, shapers, and tweakers, the suggestion to skip retraining is “good enough.”

That’s the problem.

“Good enough” has become the way to refactor excellence into substandard work processes.

Stephen E Arnold, July 14, 2020

