Cheap Training for Machine Learning Is Not Hyped Enough. Believe It or Not

December 6, 2022

I read an interesting article titled “Counting the Cost of Training Large Language Models.” The write up contains a statement which provides insight into the type of blind spots that plague whiz bang smart software companies. Here’s the statement which struck me as amusing and revelatory:

It has been becoming increasingly clear – anecdotally at least – just how expensive it is to train large language models and recommender systems…

Two points. Anyone who took the time to ask about the cost of retraining a Bayesian and neurolinguistic system from the late 1990s would have learned: [a] Smart software, even a relatively simple implementation, requires refined and curated training data before a system is deployed. This work is tedious and requires subject matter specialists. Then there is testing and fiddling with knobs and dials before the software becomes operational. [b] The smart software requires retraining with updated data sets, calibration, and testing on a heartbeat. For some Autonomy plc type systems, the retraining could be necessary every 180 days or when “drift” became evident. Users complain, and that’s how one knows the system is lost in the tiny nooks and crannies of lots of infinitesimals adding up to a dust pile in a dark corner of a complex system.

With three decades of information available about the costs of human centric involvement in making smart software less stupid, one would think that the whiz kids would have done some homework. Oh, right. If the information is not in the first 15 items in a Google search result, there are no data. Very modern.

The write up identifies a number of companies with ways to chop down training costs. To be clear, the driving idea for Snorkel from the Stanford AI Lab is reducing the costs of building training sets. The goal is to be “close enough for horseshoes” or “good enough.” Cut the costs and deal with issues with some software wrappers. Package up the good enough training data and one has a way to corner the market for certain ML applications. But it’s not just the Google. Amazon AWS is in the hunt for this off-the-shelf approach to machine learning. I think of it as the 7-11 approach to getting a meal: Cheap, quick, and followed by a Big Gulp.

The write up has a number of charts. These are okay, but I am not sure about the provenance of the data presented. But that’s just my skepticism about content marketing type write ups. There are even “cost per one million parameters” data. Interesting, but who compiled the data, what methods were used to generate the numbers, and who vetted the project itself? Annoying questions? Sure. Important? Not to true believers.

But I know the well educated, well informed funding sources and procurement officials will love this conclusion:

some people will rent to train, and then as they need to train more and also train larger models, the economics will compel them to buy.

Yep, but what about the issue of “close enough for horseshoes”? Yep, here’s another annoying question: Is this article the kick off for another hype campaign? My initial reaction is, “Yes.”

Stephen E Arnold, December 2022
