The Cost of Training Smart Software: Is It Rising or Falling?
July 6, 2020
I read “The Cost of AI Training is Improving at 50x the Speed of Moore’s Law: Why It’s Still Early Days for AI.” The article’s main point is that “training” — that is, the cost of making machine learning smart — is declining.
That seems to make sense. First, there are cloud services. Some of these are cheaper than others, but, in general, relying on cloud compute eliminates the capital costs and the “ramp up” costs for creating one’s own infrastructure to train machine learning systems.
Second, use of a machine learning “utility” like Amazon AWS Sagemaker or the similar services available from IBM and Google provides two economic benefits:
- Tools are available to reduce engineering lift off and launch time
- Components like Sagemaker’s off-the-shelf data bundles eliminate the often-tedious process of finding additional data to use for training.
Third, assumptions about smart software’s efficacy appear to support generalizations about the training, use, and deployment of smart software.
I want to note that there are some research groups who believe that software can learn by itself. If my memory is working this morning, I think the jazzy way to state it is “sui generis.” Turn the system on, let it operate, and it learns by processing. For smart software, the crude parallel is learning the way humans learn: What’s in the environment becomes the raw material for learning.
The article correctly points out that the number of training models has increased. That is indeed accurate. A model is a numerical recipe set up to produce an output that meets the modeler’s goal. Thus, training a model involves providing data to the numerical recipe, observing the outputs, and then making adjustments. These “tweaks” can be simple and easy; for example, changing a threshold governing a decision. More complex fixes include, but are not limited to, selecting a different sequence for the individual processes, concatenating models so that multiple outputs inform a decision, and substituting one mathematical component for another. To get a sense of the range of components available to a modeler, take a quick look at Algorithms. This collection is what I would call “ready to run.”
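To make the simplest kind of “tweak” concrete, here is a minimal sketch of adjusting a decision threshold: feed data to the recipe, observe the outputs, and keep the setting that works best. The scores, labels, candidate thresholds, and the function names `accuracy_at` and `best_threshold` are all made up for illustration; they do not come from the article or from any particular product.

```python
# Hypothetical model scores and true labels (illustrative values only).
scores = [0.10, 0.35, 0.40, 0.55, 0.62, 0.80, 0.91]
labels = [0, 0, 1, 0, 1, 1, 1]


def accuracy_at(threshold, scores, labels):
    """Fraction classified correctly when score >= threshold means 'yes'."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_threshold(candidates, scores, labels):
    """Try each candidate, observe the output, keep the first best one."""
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Assumed candidate settings; a real modeler would choose these differently.
candidates = [0.3, 0.4, 0.5, 0.6, 0.7]
chosen = best_threshold(candidates, scores, labels)
print(chosen, accuracy_at(chosen, scores, labels))
```

Even this toy loop has a cost: each candidate means another pass over the data, and the more complex fixes described above multiply those passes.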
The article includes a number of charts. Each of these presents data supporting the argument that it is getting less costly to train smart software.
I am not certain I agree, although the charts seem to support the argument.
I want to point out that there are some additional costs to consider. A few of these can be “deal breakers” for financial and technical reasons.
Here’s my list of smart software costs. As far as I know, none of these has been the subject of an analyst’s examination and some may be unquantified because those in the business of smart software are not set up to capture them:
- Retraining. Anyone with experience with models knows that retraining is required. There are numerous reasons, but retraining is often more expensive than the first set of training activities.
- Gathering current or more on point training data. The assumption about training data is that it is useful. We live in the era of so called big data. Unfortunately, gathering on point data relevant to the retraining task is time consuming and can be a complicated job involving subject matter experts.
- Data normalization. There is a perception that if data are digital, those data can be provided “as is” to a content processing system. That is not entirely accurate. The normalization processes can easily consume as much as 60 percent of the available time of subject matter experts and data analysts.
- Data validation. The era of big data makes possible this generalization, “The volume of data will smooth out any anomalies.” Maybe, but in my experience, the “anomalies” — if not addressed — can easily skew one of the ingredients in the numerical recipe so that the outputs are not reliable. The output may “look” like it is accurate. In real life, the output is not what’s desired. I would refer the reader to the stories about Detroit’s facial recognition system which is incorrect 96 percent of the time. For reference, see this Ars Technica article.
- Downstream costs. Let’s use the Detroit police facial recognition system to illustrate this cost. Answer this question, please, “What are the fully loaded costs for the consequences of the misidentification of a US citizen?”
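The data validation item above can be illustrated with a small sketch of how one unaddressed anomaly skews an “ingredient” such as an average. The readings are invented for illustration, and the median-based check with its cutoff `k=5.0` is just one assumed way to flag outliers, not a method the article describes.

```python
import statistics

# Hypothetical readings; the last value stands in for a data-entry anomaly.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_anomaly = clean + [1000.0]


def flag_anomalies(values, k=5.0):
    """Flag values more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [v for v in values if abs(v - med) > k * mad]


# The mean is dragged far from the typical reading by a single bad value.
print(statistics.mean(clean), statistics.mean(with_anomaly))
# The median-based check singles out the suspect value for review.
print(flag_anomalies(with_anomaly))
```

The flagged value still needs a person to decide whether it is an error or a genuine extreme, which is where the subject matter expert time goes.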
In my view, taking a narrow look at the costs of training smart software is not in the interests of the analyst who benefits from handling investors’ money. Nor are the companies involved in smart software eager to monitor the direct and indirect costs associated with training the models. Finally, it is in no one’s interest to consider the downstream costs of a system which may generate inaccurate outputs.
Net net: In today’s economic environment, ignoring the broader cost picture is a distortion of what it takes to train and retrain smart software.
Stephen E Arnold, July 6, 2020