ChemNet: Pre-Training and Rules Can Work, but Time and Cost Can Be a Roadblock
February 27, 2019
I read “New AI Approach Bridges the Slim Data Gap That Can Stymie Deep Learning Approaches.” The phrase “slim data” caught my attention, and pairing it with “deep learning” seemed to point the way to the future.
The method described in the document reminded me that creating rules for “smart software” works in narrow domains with constraints on terminology. No emojis allowed. The method of “pre-training” has been around since the early days of smart software. Autonomy in the mid-1990s relied upon training its “black box.”
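For readers who want the general pattern, here is a minimal sketch of pre-train-then-transfer: learn a representation from a large pool of unlabeled data, then train a classifier on a slim labeled set. This is my illustration with synthetic data and made-up sizes, not Autonomy’s system or the ChemNet pipeline itself.

```python
# Sketch of "pre-train on abundant data, then learn from slim data."
# Synthetic data throughout; every size here is hypothetical.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Pre-training" corpus: 10,000 unlabeled examples with 200 raw features.
unlabeled = rng.normal(size=(10_000, 200))

# Learn a compact representation from the large unlabeled pool.
encoder = TruncatedSVD(n_components=20, random_state=0).fit(unlabeled)

# "Slim data": only 50 labeled examples for the actual task.
X_small = rng.normal(size=(50, 200))
y_small = rng.integers(0, 2, size=50)

# Train the downstream classifier on the pre-trained representation.
clf = LogisticRegression().fit(encoder.transform(X_small), y_small)
print(clf.predict(encoder.transform(X_small[:5])))
```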
Creating a training set that represents the content to be processed or indexed can be a time-consuming, expensive business. Plus, because content “drifts,” retraining is required. For some types of content, the training process must be repeated and the results verified.
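How does one know drift has set in? One common tactic (my sketch, not something from the article) is to compare the distribution of a content feature in incoming material against the snapshot used at training time and flag retraining when the two diverge:

```python
# Sketch of a drift check: compare a feature's distribution in new content
# against the original training snapshot. Threshold is hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=0.0, size=5_000)  # snapshot at training time
incoming_feature = rng.normal(loc=0.4, size=5_000)  # today's content has shifted

stat, p_value = ks_2samp(training_feature, incoming_feature)
if p_value < 0.01:  # arbitrary cut-off for this sketch
    print(f"Drift detected (KS statistic {stat:.3f}); schedule retraining.")
```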
So the cost of rule creation, tuning, and tweaking is one thing. The expense of training, training set tuning, and retraining is another. Add them up, and the objective of keeping costs down and accuracy up becomes a bit of a challenge.
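To make “add them up” concrete, here is a back-of-the-envelope tally. Every figure is a placeholder I invented for illustration, not a number from the article:

```python
# Back-of-the-envelope cost tally. Every figure is hypothetical.
rule_creation = 40_000         # initial rule authoring for the narrow domain
rule_tuning_per_year = 15_000  # ongoing tweaks as terminology shifts
initial_training_set = 60_000  # labeling and verifying the first corpus
retraining_per_year = 25_000   # re-labeling and re-verifying after drift

first_year = (rule_creation + rule_tuning_per_year
              + initial_training_set + retraining_per_year)
print(f"Hypothetical first-year total: ${first_year:,}")  # $140,000
```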
The article focuses on the benefits of the new system as it crunches and munches its way through chemical data. The idea is to let software predict which molecules are toxic.
Why hasn’t this type of smart software been used to index outputs at scale?
My hunch is that the time, cost, and accuracy of the indexing itself are the challenge. Eighty percent accuracy may be okay for some applications, like flagging patients at risk of diabetes. Identifying substances that can kill one outright is another matter.
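A quick, hypothetical calculation shows why 80 percent is not comforting when the stakes are toxicity:

```python
# Why 80 percent is not comforting for toxicity screening.
# All figures below are hypothetical.
compounds_screened = 100_000
toxic_rate = 0.05    # assume 5% of compounds are actually toxic
sensitivity = 0.80   # classifier catches 80% of the toxic ones

toxic = int(compounds_screened * toxic_rate)  # 5,000 toxic compounds
missed = int(toxic * (1 - sensitivity))       # 1,000 slip through
print(f"{missed:,} toxic compounds pass the screen undetected.")
```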
In short, bridging the slim data gap with deep learning remains a largely unsolved problem, even in a constrained content domain.
Stephen E Arnold, February 27, 2019