Shaping Data Is Indeed a Thing and Necessary
April 12, 2021
I gave a lecture at Microsoft Research many years ago. I brought up the topic of Kolmogorov’s complexity idea and making fast and slow smart software sort of work. (Remember that Microsoft bought Fast Search & Transfer which danced around making automated indexing really super wonderful like herring worked over by a big time cook.) My recollection of the Microsoft group’s reaction was, “What is this person talking about?” There you go.
If you are curious about the link between smart software and a Russian math person once dumb enough to hire one of my relatives to do some grunt work, check out the 2019 essay “Are Deep Neural Networks Dramatically Overfitted?” Spoiler: You betcha.
The essay explains that mathy tests signal when a dataset is just right. No more and no less data are needed. Thus, if the data are “just right,” the outputs will be on the money, accurate, and close enough for horseshoes.
The write up states:
The number of parameters is not correlated with model overfitting in the field of deep learning, suggesting that parameter counting cannot indicate the true complexity of deep neural networks.
Simplifying: “Oh, oh.”
Then there is a workaround. The write up points out:
The lottery ticket hypothesis states that a randomly initialized, dense, feed-forward network contains a pool of subnetworks and among them only a subset are “winning tickets” which can achieve the optimal performance when trained in isolation. The idea is motivated by network pruning techniques — removing unnecessary weights (i.e. tiny weights that are almost negligible) without harming the model performance. Although the final network size can be reduced dramatically, it is hard to train such a pruned network architecture successfully from scratch.
Simplifying again: “Yep, close enough for most applications.”
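For those who want to see what “removing tiny weights” looks like, here is a minimal sketch of magnitude pruning on a plain list of numbers. The function name prune_by_magnitude, the sample weights, and the 50 percent cut are all made up for illustration; none of it comes from the essay. It shows only the pruning step. The lottery ticket experiments, as I understand them, also reset the surviving weights to their starting values and retrain, which this sketch does not attempt.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude share of weights (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_prune = int(len(weights) * fraction)
    # Rank positions by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    # Keep the big weights, replace the tiny ones with zero.
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Assumed toy weights; a real network has millions of these.
weights = [0.9, -0.01, 0.4, 0.002, -0.7, 0.03]
pruned = prune_by_magnitude(weights, 0.5)
print(pruned)
```

Half the numbers go to zero and the three big ones survive. Whether the slimmed-down network still performs is the part that requires actual training.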
What’s the fix? Keep the data small.
Doesn’t that create other issues? Sure does. For example, what about real-time streaming data which diverge from the data used to train smart software? You know the “change” thing when historical data no longer apply. Smart software is possible as long as the aperture is small and the data shaped.
There you go. Outputs are good enough but may be “blind” in some ways.
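The “change” thing can be sketched in a few lines. This is a crude check, assuming a single numeric feature: it measures how far the average of recent data sits from the training average, in units of the training standard deviation. The name drift_score, the sample numbers, and the cutoff of 3.0 are my assumptions for illustration, not a method from the essay or a standard anyone has blessed.

```python
from statistics import mean, stdev


def drift_score(training_values, recent_values):
    """Distance of the recent mean from the training mean, in training standard deviations."""
    spread = stdev(training_values)
    if spread == 0:
        raise ValueError("training data has no spread; score is undefined")
    return abs(mean(recent_values) - mean(training_values)) / spread


# Assumed toy data: what the model saw versus what shows up later.
training = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
same_world = [10.0, 9.0, 11.0, 10.0]
new_world = [15.0, 16.0, 14.0, 15.0]

THRESHOLD = 3.0  # assumed cutoff; tune it for real data

for label, window in [("same_world", same_world), ("new_world", new_world)]:
    score = drift_score(training, window)
    status = "drifted" if score > THRESHOLD else "ok"
    print(label, round(score, 2), status)
```

A check like this catches only a shift in the average. Data can change shape in plenty of other ways and sail right past it, which is the “blind” problem in miniature.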
Stephen E Arnold, April 12, 2021