Snorkel: Now Humans Are a Benefit?
November 23, 2022
Snorkel emerged from Stanford University’s AI lab. Some at the Google are ga-ga over Snorkel’s approach to reducing the cost of creating training sets for machine learning. If you are not paying attention to the expense of training models the old-fashioned way, here is the short version: when humans do the work, months or years of effort are required. Then, surprise, after operating in the real world for six months (plus or minus depending on the use case), the model has to be retrained.
Snorkel wants to get subject matter experts to build a training set one time. Then the numerical recipes will harvest additional information and automatically update the training set. Imagine better, faster, cheaper. Well, that’s the theory. Thus the entire AI industry is pushing to find shortcuts for two chores: building training sets for initial model training and doing the work needed to make sure the model does not drift off into craziness. (I won’t mention the name of any search vendors, but a number of these outfits have performed oblations for their VC gods. Why? The results of the user’s query returned garbage. Confusing the information in a PowerPoint pitch with returning relevant and precise results for a user’s query is a bit like resolving the conflicts between Newtonian and quantum physics.)
I read “AI Startup Snorkel Preps a New Kind of Expert for Enterprise AI.” My immediate reaction was a question, “Why didn’t Google buy the company?” Hmmm. Now Snorkel is going to push to be a commercial success, perhaps like DeepDyve, an outfit which used or uses Snorkel technology.
The write up says:
Snorkel’s Data-centric Foundation Model Development, as the offering is called, is an enhancement to the startup’s flagship Snorkel Flow program. The new features let companies write functions that automatically create labeled training data by using what are called foundation models, the largest neural nets that exist, such as OpenAI’s GPT-3. The new functions in Snorkel Flow let a person who is a domain expert but not a programmer create a workflow that will then automatically generate labeled data sets that can be used to train the foundation programs for specific tasks.
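For readers who want to see what “functions that automatically create labeled training data” might look like, here is a minimal sketch in plain Python. It is not Snorkel Flow’s actual API. The function names, the keyword rules, and the “billing” and “account” labels are all made up for illustration, and a simple majority vote stands in for whatever statistical method the real product uses to combine the functions.

```python
from collections import Counter

# Hypothetical sentinel: a labeling function returns this when it has no opinion.
ABSTAIN = None


# Each labeling function encodes one rule of thumb from a domain expert.
# The keywords and labels below are invented for this sketch.
def lf_mentions_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN


def lf_mentions_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN


def lf_mentions_password(text):
    return "account" if "password" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_refund, lf_mentions_invoice, lf_mentions_password]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine the functions' votes by simple majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the rules disagree evenly, so no label is assigned
    return ranked[0][0]


def build_training_set(texts):
    """Return (text, label) pairs for every text that received a label."""
    labeled = []
    for text in texts:
        label = weak_label(text)
        if label is not ABSTAIN:
            labeled.append((text, label))
    return labeled
```

Feed `build_training_set` a pile of unlabeled support tickets and it hands back only the ones the rules could label. The humans write the rules once; the software does the tedious labeling. Whether the labels are any good is a separate question.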
The base technology emerged from projects guided in part by Christopher Ré. The work goes back more than a decade. Snorkel itself has been a startup for several years.
Smart software is getting a lot of tire-kicking action from large companies. My hunch is that Snorkel wants to sell its methods to the firms where a bean counter is just now coming to a meeting and saying, “Have you taken a look at how much money our AI teams need to retrain our models?”
Then a whiz kid — possibly a graduate of Stanford — says, “Get Snorkel!”
Well, that’s my hunch. Will the models avoid the horrible fate of self-immolating smart software which just gets stuff wrong? Probably not. But the PowerPoints and Zoom presentations will explain that Snorkel does not go “under water.” Snorkel lets an apoplectic accountant breathe somewhat more easily until the next quarterly analysis of smart software expenses.
Stephen E Arnold, November 23, 2022