AI: Multi-Modal Wu Dao

June 24, 2021

Last summer OpenAI’s GPT-3 text generator was the impressive AI of the season, creating passages of text most readers could not distinguish from human-penned prose. Now we are told a model out of the Beijing Academy of Artificial Intelligence (BAAI) has surpassed that software. According to Yahoo, “China’s Gigantic Multi-Modal AI is No One-Trick Pony.” The new deep learning model, named Wu Dao, can emulate human writers as well as GPT-3 and then some. Reporter Andrew Tarantola asserts:

“First off, Wu Dao is flat out enormous. It’s been trained on 1.75 trillion parameters (essentially, the model’s self-selected coefficients) which is a full ten times larger than the 175 billion GPT-3 was trained on and 150 billion parameters larger than Google’s Switch Transformers. In order to train a model on this many parameters and do so quickly — Wu Dao 2.0 arrived just three months after version 1.0’s release in March — the BAAI researchers first developed an open-source learning system akin to Google’s Mixture of Experts, dubbed FastMoE. This system, which is operable on PyTorch, enabled the model to be trained both on clusters of supercomputers and conventional GPUs. This gave FastMoE more flexibility than Google’s system since FastMoE doesn’t require proprietary hardware like Google’s TPUs and can therefore run on off-the-shelf hardware — supercomputing clusters notwithstanding. With all that computing power comes a whole bunch of capabilities. Unlike most deep learning models which perform a single task — write copy, generate deep fakes, recognize faces, win at Go — Wu Dao is multi-modal, similar in theory to Facebook’s anti-hatespeech AI or Google’s recently released MUM.”
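The quoted passage leans on the mixture-of-experts idea: a gating network routes each token to only a small subset of expert sub-networks, so total parameter count can balloon without a matching rise in per-token compute. Below is a minimal, illustrative top-1 routing sketch in plain PyTorch, written under our own assumptions; it is not the FastMoE library’s API or anything from BAAI’s Wu Dao code, just the general mechanism the quote describes.

```python
# Minimal, illustrative mixture-of-experts layer with top-1 routing.
# NOT FastMoE or Wu Dao code -- a hand-rolled sketch of the idea:
# a gate picks one expert per token, so parameters scale with the
# number of experts while per-token compute stays roughly flat.
import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); each token goes to exactly one expert.
        scores = self.gate(x)               # (num_tokens, num_experts)
        expert_idx = scores.argmax(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SimpleMoE(d_model=64, d_hidden=256, num_experts=8)
    tokens = torch.randn(32, 64)
    print(layer(tokens).shape)  # torch.Size([32, 64])
```

Real systems in this family add load-balancing losses and distribute experts across devices, which is where the supercomputer clusters mentioned above come in; this sketch only shows the routing step.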

Tarantola checked out the researchers’ recent demo. While OpenAI taught us that software can now mimic news stories and similar content, Wu Dao takes language further by generating essays, poems, and couplets in traditional Chinese. It can also take cues from static images to write relevant text and can create almost photorealistic images from natural-language descriptions. With the help of Microsoft’s XiaoIce, Wu Dao can also power virtual idols and predict 3D protein structures à la AlphaFold. Talk about use cases from different ends of the spectrum. BAAI chair Dr. Zhang Hongjiang declares the key to AI’s future lies in “big models and a big computer.” Perhaps those models can divine a way to minimize their own power consumption and work without alleged biases toward everyone not in CompSci 410.

Cynthia Murrell, June 24, 2021

