This new AI model runs on supercomputers and boasts 530 billion parameters across 105 layers. “Each model replica spans 280 NVIDIA A100 GPUs, with 8-way tensor-slicing within a node, and 35-way pipeline parallelism across nodes,” Microsoft and Nvidia say in a blog post. Researchers trained the model on 15 datasets covering 339 billion tokens. The companies say this highlights how larger AI models can understand language with less training. However, the duo admit they had to deal with a frequent AI issue: bias. “While giant language models are advancing the state of the art on language generation, they also suffer from issues such as bias and toxicity,” the companies point out. “Our observations with MT-NLG are that the model picks up stereotypes and biases from the data on which it is trained. Microsoft and Nvidia are committed to working on addressing this problem.”
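The parallelism figures quoted above fit together neatly. A minimal sketch of the arithmetic, assuming the layout described in the blog post (constant names here are illustrative, not taken from Microsoft or Nvidia's actual code):

```python
# Sketch of how the quoted parallelism figures combine into one model replica.
# 8-way tensor slicing splits each layer's weights across the GPUs of a node;
# 35-way pipeline parallelism splits the stack of layers across groups of nodes.

TENSOR_PARALLEL = 8      # GPUs sharing each layer's weights (within a node)
PIPELINE_PARALLEL = 35   # pipeline stages (across nodes)
LAYERS = 105             # transformer layers in MT-NLG

gpus_per_replica = TENSOR_PARALLEL * PIPELINE_PARALLEL
layers_per_stage = LAYERS // PIPELINE_PARALLEL

print(gpus_per_replica)  # 280 GPUs per replica, matching the quoted figure
print(layers_per_stage)  # 3 layers handled by each pipeline stage
```

This is why the post can speak of a single replica spanning exactly 280 A100 GPUs: the two degrees of parallelism multiply.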
Video Training Breakthrough
This isn’t the first major AI announcement from Microsoft and Nvidia this year. Back in May, the partnership revealed a major breakthrough in video training by leveraging multimodal transformers. In a paper titled “Parameter Efficient Multimodal Transformers for Video Representation Learning,” the researchers describe how they reduced the size of a multimodal transformer by 97 per cent, enabling improved AI training on video clips of 30 seconds (480 frames, sampled at 16 per second). This is a major improvement over existing models, which can process video sequences of 10 seconds or less.
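The sampling figures in the paper are self-consistent, as a quick check shows (a sketch of the arithmetic only; the variable names are illustrative and not from the paper's code):

```python
# Check the quoted video-sampling figures: a 30-second clip sampled at
# 16 frames per second yields the 480 frames the paper reports.
CLIP_SECONDS = 30
FRAMES_PER_SECOND = 16

total_frames = CLIP_SECONDS * FRAMES_PER_SECOND
print(total_frames)  # 480

# The 97 per cent size reduction means the compressed transformer keeps
# only 3 per cent of the original parameter count.
REDUCTION = 0.97
remaining_fraction = 1 - REDUCTION
print(remaining_fraction)  # 0.03 of the original model size
```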