Fast.ai, an organization offering free courses on deep learning, claimed a new speed record for training on a popular image database using Nvidia GPUs running on public cloud infrastructure.
A pair of researchers trained a model on the ImageNet database to 93 percent accuracy in 18 minutes using 16 Amazon Web Services cloud instances, each with eight Nvidia Tesla V100 Tensor Core GPUs. Running the Fast.ai and PyTorch libraries, the researchers claimed a 40-percent boost in speed and accuracy for training on ImageNet on public infrastructure. The previous record was held by Google on its Tensor Processing Unit Pod cluster.
“Our approach uses the same number of processing units as Google’s benchmark (128) and costs around $40 to run,” Fast.ai reported. The researchers said they would release their software for training and monitoring distributed models running in the AWS cloud.
The researchers included a Fast.ai alumnus and a deep learning expert with the Defense Innovation Unit Experimental (DIUx), a Pentagon startup working to transfer commercial technologies to the military.
Fast.ai developed a set of tools for cropping database images, while DIUx supplied a framework called nexus-scheduler that was used to orchestrate training runs and track the results. The scheduler was tuned for multi-machine training.
The researchers said they were encouraged by a recent report that AWS was able to reduce training time on the image database to 47 minutes with comparable accuracy.
The Fast.ai effort employed what the researchers called a “new training trick.”
“A lot of people mistakenly believe that convolutional neural networks can only work with one fixed image size, and that that must be rectangular,” Fast.ai’s Jeremy Howard explained in a blog post. “However, most libraries support ‘adaptive’ or ‘global’ pooling layers, which entirely avoid this limitation.”
Howard continued: “…unless users of these libraries replace those layers, they are stuck with just one image size and shape (generally 224 by 224 pixels). The Fast.ai library automatically converts fixed-size models to dynamically sized models.”
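The idea behind global pooling can be illustrated with a minimal pure-Python sketch. It is not the Fast.ai library's code; the helper names `global_avg_pool` and `make_maps` are hypothetical and the feature-map values are made up. Each channel is averaged down to a single number, so the output length depends only on the number of channels and not on the image's height or width. In PyTorch, the layer that plays this role is `torch.nn.AdaptiveAvgPool2d`.

```python
def global_avg_pool(feature_maps):
    """Average each channel's 2-D feature map down to a single value."""
    pooled = []
    for channel in feature_maps:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def make_maps(channels, height, width):
    """Build toy feature maps (illustrative values): channel c is filled with c + 1."""
    return [[[float(c + 1)] * width for _ in range(height)]
            for c in range(channels)]


# Two inputs with different spatial shapes, one square and one rectangular.
small = global_avg_pool(make_maps(3, 4, 4))
wide = global_avg_pool(make_maps(3, 7, 10))

# Both pooled outputs have one value per channel, whatever the input size.
print(len(small), len(wide))
```

Because the pooled output has a fixed length, the layers that follow it never see the original image dimensions, which is what lets the same network accept different image sizes and shapes.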
The researchers said training started with small images that were gradually increased in size as training progressed. Early on, while the model was still inaccurate, it could process many small images quickly and make rapid progress; later, larger images let it pick up finer image detail and distinctions. To accelerate training, they also used larger batch sizes during intermediate training steps to better utilize GPU memory and avoid network latency.
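A schedule of growing image sizes could be expressed roughly as follows. This is a sketch under stated assumptions, not the researchers' released software: the epoch boundaries and image sizes in `SIZE_SCHEDULE` are illustrative, and `image_size_for_epoch` is a hypothetical helper.

```python
# Hypothetical schedule of (first_epoch, image_size_in_pixels) phases.
# The epoch boundaries and sizes are illustrative assumptions.
SIZE_SCHEDULE = [
    (0, 128),
    (10, 224),
    (20, 288),
]


def image_size_for_epoch(epoch, schedule=SIZE_SCHEDULE):
    """Return the image size of the latest phase that has already started."""
    size = schedule[0][1]
    for first_epoch, phase_size in schedule:
        if epoch >= first_epoch:
            size = phase_size
    return size


for epoch in (0, 9, 10, 25):
    size = image_size_for_epoch(epoch)
    # Pixel count relative to a 224-by-224 image hints at the per-image cost.
    relative_cost = (size * size) / (224 * 224)
    print(epoch, size, round(relative_cost, 2))
```

A training loop would call such a helper at the start of each epoch and rebuild its data pipeline whenever the returned size changes.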
Among the lessons drawn from the Fast.ai experiments is the assertion that deep learning researchers do not necessarily require massive processing power to accelerate training. The researchers argued that a combination of new training techniques, such as dynamically sized models, and on-demand public cloud access to GPU infrastructure can help democratize deep learning and other AI development tasks.
“There’s certainly plenty of room to go faster still,” Fast.ai’s Howard said.
This story originally appeared on our sister site Datanami.