On Tuesday, Google released its “next generation of on-device computer vision networks” – MobileNetV2 – which Google says is substantially faster than MobileNetV1 at the same accuracy across the entire latency spectrum. In particular, the company says, the new models use 2x fewer operations, need 30% fewer parameters and run about 30-40% faster on a Google Pixel phone than MobileNetV1 models, all while achieving higher accuracy.
While this release is aimed at Google platform developers, it’s worth noting that various Google-developed machine learning and deep learning tools have become widely adopted across many domains.
“MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition including classification, object detection and semantic segmentation. MobileNetV2 is released as part of TensorFlow-Slim Image Classification Library, or you can start exploring MobileNetV2 right away in Colaboratory. Alternately, you can download the notebook and explore it locally using Jupyter. MobileNetV2 is also available as modules on TF-Hub, and pretrained checkpoints can be found on github,” wrote Mark Sandler and Andrew Howard, Google Research, on yesterday’s Google Research blog.
MobileNetV2 builds upon the ideas from MobileNetV1, using depthwise separable convolution as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks. The basic structure is shown below.
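To make the block structure concrete, here is a toy NumPy sketch of one inverted residual block with the pieces the post names: a 1x1 expansion with ReLU6, a 3x3 depthwise convolution, a 1x1 *linear* projection back down to the bottleneck, and a shortcut between the two bottlenecks. This is an illustrative simplification, not the released TF-Slim implementation: batch normalization is omitted, the depthwise step uses plain loops, and all weight shapes are assumptions made for the example.

```python
import numpy as np

def relu6(x):
    # ReLU6 activation used throughout MobileNet-style blocks
    return np.minimum(np.maximum(x, 0.0), 6.0)

def inverted_residual(x, w_expand, w_depthwise, w_project, stride=1):
    """One inverted residual block (toy sketch, no batch norm).

    x:           input feature map, shape (H, W, C_in)
    w_expand:    1x1 expansion weights, shape (C_in, C_mid)
    w_depthwise: per-channel 3x3 kernels, shape (3, 3, C_mid)
    w_project:   1x1 linear projection weights, shape (C_mid, C_out)
    """
    # 1) 1x1 expansion: widen the thin bottleneck, then ReLU6
    h = relu6(x @ w_expand)                       # (H, W, C_mid)

    # 2) 3x3 depthwise convolution: one filter per channel
    height, width, channels = h.shape
    padded = np.pad(h, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(h)
    for i in range(height):
        for j in range(width):
            patch = padded[i:i + 3, j:j + 3, :]   # (3, 3, C_mid)
            out[i, j, :] = np.sum(patch * w_depthwise, axis=(0, 1))
    h = relu6(out)
    if stride == 2:                               # crude downsampling
        h = h[::2, ::2, :]

    # 3) 1x1 linear projection back to a narrow bottleneck
    #    -- deliberately no activation (the "linear bottleneck")
    y = h @ w_project                             # (H, W, C_out)

    # 4) Shortcut connects the two bottlenecks when shapes match
    if stride == 1 and x.shape == y.shape:
        y = y + x
    return y
```

The key contrast with a classic residual block is that the shortcut links the narrow bottleneck tensors, while the wide ReLU6 layers sit inside the block; the projection is left linear so the narrow representation is not clipped by an activation.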
“The intuition is that the bottlenecks encode the model’s intermediate inputs and outputs while the inner layer encapsulates the model’s ability to transform from lower-level concepts such as pixels to higher level descriptors such as image categories. Finally, as with traditional residual connections, shortcuts enable faster training and better accuracy. You can learn more about the technical details in our paper, ‘MobileNet V2: Inverted Residuals and Linear Bottlenecks,’” according to Sandler and Howard.
Link to blog: https://research.googleblog.com
Link to MobileNetV2 paper: https://arxiv.org/abs/1801.04381