As the MLPerf benchmark emerges as an industry standard for measuring the performance of machine learning models, the creators of the foundational DAWNBench metric said they will phase it out.
Stanford University researchers announced earlier this month that they will end rolling submissions (results accepted on a continuing basis rather than in fixed rounds) on March 27, 2020. The move consolidates benchmarking efforts for machine learning systems now that the MLPerf training and inference suites have launched.
Until then, DAWNBench’s maintainers said they would continue accepting new submissions via pull requests.
The move reflects broad industry acceptance of MLPerf as a follow-on to DAWNBench. MLPerf’s training benchmark suite was launched in May 2018, followed by the release of its initial set of public results in December of that year. The suite was expanded last June to include inference for applications ranging from autonomous driving to natural language processing.
Stanford researchers said DAWNBench yielded major improvements in model training. For example, ImageNet training time declined from 30 minutes to less than three minutes, along with a 20-fold drop in inference latency. The improvements were credited to contributors that include Alibaba, Apple, Baidu, Huawei and Myrtle.ai, and they cleared the way for the launch of the MLPerf training and inference benchmarks.
“We have been actively involved with MLPerf to expand the benchmarking methodology to a more comprehensive suite of tasks and scenarios,” the researchers noted in a blog post announcing the benchmark consolidation.
DAWNBench was introduced several years back as a training benchmark and competition, with a particular focus on gauging training times for deep learning workloads more accurately.
Since then, MLPerf has emerged as a de facto industry standard for gauging the performance of machine learning models as they are deployed across enterprise infrastructure.