GPU leader Nvidia, generally associated with deep learning, autonomous vehicles and other higher-end enterprise and scientific workloads (and gaming, of course), is launching an open source, end-to-end GPU acceleration platform and ecosystem aimed at machine learning and data analytics, domains that until now have largely belonged to the CPU.
At the GPU Technology Conference today in Munich, the company launched RAPIDS, an open source machine learning GPU acceleration platform. Nvidia reports that, when training with the XGBoost machine learning algorithm on an Nvidia DGX-2 supercomputer, RAPIDS delivers speed-ups of 50x compared with CPU-only systems (configuration details were not provided).
RAPIDS brings with it an ecosystem drawn from the open-source community, including Databricks (a web-based platform for big data processing in the cloud using Apache Spark) and Anaconda (an open source distribution of the Python and R programming languages for data science and machine learning), and from tech companies such as Hewlett Packard Enterprise, IBM and Oracle.
The RAPIDS suite of open-source libraries has been under development for the past two years by Nvidia engineers working with contributors to open-source projects including Apache Arrow (a data layer for in-memory analytics), Pandas and scikit-learn, and it’s designed to give scientists the tools to run the entire data science pipeline on GPUs. RAPIDS builds on popular open-source projects by adding GPU acceleration to the Python data science tool chain.
“We’re building on the community of Python users… and more recently built around… Apache Arrow and in memory data format and some other tools that allow us to scale from using just one GPU to multiple GPUs in the system, to multiple node and clusters of GPUs,” said Jeff Tseng, head of product for AI infrastructure at Nvidia, in a pre-announcement conference call. “These technologies are driving RAPIDS’ ability to integrate into today’s most popular data science workloads and accelerate them…. We’re going to be focused on business data, on tabular data, and we’re going to accelerate machine learning data prep.”
To bring additional ML libraries and capabilities to RAPIDS, the company is working with open-source contributors Anaconda, BlazingDB, Databricks, Quansight and scikit-learn, as well as Wes McKinney, head of Ursa Labs and creator of Apache Arrow and Pandas, the Python data science library.
“Data analytics and machine learning are two of the biggest high performance computing applications that have not been accelerated – until now,” said Jensen Huang, founder and CEO of Nvidia. “The world’s largest industries use a sea of servers to study vast quantities of data to make fast, accurate predictions, so data analytics and machine learning can directly impact the bottom line. Building on CUDA and its global ecosystem, and working closely with the open-source community, we have created the RAPIDS GPU acceleration platform. It integrates seamlessly into the world’s most popular data science libraries and workflows to speed up machine learning. We are turbocharging machine learning like we have done with deep learning.”
From the HPC industry, Rollin Thomas, Python data analytics lead at NERSC, the National Energy Research Scientific Computing Center, said RAPIDS is a potentially significant new scientific tool.
“NERSC supports more than 7,000 researchers at universities, national labs and in industry. They increasingly want productive, high-performance ways of interacting with their data from complex science simulations or experimental and observational facilities like particle accelerators and telescopes. We look forward to working with Nvidia to put new high-performance Python data analytics tools like RAPIDS in the hands of our users to accelerate their pace of discovery across many scientific disciplines.”
The RAPIDS suite of libraries is available at http://www.rapids.ai, where the code is being released under the Apache license. Containerized versions of RAPIDS are available immediately on the NVIDIA GPU Cloud container registry.
Nvidia said RAPIDS systems are under development at Cisco, Dell EMC, HPE, IBM, Lenovo and Pure Storage.
A version of this story originally appeared on sister site EnterpriseTech.