Analysts describe the business and research opportunities unlocked by Artificial Intelligence (AI) as exceptionally promising. But AI applications – especially deep learning systems that parse enormous amounts of data – are extremely demanding and require powerful parallel processing capabilities. This is why AI and High Performance Computing (HPC) are increasingly mentioned in the same breath.
Even the most powerful supercomputing platforms on the planet are now using AI methods to tackle workloads that previously relied on more traditional HPC-driven simulations. For example, a quick glance through this year’s list of finalists for the prestigious Gordon Bell Prize, one of the top annual honors in supercomputing, shows a team using the Oak Ridge Leadership Computing Facility’s 200-petaflop Summit – currently the world’s fastest supercomputer – to train an AI-based application to identify extreme weather patterns.
Most enterprises are already working to define an AI strategy, because not acting could potentially be a business disaster as competitors gain a wealth of previously unavailable insight. But not every organization will rely on one of the world’s most powerful supercomputers to process their AI workloads. Instead, most will build their own IT infrastructure solutions, on-premises, in-house. And supercomputers such as Summit are providing the inspiration for how to tackle this business-critical challenge.
Like the Sierra supercomputer at Lawrence Livermore National Laboratory, Summit is based on the HPC architecture developed by IBM for the CORAL program. A key to the CORAL solutions is that they were assembled using only commercially available components. Because of this, IBM can now offer everything needed to build what is essentially a mini-version of a CORAL supercomputer to address the most challenging AI and HPC workloads, in a solution called the IBM Power Systems Accelerated Computing Platform (IBM Power ACP).
The IBM Power ACP system is a complete solution that includes IBM POWER9 servers; IBM Elastic Storage Server (ESS); networking, development, and runtime software; and professional services designed to help any organization easily build the on-premises infrastructure needed to support AI, HPC, and other compute-intensive workloads.
Headlining the IBM Power ACP offering is the IBM Power Systems AC922 – the same server used in the Summit and Sierra supercomputers. With up to 1 TB of RAM, two 20-core IBM POWER9 processors, and up to four NVIDIA Tesla V100 GPUs connected through NVLink, IBM Power Systems AC922 servers provide superior performance for workloads such as AI frameworks and accelerated databases. IBM Power ACP can provide the world’s most powerful servers customized to each organization’s unique environment.
This “CORAL-light” solution from IBM can start with as few as four IBM Power Systems AC922 compute nodes housed in one rack, then grow to as many as 52 nodes in four racks, so organizations of all sizes and types can deploy supercomputer capabilities with ease. IBM Power ACP offers a number of important advantages:
- It is based on existing, proven components such as POWER9 servers and available software, including IBM PowerAI, which combines open-source deep learning frameworks with ease of deployment, and IBM Spectrum LSF Suite, a comprehensive HPC workload management solution designed to increase both user productivity and hardware utilization while decreasing system management costs.
- The solution offers a single point of contact for support across the entire software stack. This means that if a job doesn’t complete, rather than dive into complex troubleshooting, you can simply call IBM.
- Comprehensive installation support from IBM Lab Services is available as part of the solution. IBM technicians take the time to understand your unique requirements, then provide guidance on everything from system performance to interaction with existing IT. Plus, IBM does the factory configuration, including software installation, so deployment is as simple as roll in and connect.
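To put the quoted scale range in concrete terms, the per-node figures above (two 20-core POWER9 processors, up to four Tesla V100 GPUs, up to 1 TB of RAM) can be multiplied out for the smallest and largest configurations (four and 52 nodes). The following is only a back-of-the-envelope sketch: `NodeSpec` and `cluster_totals` are hypothetical names made up for illustration, not part of any IBM tool, and the totals are upper bounds that assume every node is fully populated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node maximums as quoted in the article (hypothetical helper type)."""
    sockets: int = 2            # two POWER9 processors per node
    cores_per_socket: int = 20  # 20 cores per processor
    max_gpus: int = 4           # up to four Tesla V100 GPUs
    max_ram_tb: int = 1         # up to 1 TB of RAM


def cluster_totals(nodes: int, spec: NodeSpec = NodeSpec()) -> dict:
    """Return upper-bound aggregate resources for `nodes` identical nodes."""
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    return {
        "nodes": nodes,
        "cpu_cores": nodes * spec.sockets * spec.cores_per_socket,
        "gpus": nodes * spec.max_gpus,
        "ram_tb": nodes * spec.max_ram_tb,
    }


if __name__ == "__main__":
    # Smallest (4 nodes, one rack) and largest (52 nodes, four racks) cases.
    for n in (4, 52):
        print(cluster_totals(n))
```

Under these assumptions the entry configuration tops out at 160 CPU cores and 16 GPUs, and the largest at 2,080 CPU cores and 208 GPUs.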
A modern system for AI and HPC is not complete without fast storage. IBM ESS is an optional part of the offering and can be configured to meet the capacity and performance each enterprise requires. ESS is powered by the same IBM Spectrum Scale technology deployed at some of the world’s most powerful AI installations, for example the 250 PB installation on the Summit system at Oak Ridge National Laboratory. ESS solutions can grow as workloads evolve, either by adding storage to each node or by adding ESS building blocks. Capacity and performance increase as new ESS building blocks are added, and the single namespace grows accordingly. Given the nature of AI workloads, AI servers require storage that delivers gigabytes-per-second performance. IBM Power ACP therefore delivers a well-balanced architecture that pairs powerful servers geared to AI with IBM ESS storage, which can scale to terabytes-per-second workloads.
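The balance between compute nodes and storage building blocks can be reasoned about with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the per-node ingest rate and per-building-block throughput are placeholder values chosen for the example, not published IBM ESS figures. Real sizing would use measured workload rates and vendor specifications.

```python
import math


def required_bandwidth(nodes: int, per_node_gb_per_s: float) -> float:
    """Aggregate read bandwidth (GB/s) if every node streams data at once."""
    return nodes * per_node_gb_per_s


def building_blocks_needed(required_gb_per_s: float,
                           per_block_gb_per_s: float) -> int:
    """Smallest number of storage building blocks that covers the demand."""
    if per_block_gb_per_s <= 0:
        raise ValueError("per_block_gb_per_s must be positive")
    return max(1, math.ceil(required_gb_per_s / per_block_gb_per_s))


if __name__ == "__main__":
    # Placeholder assumptions, NOT vendor numbers:
    PER_NODE_GB_PER_S = 2.0    # assumed data ingest rate per compute node
    PER_BLOCK_GB_PER_S = 20.0  # assumed throughput of one building block

    for nodes in (4, 52):
        demand = required_bandwidth(nodes, PER_NODE_GB_PER_S)
        blocks = building_blocks_needed(demand, PER_BLOCK_GB_PER_S)
        print(f"{nodes} nodes -> {demand:.0f} GB/s -> {blocks} building block(s)")
```

The point of the exercise is the shape of the calculation: storage demand grows linearly with node count, so throughput has to be added in step with compute, which is what the building-block model is meant to allow.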
One of the keys to a successful IBM Power ACP deployment is the consultation and configuration support provided by IBM Systems Lab Services engineers. Complete systems are delivered on-site, rack-mounted and integrated, with the operating systems and required software preloaded and configured as defined in implementation design workshops. Because one size never fits all, this tailored approach lets IBM accelerate time-to-insight by ensuring that deployed solutions are quickly installed and moved into production.
Thanks to IBM Power ACP, you don’t need to have the same budget and expertise as Oak Ridge in order to benefit from world-class supercomputing capabilities. You may even be a smaller or mid-sized business and still take advantage of the same caliber of AI and HPC solutions as national laboratories. In the 21st century, insight is the ultimate competitive advantage. With IBM Power ACP, the opportunities of AI are within reach of almost every organization.
Visit IBM next week at SC18 (booth #3433) in Dallas, Texas, to learn about leading-edge HPC and AI solutions, including the IBM Power Systems Accelerated Computing Platform. Register here for technical briefings and user group sessions.