The AI revolution has already begun, right?
In some ways it has. AI systems have already bested humans in complex games, including chess, Jeopardy, Go, and poker, and deep learning has surpassed human performance in practical tasks such as image and speech recognition. These technologies are also impacting our everyday lives, introducing human-like capabilities into personal digital assistants, online preference engines, fraud detection systems, and more.
However, these solutions were developed primarily by organizations with deep pockets, deep expertise and high-end computing resources.[1] For the AI revolution to move into the mainstream, cost and complexity must be reduced, so smaller organizations can afford to develop, train, and deploy powerful deep learning applications.
It’s a tough challenge. Interest in AI is high, technologies are in flux and no one can reliably predict what those technologies will look like even five years from now. How do you simplify and drive down costs in such an inherently complex and changing environment?
Intel has a strategy, and it involves software as much as hardware. It also involves HPC.
Optimized Software Building Blocks that are Flexible—and Fast
Most of today’s deep learning algorithms were not designed to scale efficiently on modern parallel computing systems. Intel has been addressing those limitations by working with researchers, vendors, and the open-source community to parallelize and vectorize core software components for Intel® Xeon® and Intel® Xeon Phi™ processors.
The optimized tools, libraries, and frameworks often deliver order-of-magnitude or greater performance gains, potentially reducing the cost and complexity of the required hardware infrastructure. They also integrate more easily into standards-based environments, so new AI developers have less to learn, deployment is simpler, and costs are lower.
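To make the idea concrete, here is a minimal sketch of what vectorization buys in principle. This is illustrative only, not Intel library code: the scalar function processes one element per loop iteration, while the vectorized call hands the whole array to an optimized kernel that can use SIMD instructions to process many elements per cycle.

```python
# Illustrative sketch of scalar vs. vectorized computation.
# Not Intel's actual library internals; NumPy stands in for an
# optimized, vectorized math library.
import numpy as np

def dot_scalar(a, b):
    """Element-at-a-time dot product: one multiply-add per iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_vectorized(a, b):
    """Vectorized dot product: the optimized kernel underneath can
    apply SIMD instructions across many elements at once."""
    return float(np.dot(a, b))

a = np.arange(1000, dtype=np.float64)
b = np.ones_like(a)

# Both give the same answer; the vectorized form is what lets the
# hardware's parallel execution units do the work.
assert dot_scalar(a, b) == dot_vectorized(a, b) == 499500.0
```

The same principle applies at larger scale: the more of a deep learning workload that runs through vectorized, parallelized kernels, the more of the processor's capability it can actually use.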
Bring AI and HPC Together to Unleash Broad and Deep Innovation
Optimized software development tools help, but deep learning applications are compute-intensive, data sets are growing exponentially, and time-to-results can be key to success. HPC offers a path to scaling compute power and data capacity to address these requirements.
However, combining AI and HPC brings additional challenges. AI and HPC have grown up in relative isolation, and there is currently limited overlap in expertise between the two areas. Intel is working with both communities to provide a better and more open foundation for mutual development.
Intel is also working to extend the benefits of AI and HPC to a broader audience. One example of this effort is Intel® HPC Orchestrator, an extended version of OpenHPC that provides a complete, integrated system software stack for HPC-class computing. Intel HPC Orchestrator will help the HPC ecosystem deliver value to customers more quickly by eliminating the complex and duplicated work of creating, testing, and validating a system software
stack. Intel has already integrated its AI-optimized software building blocks into Intel HPC Orchestrator to provide better development and runtime environments for AI applications. Work has also been done to optimize other core components, such as MPI, to provide higher performance and better scaling for the data- and compute-intensive demands of deep learning.
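One reason MPI matters for deep learning is data-parallel training: each node computes gradients on its own shard of the data, and an allreduce operation averages those gradients so every node applies the same update. The sketch below shows that reduction in pure Python with hypothetical worker gradients; a real cluster would invoke MPI_Allreduce (for example via mpi4py) rather than an in-process loop.

```python
# Sketch of the allreduce pattern behind data-parallel training.
# Pure-Python stand-in for illustration; on a cluster this step is
# performed collectively by MPI_Allreduce across nodes.
def allreduce_average(per_worker_grads):
    """Average one gradient vector across all workers, as a sum-allreduce
    followed by a divide would."""
    n_workers = len(per_worker_grads)
    length = len(per_worker_grads[0])
    summed = [0.0] * length
    for grads in per_worker_grads:
        for i, g in enumerate(grads):
            summed[i] += g
    return [s / n_workers for s in summed]

# Hypothetical gradients from four workers for a two-parameter model.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(allreduce_average(grads))  # [4.0, 5.0]
```

Because this collective runs once per training step over potentially large gradient buffers, the efficiency of the underlying MPI implementation directly shapes how well training scales across nodes.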
Powerful Hardware to Run It All
Of course, AI software can only be as powerful as the hardware that runs it. Intel is delivering disruptive new capabilities in its processors, and supporting them with synchronized advances in workstation and server platforms. Intel engineers are also integrating these advances—along with Intel HPC Orchestrator—into the Intel® Scalable System Framework (Intel SSF), a reference architecture for HPC clusters that are simpler, more affordable, more scalable, and designed to handle the full range of HPC and AI workloads. It’s a platform for the future of AI.
Click on a link to learn more about the benefits Intel SSF brings to AI at each layer of the solution stack: overview, compute, memory, fabric, storage.
[1] For example, Libratus, the application that beat four of the world’s top poker players, was created by a team at Carnegie Mellon University and relied on 600 nodes of a University of Pittsburgh supercomputer for overnight processing during the competition.