With the recent announcement of the Intel® Scalable System Framework (Intel® SSF), the democratization of HPC took a major stride forward.
Intel® SSF is an advanced architectural approach and flexible blueprint for developing scalable, balanced and efficient HPC systems. It simplifies the use of essential HPC building blocks such as processors, memory, storage, fabric and system software. This gives OEMs the flexibility to build both high-end supercomputers that are paving the way to exascale and clusters of all sizes that can bring HPC to a whole new class of users: engineers and scientists who until now have not had access to these advanced computational capabilities.
Intel SSF will help bring about the mainstream adoption of HPC in industries that need advanced HPC capabilities but have been slow to adopt the technology because of its complexity and cost. Manufacturing is a good example. The National Center for Manufacturing Sciences reports that of the roughly 300,000 manufacturers in the U.S., 95% are small and medium-sized businesses (SMBs). Only 15% of these SMBs use HPC, primarily for modeling and simulation. Systems built using Intel SSF will help lower the barriers to adoption to the point where engineers have access to all the horsepower they need to run complex computational fluid dynamics (CFD) and finite element analysis (FEA) applications.
Because Intel® SSF is designed to simplify the procurement, deployment and management of HPC systems, the full range of HPC capabilities will become accessible to more industries running a variety of workloads, such as modeling and simulation, Big Data analytics, visualization and machine learning.
Intel® Omni-Path Architecture
A key element of Intel SSF is the Intel® Omni-Path Architecture (Intel® OPA), launched at SC15. Intel® OPA directly addresses the performance and scaling weaknesses of current InfiniBand* technology by combining the best of the technologies Intel acquired from QLogic and Cray* with its own mix of new features and innovations.
Intel OPA is designed to scale cost-effectively from entry-level HPC clusters to clusters with 10,000 nodes or more. It provides the CPU and fabric integration needed for the increased compute density, higher switching speeds, improved reliability, reduced power and lower costs that larger HPC deployments require. Intel OPA comes with all the tools needed to install, verify and manage fabrics at this level of complexity.
Key Features
Some of Intel OPA's key features and innovations include:
- Adaptive routing – Monitors the performance of the possible paths between fabric end-points and selects the least congested path to rebalance the packet load. Adaptive routing scales as the fabric grows larger and more complex.
- Dispersive routing – A critical role of fabric management is initializing and configuring routes through the fabric between pairs of nodes. Intel OPA supports a variety of routing methods, including defining alternate routes that disperse traffic flows for redundancy, performance and load balancing. Spreading flows across these alternate routes improves routing efficiency.
- Traffic flow optimization (TFO) – This feature improves quality of service by reducing the latency variation of high-priority traffic in the presence of low-priority traffic. It addresses a traditional weakness of both Ethernet* and InfiniBand, in which a low-priority message, once its transmission on the link has started, must be completed even if a higher-priority message becomes ready. Traffic flow optimization allows a higher-priority message to pause the lower-priority message and be inserted into the data stream before that message completes.
- Packet integrity protection (PIP) – Allows rapid and transparent recovery from transmission errors between a sender and a receiver on an Intel OPA link. This approach eliminates the need for transport-level timeouts and end-to-end retries, and avoids the heavy latency penalty associated with other error recovery schemes.
- Dynamic lane scaling (DLS) – This feature allows operations to continue, at reduced bandwidth, even if one or more lanes of a 4x link fail, eliminating the need to restart or roll back to a previous checkpoint to keep the application running. The job can then run to completion before action is taken to resolve the issue.
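The adaptive routing feature in the list above can be illustrated with a toy model. This is a minimal sketch, assuming each candidate path between two end-points exposes a single congestion number (bytes queued); the `Path`, `pick_least_congested` and `route` names are hypothetical, and this is not Intel OPA's actual implementation, which runs inside the fabric itself. It only shows the idea of "select the least congested path to rebalance the load."

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical model of one candidate route between two fabric end-points."""
    hops: tuple      # switch names along the route
    load: int = 0    # bytes currently queued on this route (congestion estimate)


def pick_least_congested(paths):
    """Choose the least-loaded path; on a tie, prefer fewer hops, then list order."""
    if not paths:
        raise ValueError("no candidate paths between these end-points")
    return min(paths, key=lambda p: (p.load, len(p.hops)))


def route(paths, packet_bytes):
    """Send one packet on the least congested path and record the load it adds."""
    chosen = pick_least_congested(paths)
    chosen.load += packet_bytes
    return chosen


# Two equal-length routes through different spine switches (hypothetical names).
paths = [Path(("leaf1", "spineA", "leaf2")), Path(("leaf1", "spineB", "leaf2"))]
paths[0].load = 5000  # spineA is already congested
for _ in range(3):
    route(paths, 2000)
print([p.load for p in paths])  # [5000, 6000]
```

Because each decision looks at current load, new traffic drains toward the idle route until it becomes the busier one, at which point the next packet switches back: the rebalancing behavior the feature describes.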
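The traffic flow optimization feature in the list above can be made concrete with a small simulation. This is a minimal sketch, assuming a link that sends one fixed-size chunk per tick and a priority scheme where a lower number means higher priority; the `Message` and `transmit` names and the chunk model are hypothetical, not Intel OPA's link-layer design. It compares a link that must finish the current message (the traditional behavior described above) with one that pauses a lower-priority message to let a higher-priority one through.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    priority: int     # lower number = higher priority (assumed convention)
    chunks_left: int  # data still to send, in hypothetical link-level chunks


def transmit(arrivals, preemptive=True):
    """Simulate one link that sends one chunk per tick.

    arrivals: iterable of (arrival_tick, name, priority, n_chunks).
    Returns {name: tick at which the message's last chunk left the link}.
    """
    pending = sorted(arrivals)
    queue = []        # heap of (priority, arrival_seq, Message)
    current = None    # message currently on the wire
    done = {}
    tick = i = seq = current_seq = 0
    while i < len(pending) or queue or current is not None:
        # Admit every message that has arrived by this tick.
        while i < len(pending) and pending[i][0] <= tick:
            _, name, prio, n = pending[i]
            heapq.heappush(queue, (prio, seq, Message(name, prio, n)))
            i += 1
            seq += 1
        # TFO-style preemption: pause the lower-priority message and re-queue
        # it with its original arrival order so it resumes later.
        if preemptive and current is not None and queue and queue[0][0] < current.priority:
            heapq.heappush(queue, (current.priority, current_seq, current))
            current = None
        if current is None:
            if not queue:
                tick += 1  # link idle until the next arrival
                continue
            _, current_seq, current = heapq.heappop(queue)
        # Send one chunk of the current message.
        current.chunks_left -= 1
        tick += 1
        if current.chunks_left == 0:
            done[current.name] = tick
            current = None
    return done


arrivals = [(0, "bulk", 1, 8), (2, "urgent", 0, 2)]
print(transmit(arrivals, preemptive=False))  # {'bulk': 8, 'urgent': 10}
print(transmit(arrivals, preemptive=True))   # {'urgent': 4, 'bulk': 10}
```

In this toy run the urgent message arrives at tick 2. Without preemption it waits behind the bulk transfer and finishes 8 ticks after arriving; with preemption it finishes after 2 ticks, while the bulk message is delayed by only the 2 chunks it yielded.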
A New Age in HPC
Intel Scalable System Framework, with its advanced Intel Omni-Path Architecture fabric, is helping to usher in a new age in high performance computing: what has been called the “HPC everywhere” era. The democratization of HPC is finally becoming a reality.
*Other trademarks and brands may be the property of others.