Microsoft is boosting the high-performance computing (HPC) capabilities of its Windows Azure cloud, in keeping with the Big Compute strategy it first revealed during SC12 in November 2012.
Just a week after the tech stalwart lowered prices to better compete with cloud king Amazon, it is debuting new high-end instances for its Windows Azure Cloud Services and updated software for its Windows Server product, all aimed squarely at the high-performance computing camp. Alex Sutton, Group Program Manager, Windows Azure Big Compute, talked up the enhancements in a recent blog entry.
The Windows Azure Cloud Service now includes two new compute-intensive virtual machine sizes. Known as A8 and A9, they are Azure’s most performant instances to date. The A8 instance comes with 8 Intel virtual processor cores and 56 GB of RAM, while A9 comes with 16 such cores and 112 GB of memory. The instance family also includes 40 Gbps InfiniBand networking for low-latency and high-throughput communication.
The new offerings round out the family of instance types available as part of Azure’s Cloud Services offering, with A0 through A4 comprising the “Standard Instance” group, A5 through A7 making up the “Memory-Intensive Instances” set, and now A8 and A9 as the “Compute Intensive Instances,” providing “faster processors, faster interconnect, more virtual cores for higher compute power, [and] larger amounts of memory.”
The new instance types actually employ two interconnects. Traditional Ethernet is the link to Azure Storage, CDN, and other Windows Azure services, while a 40 Gbps InfiniBand network connects compute instances within the same Cloud Services deployment. Furthermore, the InfiniBand network employs remote direct memory access (RDMA) technology for maximum efficiency of parallel MPI applications, an enhancement that Microsoft first previewed more than a year ago, when it debuted its Big Compute strategy.
The new instances target all the usual modeling and simulation suspects that require fast computation and low latency networking. As Sutton explains: “These instances are designed for compute-intensive workloads, particularly High Performance Computing (HPC) applications such as computational fluid dynamics, finite element analysis, and weather forecasting. Manufacturing, energy exploration, life sciences, and other industries all require HPC for innovation and will benefit from this new offering.”
Microsoft has been tooting the low-latency horn on its Big Compute offerings since it unveiled the product set at SC12 in November 2012. Key to the effort is an InfiniBand network that supports RDMA communication between compute nodes. By virtualizing RDMA through Hyper-V, Microsoft achieved near bare-metal performance: less than 3 microseconds of latency and more than 3.5 gigabytes per second of bandwidth. Sutton explains that the RDMA capabilities are currently supported only on Microsoft’s implementation of the Message Passing Interface (MS-MPI), but the company and its partners are working to extend RDMA support to other MPI stacks and to Linux virtual machines.
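Those two figures pin down the classic latency-bandwidth (Hockney) model of point-to-point messaging, in which sending n bytes costs roughly a fixed startup time plus the wire time n divided by peak bandwidth. The sketch below plugs in the quoted numbers to show why the 3-microsecond figure matters; the constants and function names are illustrative, and the model is a textbook idealization, not anything Microsoft has published.

```python
# Rough Hockney (latency-bandwidth) model of a point-to-point message,
# using the figures quoted for Azure's virtualized InfiniBand.
ALPHA = 3e-6   # seconds per message: "less than 3 microseconds of latency"
BETA = 3.5e9   # bytes per second: "more than 3.5 gigabytes per second"

def transfer_time(n_bytes):
    """Estimated time for one n-byte message: startup cost plus wire time."""
    return ALPHA + n_bytes / BETA

def effective_bandwidth(n_bytes):
    """Bandwidth actually achieved by a single n-byte message."""
    return n_bytes / transfer_time(n_bytes)

# Message size at which startup cost equals wire time (the half-bandwidth point):
n_half = ALPHA * BETA  # = 10,500 bytes with these constants

for size in (1_024, 10_500, 1_048_576):
    pct = 100 * effective_bandwidth(size) / BETA
    print(f"{size:>9} bytes -> {pct:5.1f}% of peak bandwidth")
```

With these numbers the crossover sits around 10 KB: messages smaller than that spend most of their time in the fixed startup cost, which is why tightly coupled MPI codes trading many small messages care far more about a 3-microsecond latency than about peak bandwidth.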
Currently, the new instance types are available only on Windows Azure Cloud Services, Azure’s Platform as a Service (PaaS) offering, but support for Virtual Machines (Azure’s IaaS) is underway. Availability is limited to the North Central US and West Europe regions as the company builds out more.
Microsoft also announced the rollout of HPC Pack 2012 R2, its cluster management and job scheduling solution for Windows Server clusters. An upgrade to HPC Pack 2012 with SP1, it can also be used for new Windows HPC cluster installations. According to Microsoft, it offers “improved reliability, support on Windows Server 2012 R2 and Windows 8.1, as well as an enhanced feature set for Windows Azure integration, job scheduling, and cluster management.” Sutton adds that “A8 and A9 instances are supported, and this is the easiest way to test applications that use Microsoft MPI and Network Direct for low-latency RDMA networking.”