GPUs Add Up For ARM Chips In HPC

By Timothy Prickett Morgan

June 23, 2014

The first wave of credible 64-bit ARM processors is coming to market late this year or early next, and as is usually the case, the high-performance computing community is getting first crack at figuring out how these chips might be deployed to run various kinds of simulations more efficiently or cost-effectively.

Applied Micro, which has first-mover status in the 64-bit ARM server chip race with its X-Gene 1, is teaming up with Nvidia, maker of the Tesla GPU accelerators, at the International Supercomputing Conference in Leipzig, Germany to promote X-Gene and Tesla as the first of several dynamic duos. Three vendors – Cirrascale, E4 Computer Engineering, and Eurotech – are also previewing hybrid ARM-Tesla systems at the conference, and others will no doubt follow soon as more ARM chips come to market towards the end of this year and into early next year.

Given the ubiquity of Xeon processors in the supercomputing space, Nvidia has to integrate well with rival Intel’s Xeon processors and has to compete against the Xeon Phi parallel X86 coprocessors, too. But Nvidia, like many system buyers, wants a second or third option when it comes to processors, and that is why Nvidia was a founding member of the OpenPower Foundation, which seeks to establish multiple sources of IBM’s Power8 and follow-on processors and to link accelerators tightly to them. Nvidia is waving the ARM banner high as well, and wants to be the accelerator of choice for ARMv8 platforms.

“GPUs make 64-bit ARM competitive in HPC on day one,” explains Ian Buck, general manager of GPU computing software at Nvidia. “We are clearly seeing viable and compelling ARM64 platforms coming online. It is obvious that there is excitement around ARM, and there are two reasons for that. One is that we haven’t had new, innovative CPUs for a while. Some of the ARM architectures are going up to 24 cores, and they are playing with what is on die, what is off, and Broadcom and Cavium come from the networking world and there are lots of networking angles they can play. The second reason for the excitement is choice. ARM represents choice, and a very diverse one.”


While network devices like to have plenty of threads, the chips used in such gear are not generally equipped with lots of floating point math processing capability, says Buck. Nvidia, you can quickly guess, wants its Tesla to be the coprocessor of choice for 64-bit ARM platforms. Having created the CUDA programming environment, which supports 64-bit ARM chips starting with the 6.5 release, and having built up a library of hundreds of third-party simulation and analytics workloads that run on hybrid processor-GPU systems, Nvidia thinks it is well placed to help customers port their applications to ARM-Tesla hybrids.

“Based on our experience with ARM to date, the porting seems to go fairly quickly if you have well-structured code,” says Buck. “A lot of HPC codes have been around long enough that they don’t have a lot of intrinsics in there, the X86isms, and code seems to move fairly easily. If the code is already GPU-accelerated, then the performance just carries straight over. These ARM64 chips can drive full GPU performance.”

Applied Micro is going to have plenty of competition in the ARMv8 processor space, with AMD, Cavium, and Broadcom all putting forth very strong contenders to go up against the hegemony of Intel’s Xeon processors and its very credible defensive position with Atom chips for modest compute and low-power needs. Intel has a substantial lead in chip manufacturing processes – something between one and two nodes, depending on how you want to count it – and is behaving as if it has a bunch of AMDs on its heels. Never before in its history has Intel been so willing to tweak its processor designs to make them better fit the workloads of supercomputing and hyperscale customers alike, from adding special instructions to Xeons to baking special versions of the Xeons that run hotter or clock higher to actually welding an FPGA into a Xeon chip, as Intel last week announced it was going to do.

This newfound openness is one way Intel is going to counter the onslaught of different 64-bit ARM processors and the various ways their makers will accelerate workloads using GPUs, DSPs, FPGAs, and other specialized circuits. In effect, Intel is adopting the malleable approach of the ARM community to defend against ARM processors.

The initial X-Gene 1 processor from Applied Micro has been sampling since early 2013; production wafers for the chip were started at the end of March, and production chips are due around now. The X-Gene 1 chip is implemented in a 40 nanometer process at Taiwan Semiconductor Manufacturing Co.; it has eight custom ARMv8 cores, designed by Applied Micro itself, on each system-on-chip. The cores on the X-Gene 1 run at 2.4 GHz, and Sanchayan Sinha, senior product manager, tells HPCwire that in terms of single-threaded performance, the X-Gene 1 has about the same level of oomph as a four-core “Haswell” Xeon E3 and about the same memory bandwidth as a “Sandy Bridge” Xeon E5.

Sinha stressed that these were very rough comparisons and that real benchmarks would eventually result in harder figures than these approximations. That is, in fact, what the development systems being shown off at ISC’14 are all about. The company is working with server partners to run the High Performance Conjugate Gradients (HPCG) benchmark, which is being proposed as a follow-on to the more widely used Linpack parallel Fortran matrix math test, on X-Gene 1 systems. Sinha says that Applied Micro and Nvidia will be able to show that an X-Gene 1 plus a Tesla K20 coprocessor will be equivalent to an X86 processor plus the same Tesla K20 floating point motor.
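For readers who have not met the method the benchmark is named after, here is a minimal sketch of a plain conjugate gradient solver in Python. It is not the HPCG code itself: the real benchmark works on a large sparse problem and adds a preconditioner, while this illustration uses a tiny dense matrix with no preconditioning, and the function names and the sample system are made up for the example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive-definite matrix a.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - a x, with x starting at zero
    p = list(r)              # first search direction is the residual
    rs_old = dot(r, r)
    for iteration in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, iteration
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps directions conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Hypothetical 2x2 system, chosen only to exercise the solver.
matrix = [[4.0, 1.0], [1.0, 3.0]]
rhs = [1.0, 2.0]
solution, iterations = conjugate_gradient(matrix, rhs)
```

The work in each iteration is dominated by the matrix-vector product and a handful of dot products, which on sparse data stresses memory bandwidth far more than raw floating point throughput. That is the contrast with Linpack that makes a conjugate gradient test interesting for comparing processor and accelerator pairings.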


The X-Gene 2 chip is a rev on the initial design and also includes eight ARM cores, but it is implemented in a 28 nanometer process at TSMC. This shrink of the process will allow Applied Micro to crank up the clock speed and add more features to its SoC. One interesting feature that the company has divulged it will add to the X-Gene 2 is support for Remote Direct Memory Access (RDMA) on the network ports on the chip. Specifically, the Ethernet ports on the chip will be able to run RDMA over Converged Ethernet (RoCE), which brings the low-latency access of InfiniBand to the Ethernet protocol. This will make the X-Gene 2 chip not only suitable for HPC workloads that are latency sensitive, but also for database, storage, and transaction processing workloads in enterprise datacenters that also like low latency.

Further out beyond this, Applied Micro has teamed up with TSMC to use its 16 nanometer FinFET 3D transistor process to create X-Gene 3. Little is known about this processor except that it will have at least 16 cores on the SoC.

The early revs of the X-Gene 1 were put on development boards called “Mustang” internally by Applied Micro and known as the X-Gene XC-1 outside of the company. The ARM-based HPC systems that are being previewed by Cirrascale and E4 Computer Engineering are based on production-grade X-Gene 1 chips and the Mustang boards.

The Cirrascale development machine puts two Mustang boards and two Tesla K20 or K20X GPU accelerators in a compact 1U server chassis.

This machine is called the RM1905D in the Cirrascale product catalog, and, as on other Mustang boards, it supports a maximum of 64 GB of memory for each X-Gene 1 chip across the processor’s two memory slots. The system has four Ethernet ports: three for data and one for system management. Two of the ports for data exchange run at 1 Gb/sec and the remaining one runs at 10 Gb/sec; the management port runs at 1 Gb/sec. The Mustang board has one PCI-Express 3.0 x8 slot, which is used to link the processor to the Tesla GPU, and the chassis has room to plug in a single SATA-2 drive (a 6 Gb/sec link). Each node in the chassis has a 400 watt power supply.
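To put those feeds and speeds side by side, the back-of-the-envelope arithmetic can be sketched in a few lines of Python. The PCI-Express 3.0 figures used here (8 GT/sec per lane with 128b/130b encoding) come from the PCI-Express specification rather than from Cirrascale, the results are theoretical peaks before protocol overhead, and the function and variable names are invented for the illustration.

```python
PCIE3_GT_PER_SEC = 8.0        # raw signalling rate per PCI-Express 3.0 lane
PCIE3_ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def pcie3_gbytes_per_sec(lanes):
    """Peak one-direction bandwidth of a PCI-Express 3.0 link, in GB/sec."""
    return PCIE3_GT_PER_SEC * PCIE3_ENCODING * lanes / 8


# The x8 slot that links the X-Gene 1 to the Tesla GPU.
gpu_link_gbytes = pcie3_gbytes_per_sec(8)

# The three data ports described above: two at 1 Gb/sec, one at 10 Gb/sec.
data_ports_gbits = [1, 1, 10]
network_gbytes = sum(data_ports_gbits) / 8

print(f"GPU link: {gpu_link_gbytes:.2f} GB/sec peak per direction")
print(f"Data ports combined: {network_gbytes:.2f} GB/sec peak")
```

On those assumptions the x8 link to the GPU peaks at a bit under 8 GB/sec in each direction, several times what the three data ports can carry in aggregate, so the node is better fed from GPU to processor than from the network.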

The feeds and speeds of E4 Computer Engineering’s EK003 were not available at press time, but Nvidia tells HPCwire that the machine will include two X-Gene 1 system boards in a 3U enclosure that has two Tesla K20 GPU coprocessors, and that the development machine will be aimed at seismic, signal and image processing, video analytics, track analysis, Web applications, and MapReduce workloads.

Cirrascale and E4 Computer Engineering plan to ship their development machines in July, according to Nvidia.

Eurotech has a custom motherboard design using the X-Gene 1 chip that has main memory soldered onto the board to give it a very low profile and therefore high density for its ARM-based Aurora system. The compute elements in this new Aurora machine are based on what the company calls its “brick technology,” and will employ direct hot-water cooling of the components in the brick. It will include a combination of ARM processors and Tesla coprocessors. Further details for this Eurotech Aurora system were not yet available at press time, but we will hunt them down. The company expects to ship production machines later this year.
