Arm Yourselves for Exascale, Part 1

By Michael Wolfe

November 9, 2011

Today’s largest HPC systems are dominated (492 of the Top 500) by processors using two instruction sets (x86, Power) from three vendors (Intel, AMD, IBM). These processors have typically been designed for the highest single-thread performance, but they suffer from high cost (several hundred dollars to over $1500) and high power demand (around 60-100W). As we build even larger and higher-performance systems on the way to exascale, we might explore other avenues for delivering cost-efficient compute performance and reducing the power consumed by these systems.

In particular, there are at least three good reasons to explore whether processors designed for mobile systems can play a role in HPC, which I call innovation, federation and customization. Innovation, because the future of computing innovation is not on the desktop or in servers, but in ubiquitous computing, the internet of things. Federation, because embedded processors, like ARM-architecture devices, are available from a variety of vendors, thus freeing customers from single suppliers, allowing outstanding price and feature competition and increased innovation and flexibility. Customization, because the mobile market thrives on various manifestations of customization, and we in HPC might be able to take advantage of that.

Here, we use ARM-architecture processors as representative of mobile system processors, if only because ARM so dominates that space, though other possible processors include x86 (Intel Atom and AMD Geode), IBM PowerPC, MIPS, even embedded SPARC.

Innovation in the Post-PC World

In 2007, Steve Jobs predicted an upcoming explosion of “post-PC devices,” using the iPod as an example. He didn’t mean to suggest that the PC was dying or was doomed to eventual extinction, any more than PCs killed off workstations or mainframes. He meant that the growth seen in the PC industry was unlikely to continue at the same pace, and, as we’ve seen recently, the new growth path has been moving to phones, tablets, and other mobile, untethered, networked devices. This means that the innovators who have driven the PC world to ever greater capabilities have been moving to these new post-PC devices and ubiquitous computing. Hardware innovation is tending towards smaller, lower-power devices.

Why do we care? Historically, supercomputers have been built with the devices available at the time. The first Cray-1 used four types of semiconductor chips: two types of NOR gates (fast for logic, bigger but slower for memory fanout), and two types of static RAM (fast for registers, slower but bigger for memory), and lots of wire. Contemporary supercomputers were built with essentially mainframe technology, mostly by the mainframe manufacturers.

A number of research parallel processors were designed, built and productively used in the 1980s using essentially workstation technology: printed circuit boards usually populated with commodity processors and connected by a high speed network. By the mid-1990s, massively parallel processors using RISC chips dominated the Top 500 supercomputer list.

In 2000, Intel introduced the Pentium 4, adding the double precision vector SSE2 instructions to the x86 family. This made the x86 a viable candidate for real supercomputing. Given the cost advantages of using high volume parts, more parallel supercomputers were designed using Intel and AMD processors. Within four years, over half the Top 500 supercomputers used some flavor of x86 processors, and that number is now close to 90%.

The cost of developing viable processors customized for general-purpose HPC is prohibitive, requiring system architects to use the best available commodity processors. Perhaps the one exception to that rule is IBM, which designed a special PowerPC chip for Blue Gene, though they adapted an existing commodity embedded processor rather than building a bespoke processor. When commodity innovation moves to the mobile world, we in the HPC industry may have to look at mobile processors as potentially the most cost effective solutions to our compute problems.

The Federation vs. the Empire

ARM, Ltd. doesn’t actually produce and sell chips. ARM licenses the core IP to vendors who include ARM cores in their own products. Most of these designs are Systems-on-chip (SOCs), including much of the glue logic on the same chip as the processor, as well as application-specific logic. This makes for better integration and lower part count for the eventual customer.

An SOC for a cell phone might include a DSP or two for audio encode/decode, a graphics driver for the display, interface for the keyboard, and radio components in addition to the main processor. An SOC for automotive electronic stability control might have interfaces for wheel speed sensors, accelerometers, an interface to control the brakes, and perhaps even a temperature sensor.

ARM processor deliveries are far ahead of x86 and PowerPC processor deliveries each year in units. The architecture is solid and viable. Moreover, there are a number of chip vendors building and supplying parts with ARM architecture cores, giving customers a broad choice of supplier. No one vendor can control availability or price, and there’s no fear of depending on a single source that may choose to change direction or that may not survive the long term. The ARM architecture may be the only viable candidate for an alternative processor to x86 and Power.

In the mobile world, standardization on ARM cores as the control processor has produced the same benefits that standardization on x86 has given the desktop. There are many software choices, ranging from operating systems to tools and applications, ready to use on ARM processors. There is an army of trained programmers comfortable with programming, optimizing and tuning for ARM processors. There is a plethora of hardware devices that have been designed to work with ARM processors, though most of these would be integrated on the SOC.

There are two types of ARM licensees. Most vendors take the ARM core IP and integrate it directly into their own products; such an ARM core will be instruction-set compatible regardless of the vendor. Some vendors acquire an ARM architecture license, allowing them to augment their own ARM implementations. This gives them additional freedom to innovate or add extensions for particular target markets.

Customization for HPC

Within the ARM world, there is a high level of architectural variety. Among the more than 250 ARM microcontrollers in its catalog, STMicroelectronics, PGI’s parent company, offers one 32-bit ARM microcontroller that draws about 10 milliamps when running at its full speed (32 MHz), and can be scaled to lower clocks and voltages to draw even less current.

The latest high-end ARM Cortex-A15 design supports one to four cores, SIMD floating point, up to 4MB level 2 cache, and up to 1TB (40 bits) of memory address space. Note there are no Cortex-A15 MCUs available yet, though several are in the works. This architectural variety is a real strength of ARM in the mobile market; a designer can choose a version with all the necessary features, and without any unnecessary baggage, and keep within a desired size or power envelope.
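The 40-bit figure quoted above for the Cortex-A15 can be checked with a line of arithmetic. The following is a minimal sketch, not anything from ARM documentation; the helper name `addressable_bytes` is hypothetical, and the 32-bit comparison is added here only as a familiar reference point.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 1 << address_bits  # 2 ** address_bits


GIB = 1 << 30  # bytes in one binary gigabyte
TIB = 1 << 40  # bytes in one binary terabyte

# 40 address bits, as quoted for the Cortex-A15 in the text.
a15_space_tib = addressable_bytes(40) // TIB

# A plain 32-bit address, for comparison (not a figure from the article).
classic_space_gib = addressable_bytes(32) // GIB

print(f"40-bit address space: {a15_space_tib} TB")
print(f"32-bit address space: {classic_space_gib} GB")
```

The extra eight address bits give a 256-fold increase over a 32-bit address space, which matters for HPC nodes that would otherwise be limited to 4GB of memory.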

As specific examples, let’s look at two current ARM processor offerings. One is the SPEAr chip from STMicroelectronics. The high end SPEAr 1340 has two ARM Cortex-A9 cores with up to 600MHz clock, 512KB level 2 cache, a Gigabit Ethernet port, a PCIe link, one SATA port, 2 USB ports, controllers for flash memory, interfaces for memory card, touch screen, small (6×6) keyboard, 7.1 channel sound, LCD controller, HD video decoders, digital video input, cryptographic accelerator, analog-digital converters, and various other IO features. The SPEAr is clearly designed for use in a multimedia device, and is optimized for low power.

The second is the ARMADA XP from Marvell; Marvell acquired the XScale business from Intel in 2006. The ARMADA XP is a relatively new product aimed directly at cloud computing. This chip has up to four ARM cores, up to 1.6GHz clock, 2MB shared level 2 cache, interface to DDR2 or DDR3 memory, four Gigabit Ethernet ports, four PCIe ports, three USB 2.0 ports, two SATA ports, LCD controller, flash memory interface, UART, and more.

You could design either of those ARM chips right onto a small motherboard with memory and a disk and package a bunch of them into a 1U rack mount server. However, in the HPC space, do we really need USB ports, touch screen interfaces, and LCD controllers? Removing those from the chip might allow more room for more cores, or something more interesting.

The real potential for ARM architecture in HPC, and the third important reason to explore ARM, is the possibility of generating custom parts. Perhaps we could design the InfiniBand drivers right on the chip. Maybe we could add hardware support for quad-precision, which David Bailey and his colleagues predicted we’d want ten years or more ago. There may be an ability to add operations specific to certain markets, such as bioinformatics or finance.

Some of the more exciting systems over the past decade are custom designs, including Anton at D.E. Shaw Research, and the MDGRAPE-3 machine at RIKEN in Japan. In each case, custom design gives a significant performance advantage, but at high development cost, including fully custom software. Imagine if we could achieve similar performance advantages for specific applications, but retain most of the design and software development cost advantages of using standard chips.

In the mobile ARM space, there are different levels of customization. A fully custom chip would have a number of ARM cores, caches, memory interfaces, perhaps Ethernet or other ports, and maybe even some custom logic. The ARM architecture supports a coprocessor interface, so custom logic could be configured and controlled directly from software, just like early floating point units were. Even the ARM cores themselves can be customized by selecting a specific ARM version, or adding extensions like the NEON SIMD instructions.

The design of such a chip is easy on paper, but requires a long sequence of steps and perhaps a year or more before it comes out of fabrication and packaging. The design must be turned into RTL, laid out, verified, qualified on the technology to be used, a mask created, the chip fabricated and then tested. This takes both considerable time and money.

In the mobile space, the time and money are justified by very high volumes. Consider that Apple sold over 20 million iPhones and 9 million iPads in a single quarter this year. A custom chip in an iPhone or iPad would be justified based on that volume.
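To make the volume argument concrete, here is a minimal sketch of how a fixed design cost spreads across the units shipped. Only the 20 million iPhone figure comes from the text. The design cost and the HPC volume are hypothetical placeholders chosen for illustration, since the article gives neither, and the function name is likewise an assumption.

```python
def amortized_cost_per_chip(design_cost_usd: float, units: int) -> float:
    """Fixed (non-recurring) design cost spread evenly over the units shipped."""
    if units <= 0:
        raise ValueError("units must be positive")
    return design_cost_usd / units


# HYPOTHETICAL: the article states no design cost; this is a placeholder.
HYPOTHETICAL_DESIGN_COST_USD = 50_000_000

# From the text: "over 20 million iPhones" in a single quarter.
IPHONES_PER_QUARTER = 20_000_000

# HYPOTHETICAL: an assumed lifetime volume for an HPC-only part.
HYPOTHETICAL_HPC_UNITS = 100_000

phone_cost = amortized_cost_per_chip(HYPOTHETICAL_DESIGN_COST_USD, IPHONES_PER_QUARTER)
hpc_cost = amortized_cost_per_chip(HYPOTHETICAL_DESIGN_COST_USD, HYPOTHETICAL_HPC_UNITS)

print(f"design cost per chip at phone volume: ${phone_cost:.2f}")
print(f"design cost per chip at HPC volume:   ${hpc_cost:.2f}")
```

Whatever the real design cost is, the ratio between the two results depends only on the volumes, which is why lower-cost customization paths matter for a market the size of HPC.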

A second level is exemplified by the STMicroelectronics SPEAr chip mentioned above. ST offers these with a customizable logic block. During development, the customer would design and experiment with FPGA logic. When ready, the RTL from the FPGA is used to customize the on-chip logic. Because the chip is already designed with the custom logic block in place, validation is only required for the logic block, which takes only a few months.

A third level will be supported with the advent of through-silicon vias (TSVs). One obvious use for TSVs is to stack a memory chip on a processor chip, allowing lower latency, and higher bandwidth with many chip interconnections. But another important possibility is the ability to stack an FPGA or custom logic chip between a processor and memory, to be used like a coprocessor.

Summary

It’s a good time to explore alternatives to current standard processors for HPC, for at least three reasons. First, the HPC market can’t afford to develop its own processors, so it has to adopt the best of the commodity market, and the innovation in that market is moving to mobile. Second, ARM processors are by far the most popular 32-bit processors today and will soon have 64-bit versions available; moreover, there are many suppliers of ARM architecture processors, so if we’re going to look for a viable alternative, ARM is the leading (perhaps the only) candidate. Third, the potential for customization, either broadly for the HPC market or narrowly for specific applications, could give significant benefits to HPC that we simply can’t get from current commodity offerings. Add to these the potential cost and power advantages, and we’d be negligent if we didn’t study this now.

This doesn’t mean that it’s inevitable, or an easy decision. There are several challenges and missing pieces that need to be filled in along the way. That will be the topic of my next article.

About the Author

Michael Wolfe has developed compilers for over 30 years in both academia and industry, and is now a senior compiler engineer at The Portland Group, Inc. (www.pgroup.com), a wholly-owned subsidiary of STMicroelectronics, Inc. The opinions stated here are those of the author, and do not represent opinions of The Portland Group, Inc. or STMicroelectronics, Inc.
