Startup Provides a New Twist on Reconfigurable Supercomputing

By Michael Feldman

November 17, 2008

The HPC community has been dabbling with Field Programmable Gate Arrays (FPGAs) for several years now, but the technology has never reached escape velocity. The attraction of reconfigurable computing has kept the supercomputing crowd dreaming, but clunky and non-standard programming environments, lack of FPGA chip real estate for 64-bit floating point operations, and I/O bandwidth limitations have inhibited their use in mainstream HPC. The common refrain of “FPGAs are the future of supercomputing and always will be” seemed destined to be a permanent joke.

But at SC08 this week, startup Convey Computer Corp. launched a new server and software stack that aims to tame FPGAs and deliver reconfigurable computing to everyday HPC users. In a nutshell, the company has developed a “hybrid core” server, the HC-1, which wraps FPGAs into a reconfigurable coprocessor that runs alongside a standard multicore x86 CPU. The CPU and coprocessor can be programmed with standard C/C++ and Fortran. Essentially, you can take legacy code, run it through the Convey compiler, and out pops an executable that runs an order of magnitude faster on a Convey box than it would on an x86 system.

Convey is the brainchild of Steve Wallach, co-founder and CTO of Convex Computer, a company that developed vector supercomputers back in the 80s and 90s. (In case you were wondering, yes, Convey = Convex+1.) Since programming vector processors was a pain for users, Convex developed automatic vectorizing compilers to enable standard codes to take advantage of their machines. In 1995, the company was bought out by HP, and eventually Wallach hopped on the consulting circuit, selling his computing expertise to the government and IT venture capitalists.

His idea for hybrid core computing was born out of conversations with his contemporaries at Intel and Xilinx. Wallach convinced them that he would be able to take their commodity processors and create an innovative and commercially-viable platform for HPC users. Both Intel Capital and Xilinx became investors in Convey, along with CenterPoint Ventures, InterWest Partners and Rho Ventures. The initial funding amounted to $15.1 million.

Wallach, now the chief scientist at Convey, tapped some of the Convex alumni and assembled a 28-person team to get the new company off the ground. The Convey engineers resurrected the Convex auto-vectorization model with a new twist: using FPGAs as reconfigurable acceleration engines. But the idea of insulating the developer from the hardware is the same. “Our view is that you should be able to program in standard Fortran, C and C++,” says Wallach. So no extra language keywords, extensions, or special APIs are required to extract the extra performance from the FPGA-based coprocessor. According to Wallach, “you should put the burden on the compiler to do all the heavy lifting.”

This is a departure from most other HPC accelerator-based systems, where proprietary language or runtime API extensions are needed to tap the non-CPU hardware. Environments like CUDA (for GPUs) or ImpulseC (for FPGAs) rely on extended forms of C, which means legacy code must be ported before it can be accelerated. It also means newly developed code is tied to a particular architecture or must rely on a configuration management system to maintain separate source trees. All of that translates into lost human productivity.

On the hardware side, Convey’s principal architectural innovation is tightly coupling the x86 CPU with the reconfigurable coprocessor. To accomplish this, the Convey engineers designed a server with a CPU and multi-FPGA coprocessor that share the same view of virtual memory. The x86 is used mostly for scalar logic and the coprocessor is used for vector acceleration, while taking advantage of the FPGA’s ability to be tuned to workload-specific instruction streams. Since the coprocessor implements virtual memory and cache coherence, no data has to be shuffled back and forth between the CPU and externally connected FPGAs.

The coprocessor is reconfigured for different applications by loading the FPGAs with a “personality,” which describes an instruction set that has been optimized for a specific workload. For example, there could be different personalities for bioinformatics, CFD, financial analytics, and seismic processing. If you had a financial analytics calculation where you wanted to see the results with different interest rates or with random numbers plugged in, your application would require double-precision function units and instructions to facilitate operations such as random number generation, exponentiation, square roots and logarithms. Other applications like seismic processing require single-precision, complex floating point instructions.

At compile-time, the developer selects a command-line switch to specify the appropriate personality for the application source. Based on the switch, the compiler extracts the parallelism from the source code by generating the personality’s extended instructions intermixed with x86 instructions, as appropriate. Prior to execution, the OS configures the FPGAs by loading the personality image corresponding to the extended instruction set.

At any one time, the coprocessor executes a single personality. In most cases, this will be sufficient for an entire application. But the FPGAs can be dynamically reconfigured during execution if an application embodies multiple types of workloads. A personality switch takes on the order of hundreds of milliseconds. The idea is that unless your application has a umm… “personality disorder,” switching occurs relatively infrequently during execution — basically during program startup or application phase changes.
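To put the switching cost in perspective, here is a rough back-of-the-envelope sketch. It assumes a switch time of 0.3 seconds as a stand-in for “hundreds of milliseconds”; the function name and the two workloads are hypothetical and are not Convey figures.

```python
def switch_overhead_fraction(compute_seconds, num_switches, switch_seconds=0.3):
    """Return the fraction of wall-clock time spent reloading personalities.

    switch_seconds defaults to 0.3 s, an assumed stand-in for
    "hundreds of milliseconds"; it is not a measured HC-1 figure.
    """
    switching = num_switches * switch_seconds
    return switching / (compute_seconds + switching)


if __name__ == "__main__":
    # One personality load at the start of an hour-long run.
    print(f"{switch_overhead_fraction(3600, 1):.4%}")
    # A switch before each of 200 short kernels totalling one minute of compute.
    print(f"{switch_overhead_fraction(60, 200):.1%}")
```

Under these assumptions, a single load at startup is lost in the noise, while an application that switches before every short kernel would spend about half its time reconfiguring.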

There is also the ability for developers to build “procedural” personalities, which implement entire routines that are invoked like procedures or functions. To do this, a programmer will need to employ the Personality Development Kit (insert your own geek joke here) supplied by Convey.

The base hardware is a 2U rack-mountable server containing two sockets — one for an Intel CPU and one for the coprocessor. The coprocessor contains a host interface, three or four FPGA (Xilinx Virtex-5) chips, and a memory controller. The host interface encapsulates the communication with the CPU, instruction fetching and decoding, plus a common set of scalar op-codes for the coprocessor. The first version of the system will employ Intel’s front-side bus to talk to the coprocessor. But with Nehalem processors just around the corner, Convey already has plans in place for a QuickPath Interconnect-based system.

The memory controller manages a high bandwidth memory subsystem, which is incorporated into the CPU’s virtual memory space. It uses 16 DDR2 memory channels to deliver an aggregate bandwidth of 80 GB/sec. That’s a lot faster than what is currently available on an Intel Harpertown system and is even faster than what will be available on next year’s Nehalem chips. At these speeds, the controller is able to transfer individual 64-bit words (as opposed to just entire cache lines), which is how a vector processor would like to be fed.
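The arithmetic behind those numbers can be sketched as follows. The 80 GB/sec and 16-channel figures come from the description above; the 64-byte cache line is an assumption (a typical x86 line size), GB is taken as 10^9 bytes, and the function names are made up for illustration.

```python
AGGREGATE_BYTES_PER_S = 80e9  # 80 GB/sec aggregate, as quoted above
CHANNELS = 16                 # DDR2 memory channels, as quoted above
WORD_BYTES = 8                # one 64-bit word
CACHE_LINE_BYTES = 64         # assumption: typical x86 cache line size


def per_channel_bandwidth(aggregate=AGGREGATE_BYTES_PER_S, channels=CHANNELS):
    """Average bandwidth each channel must sustain, in bytes per second."""
    return aggregate / channels


def useful_fraction(bytes_needed, transfer_bytes):
    """Share of each memory transfer that the computation actually uses."""
    return bytes_needed / transfer_bytes


if __name__ == "__main__":
    print(per_channel_bandwidth() / 1e9, "GB/sec per channel")
    # A strided access that needs one 64-bit word per memory transfer:
    print(useful_fraction(WORD_BYTES, CACHE_LINE_BYTES), "of a cache-line transfer")
    print(useful_fraction(WORD_BYTES, WORD_BYTES), "of a single-word transfer")
```

With these assumptions, each channel averages 5 GB/sec. A strided or scattered access pattern that fetches whole cache lines would use only one-eighth of the data it moves, which is why word-granularity transfers suit vector-style workloads.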

Innovation doesn’t come cheap. An HC-1 server retails for around $32,000. But the pitch is that since an average HPC app can be accelerated 10x on this platform, each HC-1 is equivalent to 10 vanilla x86 boxes. If true, that would translate to significant savings in system acquisition costs, as well as power and cooling.
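A quick way to sanity-check that pitch is to compute the break-even price of the x86 servers an HC-1 would displace. The sketch below uses only the quoted $32,000 price and a speedup factor; it ignores power, cooling, software and porting costs, and the function name is hypothetical.

```python
HC1_PRICE_USD = 32_000  # approximate retail price quoted above


def break_even_x86_price(speedup, hc1_price=HC1_PRICE_USD):
    """Price per x86 server at which `speedup` servers cost the same as one HC-1."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return hc1_price / speedup


if __name__ == "__main__":
    # At the claimed 10x acceleration:
    print(break_even_x86_price(10))
```

On acquisition cost alone, a 10x speedup breaks even against x86 servers priced at $3,200 each. Cheaper servers, or a smaller speedup on a given code, would tilt the comparison the other way before power and cooling are counted.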

UCSD is an early customer, using the HC-1 to accelerate a proteomics application, called InsPecT. Scientists there expect to achieve a 16x speedup with the new system. Pavel Pevzner, director of UCSD’s Center for Computational Mass Spectrometry, says a single rack of HC-1 servers can replace eight racks of conventional servers at the center.

How well the Convey platform performs over a range of HPC codes remains to be seen. And introducing a new company with a new architecture certainly has some risks, especially in this economy. But Wallach thinks he’s got a winner and seems undeterred about launching into a headwind. “The way you make money and be successful is to be a contrarian,” he says.

Steve Wallach will be honored at SC08 with IEEE’s Seymour Cray Award. For more about Wallach, see our in-depth interview with him in today’s issue.
