FPGAs: The New Promise for Re-configurable Computing?

By Christopher Lazou

March 18, 2005

The UK National HPC service at the University of Manchester, in collaboration with the Ohio Supercomputer Center, recently hosted a technical symposium, sponsored by Cray and SGI, on re-configurable computing with Field Programmable Gate Arrays (FPGAs). The symposium provided the opportunity to explore the possibilities of using re-configurable computing to accelerate software applications. Several of the speakers described their early experiences with FPGAs, and technical experts in the field were on hand to explain how to use FPGAs to meet high application performance goals. Many of the speakers enthused about this new area of computing, claiming it holds plenty of promise.

Speakers and panelists included experts from Cray Inc., Mitrionics, Nallatech, NASA Langley Research Center, the Ohio Supercomputer Center, SGI, Starbridge Systems and Xilinx, as well as FPGA users and researchers from several UK and European universities.

FPGAs are part of a class of devices known as PLDs (Programmable Logic Devices), which can be programmed in the field after manufacture. Re-configurable computing uses such general-purpose hardware configured to carry out one specific task, then reconfigured on demand to carry out other tasks.

FPGAs have been used in many embedded systems for the last 10 to 15 years. They enable an algorithm for a particular function to be implemented directly in hardware. FPGAs are found embedded in many industrial applications, including space robots (such as the recent Spirit and Opportunity rovers that landed on Mars), and also in computer PCI and I/O pads. Newer FPGAs have many more gates and tightly integrated interface functions. They are being included in mainstream computers, such as the Cray XD1 and soon the SGI Altix systems, as co-processors for accelerating specific applications. The programming paradigm is to identify a kernel in a large legacy application code, convert the underlying algorithm into FPGA binary code and execute it in hardware.

For certain types of computations needing integer or fixed-point arithmetic, the benefits of FPGAs can be very significant: two orders of magnitude speed improvement compared to a conventional cluster CPU. The speedup comes from identifying the inherent parallelism of the algorithm in question and using the hardware gates, often via logical shifts and Boolean operations, to perform the relevant calculation concisely; but this applies only to the kernel. The whole application sees a smaller improvement in performance, as dictated by Amdahl's Law. Nevertheless, some HPC applications, cryptography for example, fall into a special class that can see a significant overall performance benefit. Moreover, encryption users can muster large funding to entice computer vendors into this market.
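To make Amdahl's Law concrete, here is a minimal sketch in C of the arithmetic involved; the 80 percent kernel fraction and 100x kernel speedup are illustrative assumptions, not figures quoted at the symposium.

    #include <stdio.h>

    /* Amdahl's Law: overall speedup when a fraction f of the runtime
       is accelerated by a factor s and the remainder is unchanged.  */
    static double amdahl(double f, double s)
    {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void)
    {
        /* Illustrative values only: a kernel taking 80% of runtime,
           accelerated 100x by an FPGA implementation.               */
        double f = 0.80, s = 100.0;
        printf("kernel %.0fx -> whole application %.2fx\n",
               s, amdahl(f, s));   /* prints about 4.81x */
        return 0;
    }

Even a 100x kernel yields less than 5x overall here; only when the kernel fraction approaches one, as in cryptographic key search, does the overall gain approach the kernel gain. Below is a brief synopsis of some of the talks at the symposium.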

Clive Walker, of Xilinx, gave a technical overview of their new FPGA, the Virtex-4. In the 1980s, FPGAs were a simple block of logic, but new devices have moved away from this and tightly integrate many interfaces and connectivity functions on the same die. They have also expanded from some 60 thousand gates to 6 million gates, with 24Mbytes of SRAM, 256Mbytes of SDRAM and 32/64-bit PCI. The Virtex-4 has increased density and lowered power consumption and price, compared to the Virtex-II Pro, their previous device.

The Virtex-4 addresses the need for simpler interfacing designs and automatic synchronization of data and clock (data capture) for memory interfaces such as DDR2 SDRAM and QDR2 SRAM. It performs chip synchronization, aligning incoming data, and frequency division, delivering 1Gbps source streams. The serial interconnect operates at speeds of up to 11Gbps. The device has built-in equalization for signal integrity, XtremeDSP slices, the flexible Xilinx FSL interconnect and a PowerPC core for embedded solutions.

He went on to give a summary and a demonstration using the Virtex-II Pro as a hardware accelerator in conjunction with the Xilinx System Generator.

Rebecca Krull, from Starbridge Systems Inc., a small company founded in 1998 in Salt Lake City, Utah, gave a talk on the Starbridge Viva: “The development environment for high performance re-configurable computing.” Viva is a Windows-based development tool for programming Starbridge's Hypercomputers, or any FPGA-based re-configurable target system, running in various operating environments, including Linux, Windows and Macintosh.

Viva consists of several components. It provides a graphical language, which allows a developer to program FPGAs directly at the algorithm level or as low as the bit level; an editor with a Windows-based drag-and-drop graphical environment, allowing developers to call Viva algorithms from legacy code written in C, C++ or Java; an auto-generated data interface to specify inputs and view resulting outputs interactively, helpful for debugging; an EDIF-in/EDIF-out capability to import and export EDIF files; a compiler/synthesizer to translate Viva code directly into circuits; and a library consisting of collections of Viva objects, such as math, memory, I/O, control, logic, structure, and signal and image processing objects. Each object is “polymorphic” with regard to the data it can manipulate, meaning an object is not limited to accepting the same type or size of data as the original object it was derived from.

Viva can be used as a universal development environment for any hardware platform whose physical attributes are defined in a system description. It is, in essence, a device driver for the target hardware system. Viva programs are abstractions independent of hardware and can be run on any hardware system for which a system description exists. For example, Viva code can be prototyped in a Windows environment using FPGA emulation mode on an x86 machine. It is relatively easy to use; those with hardware training learn quickly, while software people take longer.

Other companies using Viva include Honeywell, Smiths Aerospace and Nallatech. Starbridge is also working with Cray to port it to the Cray XD1 and with SGI to port it to the SGI Altix.

Amar Shan, from Cray, spoke about the key factors for successfully accelerating applications with re-configurable computing. He said: “Cray's mantra is sustained performance on real applications. When adding new hardware to a system, one has to evaluate the overall impact on the whole application space. In this case, how would FPGAs influence performance, especially for traditional floating-point applications? For fine-grained parallelism, FPGAs have the potential to deliver two orders of magnitude speed-up on the algorithm, but current FPGAs run at one tenth of the frequency of, say, an AMD Opteron, so the actual speed-up is reduced by ten times. The cost of moving data in and out of an FPGA is a further overhead to be amortized in evaluating the real benefit to the application.”
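Shan's argument reduces to a simple back-of-envelope model, sketched below in C; every number in it (the 100x parallelism gain, the one-tenth clock ratio, the transfer time) is an illustrative assumption, not Cray data.

    #include <stdio.h>

    /* Toy model of effective FPGA speedup, following Shan's reasoning:
       the raw parallelism gain is discounted by the FPGA/CPU clock
       ratio, then diluted by the time spent moving data on and off
       the chip.  All numbers are illustrative assumptions.           */
    int main(void)
    {
        double parallel_gain = 100.0; /* fine-grained parallelism: ~100x  */
        double clock_ratio   = 0.1;   /* FPGA clock ~1/10th of an Opteron */
        double compute_gain  = parallel_gain * clock_ratio;   /* ~10x */

        double t_cpu      = 10.0;     /* kernel time on the CPU (s)       */
        double t_fpga     = t_cpu / compute_gain;
        double t_transfer = 0.5;      /* data movement to/from FPGA (s)   */

        printf("effective kernel speedup: %.2fx\n",
               t_cpu / (t_fpga + t_transfer));   /* ~6.7x, not 10x */
        return 0;
    }

The tighter the FPGA's coupling to processor and memory, the smaller the transfer term becomes, which is precisely the case Cray makes for the XD1 design described next.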

The Cray XD1 is purpose-built and optimized for high performance workloads, with system-wide process synchronization. Its Opteron processors are directly connected via Cray's own RapidArray interconnect (a 1TB/s crossbar switch), which consists of 12 custom communication processors with a 96GB/s non-blocking switching fabric per chassis. According to Cray, this delivers 8GB/s of bandwidth between SMPs with 1.6 microsecond MPI latency. Each chassis presents 24 RapidArray inter-chassis links with an aggregate 48GB/s of bandwidth.

Such high bandwidth and low latency were historically associated with high productivity vector systems, but the Cray XD1 adds another innovative feature: six Xilinx Virtex-II Pro FPGAs per chassis, attached to the RapidArray fabric for massively parallel execution of critical algorithm components.

In the Cray XD1, the FPGA is tightly coupled to the Opteron and acts as a programmable co-processor performing vector operations. It is well suited to searching and sorting, signal processing, audio/video/image manipulation, encryption, error correction, encoding/decoding, packet processing, random number generation and so on. According to Amar Shan, it promises orders of magnitude performance improvement for target applications. For example, an FPGA implementation of RC5 cipher breaking is 1000x faster than a 2.4GHz Pentium 4, and for elliptic curve cryptography it is 895-1300x faster than a 1GHz Pentium 3. For vehicular traffic simulation, a Xilinx Virtex-II (XC2V6000) is 300x faster than a 1.7GHz Xeon, and the Virtex-II Pro (XC2VP100) is 650x faster.
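The RC5 figure hints at why ciphers map so well onto FPGA fabric: each round consists only of XORs, data-dependent rotates and 32-bit adds, all of which synthesize directly into logic and can be pipelined. Here is a minimal C sketch of the standard RC5-32 round function, assuming the expanded key schedule S has already been computed.

    #include <stdint.h>

    /* Rotate left by n (data-dependent in RC5, hence the masking). */
    static uint32_t rotl32(uint32_t x, uint32_t n)
    {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }

    /* One RC5-32 round: pure XOR, rotate and modular add, each of
       which maps directly onto FPGA gates.  S points to the two
       round keys from the (precomputed) expanded key schedule.     */
    static void rc5_round(uint32_t *A, uint32_t *B, const uint32_t *S)
    {
        *A = rotl32(*A ^ *B, *B) + S[0];
        *B = rotl32(*B ^ *A, *A) + S[1];
    }

On a CPU these operations execute one after another; on an FPGA the round logic can be replicated and pipelined so that many candidate keys are tested concurrently, which is the kind of replication behind three-orders-of-magnitude figures like the one quoted.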

Richard Wilson, of the University of Durham, UK, gave a presentation called “Simulation of Adaptive Optics for Astronomy on the Cray XD1.” Adaptive optics addresses the effects of atmospheric turbulence and image distortion. Light from distant stars or galaxies traverses space for many light years clearly focused, then enters the Earth's atmosphere and gets distorted. The task for telescopes is to remove this atmospheric noise and correct the image. Currently, telescopes with 10-meter mirrors, such as the William Herschel on La Palma, use laser guide stars for their adaptive optics (AO) systems. The next generation of telescopes is expected to use segmented mirrors of 60 to 100 meters in diameter, and this will require AO on a much larger scale. The team at Durham is simulating AO and decided to scale their code by 100x and port it to the Cray XD1. Since much of the simulation is fixed and repetitive, it can be implemented in FPGAs; the plan is to run the implementation as a black box in the FPGA. They chose the Cray XD1 because its FPGAs are tightly connected and tightly integrated with the machine's memory, delivering very high performance. Once the AO simulation is developed and verified, the objective is for the FPGAs running the AO algorithm to be built into the telescope.

Anders Dellson, from Mitrionics AB, a Swedish company, talked about “Fast, Flexible and Effortless Programming of FPGAs.” He described the Mitrion-C programming language (with its compiler and synthesizer), which enables a high-level, software-style approach to implementing computational algorithms on FPGAs. The “exceptional” acceleration is achieved through massively parallel execution at the finest grain level of the algorithm. The Mitrion compiler and debugger help the programmer reveal the parallelism inherent in the algorithm. The processor is then adapted to make optimal use of the FPGA surface, allocating processing resources (gates) where they are most needed for implementing the specific algorithm.

Anders gave a very interesting color-coded graphical demonstration using Mitrion-C. Mitrion's customers are heavily concentrated in life sciences and computational biology, performing sequence alignment and digital image analysis. Other application areas include computational chemistry, encryption and automation. Mitrion-based FPGA computer platforms are offered through Nallatech, Silicon Graphics and Cray.

Mike Woodacre, chief engineer for systems architecture at SGI, talked about “Re-configurable Computing Within SGI's System Architecture.” This talk covered the use of re-configurable processing elements within the NUMAflex system architecture. He addressed the benefits of providing a scalable solution for re-configurable processing, tightly coupled to the global shared memory architecture that other processing elements in the system can use.

FPGAs are becoming popular, and he went on to describe the tools and software stack developed by SGI for efficiently integrating re-configurable technology with general-purpose processing. SGI's roadmap envisages a heterogeneous system with a globally addressable memory space, low latencies, high bandwidth and fast communication interconnects. Several third-party FPGA software language tools (environments) are to be supported on SGI Altix systems, including Impulse, Mitrion-C, Celoxica's Handel-C and Starbridge Viva (VHDL).

An SGI Altix system with an FPGA was demonstrated at the symposium. This system is currently undergoing beta testing in the field and is likely to be marketed this summer. SGI is currently talking to potential customers who would prefer a system with many FPGAs and few CPUs.

Olaf O. Storaasli, senior research scientist at NASA Langley, enthusiastically described how NASA Langley uses a re-configurable FPGA-based Hypercomputer from Starbridge for its research, and how it is addressing the solution of comprehensive engineering and scientific calculations. Two approaches are used: first, develop analysis codes in VIVA (to fully exploit parallelism) on the Hypercomputer; second, use the Hypercomputer to accelerate the time-consuming (bottleneck) calculations of existing codes.

Since NASA's C++/FORTRAN legacy codes do not exploit all of the FPGA parallelism possible (hundreds of operations per cycle), some algorithms were written entirely in the VIVA language, following the first approach. The second approach was used for a large legacy code in which over 95 percent of the finite element equation solution computations are concentrated in a two-page FORTRAN kernel. This matrix-factor kernel was replaced by VIVA “gateware” to exploit FPGA parallelism. The VIVA kernel development involved researchers at Alpha-Star Corporation (the GENOA structures code), Starbridge Systems (the VIVA developers) and NASA, which developed the GPS solver used in GENOA.
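To give a feel for what such a two-page matrix-factor kernel looks like, here is a hedged C sketch of a dense Cholesky factorization, the archetypal tight, regular loop nest that dominates finite element solution time. (The actual GPS solver kernel is not described in the talk and certainly differs; this is illustrative only.)

    #include <math.h>

    #define N 4  /* illustrative size; real FE systems are far larger */

    /* In-place Cholesky factorization A = L*L^T of a symmetric
       positive-definite matrix, stored in the lower triangle.  Each
       inner loop is an independent multiply-accumulate chain: the
       kind of work FPGA gateware can replicate across many gates.   */
    static void cholesky(double a[N][N])
    {
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < j; k++)
                a[j][j] -= a[j][k] * a[j][k];
            a[j][j] = sqrt(a[j][j]);
            for (int i = j + 1; i < N; i++) {
                for (int k = 0; k < j; k++)
                    a[i][j] -= a[i][k] * a[j][k];
                a[i][j] /= a[j][j];
            }
        }
    }

Replacing only such a kernel and leaving the surrounding FORTRAN untouched is exactly the Amdahl trade-off discussed earlier; with 95 percent of the time in the kernel, the whole-application ceiling is 20x.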

NASA's FPGA-based research initially focused on rapid structural analysis, but has now been extended to include linear algebra, matrix equation solution and integration (Runge-Kutta for fluid dynamics and Newmark-Beta for finite element structural mechanics). Many other algorithms have been rewritten for FPGAs, including cellular automata (described in Stephen Wolfram's “A New Kind of Science”), the traveling salesman problem, a cantilever beam optimization problem and many others, with great improvements in performance.
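Cellular automata are a natural fit for FPGA fabric because each cell's next state is a tiny Boolean function of its neighbors, and every cell can update in parallel on each clock. As a hypothetical illustration, here is Wolfram's Rule 30 expressed as pure bitwise operations on a 64-cell ring in C; on an FPGA, the same next-state logic (left XOR (center OR right)) would simply be replicated once per cell.

    #include <stdint.h>
    #include <stdio.h>

    /* Rule 30 on a 64-cell ring, one machine word per row: all 64
       cells update in parallel using three bitwise operations.     */
    static uint64_t rule30_step(uint64_t row)
    {
        uint64_t left  = (row >> 1) | (row << 63);  /* left neighbors  */
        uint64_t right = (row << 1) | (row >> 63);  /* right neighbors */
        return left ^ (row | right);
    }

    int main(void)
    {
        uint64_t row = 1ULL << 32;                  /* one live cell */
        for (int t = 0; t < 32; t++) {
            for (int i = 63; i >= 0; i--)
                putchar((row >> i) & 1 ? '#' : '.');
            putchar('\n');
            row = rule30_step(row);
        }
        return 0;
    }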

Olaf and his colleagues have a $14.8 million grant to evaluate and develop capabilities supporting partnerships in space exploration. They plan to use re-configurable, scalable computing for space applications like the Spirit and Opportunity Mars rovers, or for tele-operated rovers (robotic exploration) using fuzzy logic.

Steve Chappell, director of applications engineering at Celoxica, gave a talk titled “Implementing HPC Algorithms in FPGA Accelerated Systems.” He said: “There are compelling reasons to consider ‘FPGA accelerators' in HPC system infrastructure. They can be a natural choice for accelerating integer-based, wide data path and massively parallel computation. Moreover, the current generation of devices can enable fast parallel floating-point calculations in many applications.”

Managing the complexity of FPGA hardware design in a predominantly software-driven application sector is a particular challenge for the development of re-configurable computing applications. Chappell described a software-compiled system design methodology using the Handel-C compiler/synthesizer and Celoxica's hardware-software partitioning tools, which overcome most of these challenges. Unlike traditional hardware design, or block-based design entry, these technologies provide a practical and familiar design flow for HPC application developers to explore the acceleration of software systems.

During the panel discussion, claims were made that FPGAs are ready to be used in HPC. It was explained that the HPC community consists of several segments. The embedded high-performance community has endorsed FPGAs, and governmental agencies are moving fast to embrace them (for encryption); third-party software providers, such as Nastran vendors, are waiting for hardware vendors to come on stream. This is beginning to happen with the Cray XD1, the SGI Altix (soon), Starbridge, and so on.

HPC is always keen to exploit innovation if it provides real performance gains, but is FPGA technology ready for HPC? At present, the tool sets for instrumenting FPGAs are not mature enough to deliver seamless utilization. There is no native 64-bit floating-point arithmetic on the FPGA, although it can be implemented in the FPGA fabric. Speakers and panelists tended to offer the user community a choice of some pain to achieve potentially a lot of gain. Phrases such as “a change of mindset is needed” and “a new programming paradigm” were bandied about. There is a need to innovate to do “better” science, and a need to learn how to exploit these “wonderful” and potentially very rewarding FPGA devices. The consensus view is that, with more silicon to play with, computer architectures are being augmented with specialized devices, FPGAs and graphics cards, to perform specific functions, enhancing computing power for specific application domains without leaving the general-purpose computing environment.

Finally, these players are serious about FPGAs. They are setting up an open FPGA organization to collect algorithms developed on FPGAs for important applications. This will involve organizing user forums with the aim of bringing together many of the players to promote FPGA technology. Kevin Wohlever, director of the Springfield operation of the Ohio Supercomputer Center, spearheads this activity. Further details will appear on the web site: www.openfpga.org

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. February 2005.