HPCwire



NVIDIA Unveils Teraflop GPU Computing


NVIDIA has announced two new Tesla-branded GPU computing products at ISC'08, continuing the company's efforts to move into the HPC market. The new products are based on NVIDIA's next-generation 10-series GPU processor architecture. The T10P processor unveiled today offers double precision floating point support, more local memory, and much higher overall performance. NVIDIA is touting the new 10-series chip as the second-generation processor for CUDA, the company's GPU computing development platform.

The T10P, which is built on 55nm process technology, doubles the capability of the previous generation Tesla offerings, which were based on the 8-series NVIDIA architecture. The new GPU has twice the FP precision (32-bit to 64-bit) and twice the raw compute performance (500 gigaflops to 1 teraflop). It's important to note that the teraflop figure is single precision performance; double precision performance is a much more modest 100 gigaflops.


NVIDIA T10P

The T10P also nearly doubles the number of cores, from 128 to 240. The new processor is an evolution of the 8- and 9-series GPUs and, like those older processors, allows NVIDIA to share the same componentry across the Quadro and GeForce product lines. Because of the common architecture, CUDA is able to maintain backward and cross compatibility for applications, and user software can remain independent of the number of cores on the chip. The CUDA driver queues up the application threads and the hardware does the fine-grained mapping of the threads to the processing cores at runtime. So the same CUDA app can run on a cluster, a workstation or a notebook, as long as each contains recent vintage NVIDIA hardware.

Each of the 240 cores in the T10P is implemented as a "thread processor" with an integer unit, a floating point unit, and a register file. Eight thread processors are arranged in a thread processor array, which shares a special functions unit (for transcendental and other functions), a double precision (DP) floating point unit, and 16KB of shared memory that works at cache speed. Except for the DP unit, the design is the same as NVIDIA's 8-series GPU architecture.
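The core counts and the core-count independence described above can be illustrated with a toy model. The Python sketch below is not CUDA code and makes no claim about how NVIDIA's driver or hardware actually schedules work; `run_blocks`, `square_block`, and the round-robin policy are hypothetical simplifications chosen for illustration. It only shows why a program expressed as a queue of independent blocks of work produces the same answer on 16 arrays (128 cores) as on 30 arrays (240 cores).

```python
from collections import deque

CORES_PER_ARRAY = 8                        # eight thread processors per array (from the article)
ARRAYS_8_SERIES = 128 // CORES_PER_ARRAY   # 16 arrays on the 128-core part
ARRAYS_T10P = 240 // CORES_PER_ARRAY       # 30 arrays on the 240-core T10P


def run_blocks(num_blocks, num_arrays, work):
    """Toy scheduler: deal queued blocks out to arrays in round-robin order."""
    queue = deque(range(num_blocks))
    results = {}
    placement = {array_id: [] for array_id in range(num_arrays)}
    while queue:
        for array_id in range(num_arrays):
            if not queue:
                break
            block_id = queue.popleft()
            placement[array_id].append(block_id)
            results[block_id] = work(block_id)
    return results, placement


def square_block(block_id):
    """Stand-in for a kernel: each block computes from its own index only."""
    return block_id * block_id


old_results, old_placement = run_blocks(100, ARRAYS_8_SERIES, square_block)
new_results, new_placement = run_blocks(100, ARRAYS_T10P, square_block)

print(ARRAYS_8_SERIES, ARRAYS_T10P)                   # 16 30
print(old_results == new_results)                     # True
print(len(old_placement[0]), len(new_placement[0]))   # 7 4
```

Only the placement of blocks changes between the two runs; the application-level result does not, which is the property that lets one binary span notebooks, workstations and clusters.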

In addition to the performance and memory bumps, the T10P will also benefit from a wider memory interface (512 bits), faster memory I/O (102 GB/sec), and an upgraded I/O interface (PCIe x16 Gen2). But it's the DP capability that will make HPC users take notice, especially now that the latest IBM Cell processor (PowerXCell 8i) and AMD FireStream GPU both boast DP capability. The absence of double precision FP support has limited Tesla's potential market, especially in certain financial and scientific realms where applications need 64-bit floating point math.
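As a rough sanity check on the memory figures, the quoted bus width and bandwidth imply an effective transfer rate. The sketch below assumes that 1 GB means 10^9 bytes and that every transfer uses the full 512-bit width; neither assumption is stated by NVIDIA here, and the variable names are illustrative.

```python
BUS_WIDTH_BITS = 512          # memory interface width quoted above
BANDWIDTH_GB_PER_S = 102      # quoted memory bandwidth; assumes 1 GB = 10**9 bytes

bytes_per_transfer = BUS_WIDTH_BITS // 8
transfers_per_second = BANDWIDTH_GB_PER_S * 10**9 / bytes_per_transfer
giga_transfers = transfers_per_second / 10**9

print(bytes_per_transfer)          # 64
print(round(giga_transfers, 2))    # 1.59
```

Under those assumptions the memory would be moving roughly 1.6 billion 64-byte transfers per second.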

The disparity between single and double precision floating point performance on the T10P reflects a trade-off that NVIDIA made between cost and capability. It also reflects the fact that a lot of HPC users can use 32-bit floating point to eke out more performance, jumping into the slower double precision calculations only when necessary. Nonetheless, the T10P's 100 DP gigaflops is in the same ballpark as IBM's PowerXCell 8i, which achieves nearly 109 DP gigaflops, and the brand new ClearSpeed CSX700 processor at 96 gigaflops. However, the new AMD FireStream 9250 GPU breaks out of the pack at 200 DP gigaflops.
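To put those numbers side by side, the short sketch below computes the T10P's single-to-double precision ratio and each chip's DP throughput relative to the T10P. It uses only the peak figures quoted in this article (with "nearly 109" taken as 109); these are vendor peak numbers, not measured application performance.

```python
SP_GFLOPS_T10P = 1000   # single precision peak quoted for the T10P

# Double precision peaks as quoted in the article, in gigaflops.
dp_gflops = {
    "NVIDIA T10P": 100,
    "IBM PowerXCell 8i": 109,
    "ClearSpeed CSX700": 96,
    "AMD FireStream 9250": 200,
}

sp_to_dp = SP_GFLOPS_T10P / dp_gflops["NVIDIA T10P"]
relative = {
    name: round(value / dp_gflops["NVIDIA T10P"], 2)
    for name, value in dp_gflops.items()
}

print(sp_to_dp)    # 10.0
for name, factor in relative.items():
    print(f"{name}: {factor}x the T10P's DP peak")
```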

The T10P will end up in two new Tesla products: the S1070, a 1U box to be hooked up to HPC servers; and the C1060, an accelerator card for high performance desktop systems. They are being priced aggressively: MSRP for the S1070 is $7,995, a couple of thousand less than the first generation Tesla S870; while MSRP for the C1060 is $1,699, $400 less than the previous desktop offering.

The S1070 puts four 1.5 GHz T10P devices in a standard 1U chassis, yielding 4 teraflops of single precision performance plus 16 GB of on-board memory. If the host has a couple of free PCIe 2.0 slots, two S1070 boxes can be attached, producing an 8-teraflop compute node in a 3U space. The 16 GB of on-board memory (4 GB per T10P) will help minimize the number of host memory transfers, which slow down application performance when data sets are large.

A single S1070 draws 700 watts when heavily loaded, compared to about 550 watts for the previous generation S870 offering. But since NVIDIA has doubled the FLOPS, that represents much better performance per watt. At 700 watts, the company is pushing the upper end of the power envelope for a 1U box -- most Xeon or Opteron servers are in the 400W-500W range. But NVIDIA believes the users it is going after are more concerned with compute density and FLOPS/watt than with their electric bill.
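The performance-per-watt claim can be checked with simple arithmetic. The sketch below takes the S1070's 4 teraflops and 700 watts from this article; the S870's 2-teraflop peak is an assumption inferred from the statement that NVIDIA "doubled the FLOPS", and its 550 watts is the approximate figure given above.

```python
# Single precision peak (gigaflops) and loaded power draw (watts).
S1070_GFLOPS, S1070_WATTS = 4000, 700
S870_GFLOPS, S870_WATTS = 2000, 550   # 2000 is inferred from "doubled the FLOPS"

s1070_per_watt = S1070_GFLOPS / S1070_WATTS
s870_per_watt = S870_GFLOPS / S870_WATTS
improvement = s1070_per_watt / s870_per_watt

print(round(s1070_per_watt, 2))   # 5.71 gigaflops per watt
print(round(s870_per_watt, 2))    # 3.64 gigaflops per watt
print(round(improvement, 2))      # 1.57
```

On these peak numbers the new box delivers roughly 57 percent more single precision FLOPS per watt, even though it draws about 150 watts more.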

The C1060 card is for technical workstations and packs a single T10P GPU. With a slightly slower clock (1.33 GHz) on the GPU than the server offering, peak performance tops out at around 887 single precision gigaflops, with double precision proportionately less. The slower clock was necessary to keep the device inside of 160 watts, a more reasonable thermal envelope on a desktop.
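The 887-gigaflop figure is consistent with scaling peak performance linearly with clock speed from the 1.5 GHz part. The sketch below assumes that linear relationship and the 1 teraflop and 100 gigaflop peaks quoted earlier; `scale_peak` is a hypothetical helper, and the double precision result is an estimate rather than a number NVIDIA has published here.

```python
def scale_peak(base_gflops, base_ghz, new_ghz):
    """Estimate peak throughput at a new clock, assuming linear scaling."""
    return base_gflops * new_ghz / base_ghz


SERVER_CLOCK_GHZ = 1.5     # T10P clock in the S1070
DESKTOP_CLOCK_GHZ = 1.33   # T10P clock in the C1060

c1060_sp = scale_peak(1000, SERVER_CLOCK_GHZ, DESKTOP_CLOCK_GHZ)
c1060_dp = scale_peak(100, SERVER_CLOCK_GHZ, DESKTOP_CLOCK_GHZ)

print(round(c1060_sp))      # 887
print(round(c1060_dp, 1))   # 88.7
```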

NVIDIA hopes to parlay the new products into an expanded footprint in the HPC market. Although the company isn't sharing unit sales of the first generation Tesla boxes, Geoff Ballew, product manager for the Tesla Server group, did say they have around 250 HPC customers on CUDA platforms spread across the usual suspects of HPC verticals: oil & gas, finance, medical, digital content, and research.

"Oil and gas is an area where we've had tremendous success," says Ballew, "one, because the price of a barrel of oil keeps going up, so they're very motivated to use new tools to find more oil. But it's also been one where their problem is nicely aligned with our [solution], and they've been scratching their heads on how to get the performance they want out of traditional clusters."

Examples of some of the larger Tesla installations include Hess, NCSA, JFCOM, SAIC, University of Illinois, University of North Carolina, Max Planck Institute, Rice University, University of Maryland, GusGus, Eötvös University, University of Wuppertal, IPE/Chinese Academy of Sciences, and a number of unnamed cell phone manufacturers. Ballew assured me that he had a lot more customers that he couldn't talk about yet.

NVIDIA has an even broader base of users that could drive future Tesla sales. The company estimates it has 70 million CUDA-capable GPUs -- Tesla, GeForce, and Quadro -- deployed and more than 60,000 CUDA downloads. If the company can move some percentage of these grassroots customers onto Tesla platforms, it will have a steady supply of new customers.

The Tesla products announced today won't go into production until August, so we'll see only demo systems at ISC this week. But NVIDIA is hinting that Tesla-equipped supercomputers could appear on the November TOP500 list, with perhaps even a system that breaks into the top 20.
