Texas Instruments Puts ARM-DSP Processors Into Play for HPC

By Michael Feldman

November 20, 2012

NVIDIA, Intel and AMD were not the only chip vendors unveiling new HPC accelerators last week at SC12. Texas Instruments (TI) announced a set of heterogeneous processors that it believes will offer among the best performance per watt in the industry. In this case, the chipmaker glued an ARM CPU and a digital signal processor (DSP) together on the same die, offering a low-power SoC with an impressive number of FLOPS.

This represents TI’s second attempt to push a wedge into the high performance computing space. The company made its initial foray into the market in October 2011 when it introduced its multicore Keystone DSPs (TMS320C66x). The primary destination of those chips was 4G cellular base stations and radio network controllers, but since floating point functionality had to be added to serve that market, TI felt the same silicon could double as HPC accelerators.

One of the problems with using the standalone DSP devices for HPC was that application kernels had to be offloaded from a CPU host to the DSP. That wasn’t because the DSPs couldn’t run a whole application (the DSP is closer to a manycore CPU than a GPU), but because there were no Linux OS or MPI library ports for the architecture. ARM, though, has support for both of these pieces of software, allowing developers to use a traditional driver-accelerator model.

There are actually six new SoCs being introduced by TI. Two of them are ARM-only (no DSP integration) and are aimed at powering routers, switches, wireless appliances, and other networking devices. The four remaining parts are the ARM-DSP heterogeneous chips. These are fully tricked-out SoCs, with an ARM Cortex A15 CPU, a Keystone DSP, a shared memory controller, an integrated fabric and an I/O interface. The fabric itself is a custom TI design, known as TeraNet, that provides low-latency, multi-terabit/second connectivity between the ARM CPU, DSP and memory controller.

Of the four heterogeneous chips, two are high-end parts – the 66AK2H06 and 66AK2H12 – targeted at high performance computing, as well as media processing, video analytics, gaming, VDI, and radar. The 66AK2H12, with a 4-core ARM and an 8-core DSP, is the more powerful of the two. It offers 198 gigaflops of single precision (SP) floating point performance or 70 gigaflops in double precision. That includes the DSP floating point as well as the Neon FP unit in the ARM CPU.

Although this ARM-DSP SoC represents only about half the FLOPS of a high-end x86 CPU, the TI chip delivers this in about one-tenth the power – 13 to 14 watts. For single precision, that works out to about 16 SP gigaflops per watt, which is about the same as last year’s stand-alone 8-core DSP chip, sans CPU. It’s also nearly as good as NVIDIA’s latest Tesla K10 part, which delivers about 20 SP gigaflops per watt.

Since the ARM CPU is a 32-bit architecture, memory reach for these chips is limited. In fact, each SoC can only access up to 16 GB – not much compared to a standard x86 CPU, but about twice as much as a traditional accelerator. The hetero chips, though, don’t need an external CPU to feed them, as the K10 does; the on-chip ARM serves as the host driver. This eliminates the PCIe communication overhead of a CPU hooked to a discrete accelerator.

And since the ARM and DSP units share some of the same memory, the design can at least potentially simplify programming of these devices. In that sense, it’s closer to AMD’s Fusion (or APU) architecture, which glues an x86 CPU and GPU onto the same die. At this point though, the AMD offerings are being targeted at client devices, such as laptops, rather than servers.

TI is actually not making much of a distinction about where its chips will end up. According to Arnon Friedmann, TI’s business manager for the multicore processors unit, the same SoCs targeted for servers could also be applied to embedded devices. For example, a sensor network of cameras doing video surveillance could use an ARM-DSP chip to do some local image processing, the output of which could then be shunted to a server farm of these same chips to perform deeper analytics on the pre-processed video.

“That’s a level of scalability that we think our devices bring, which others in HPC don’t offer today,” Friedmann told HPCwire. “So if you look at NVIDIA [GPUs] and Intel MIC, there really aren’t cut-down versions of these really high performance devices and they’re not quite as geared for embedded as we are.”

For HPC-type developers, TI offers both OpenMP and MPI. The chipmaker also has an alpha version of OpenCL that lets the ARM CPU work in conjunction with the on-chip DSP. Down the road, TI is looking to support the newly hatched OpenMP accelerator directives, which are expected to be officially codified in the standard sometime next year.

As with the other accelerators from NVIDIA, Intel and AMD vying for HPC business, the success of the TI parts will depend upon how easy they are to program and how much application performance ensues. Regarding the latter, there is already some encouraging news. According to Friedmann, an FFT kernel from an aperture radar code produced performance on par with that of a GPU, but when they moved the entire application to the chip, performance was boosted 8-fold. Friedmann says interested parties are looking to do similar ports for even larger applications.

Right now, the chipmaker is trying to get more HPC users to move their MPI codes over to its ARM-DSP SoCs in order to drum up interest from server makers in building hardware. In the meantime, for do-it-yourselfers, TI’s two SoCs aimed at HPC are available for sampling now. Broader availability is expected in the first quarter of 2013, with general availability of evaluation modules coming in the second quarter.
