Most people associate high performance computing with those big multi-rack supercomputers humming away in national labs. But if you’re in the HPC community, you know that the vast majority of systems are much smaller — commodity clusters made up of a handful of nodes, or perhaps dozens, or even hundreds. Now though, HPC technology is making its way into even smaller system, in particular, embedded devices and appliances.
An article penned by Mentor Graphics’ Pete Decher that appeared this week in EE Times describes the trend, noting that “with the introduction of more compact and more powerful embedded processors, embedded systems are becoming HPC capable.”
The beneficiaries of this technology are medical devices (MRI and CT imaging), military and aerospace systems (e.g. radar and navigation), automotive computers (collision avoidance), and even handheld consumer devices (voice recognition). The use of high performance hardware and software is not completely new to the embedded space of course, but recent advances in processor technology are giving the industry access to computational power that used to only be available in HPC clusters. Decher writes:
All of this is possible due to advancements in processing hardware. What we’re seeing now is what the military and aerospace community call commercial off-the-shelf or COTS, which usually connotes commodity-type devices that are capable of high-performance computing. Companies like Intel, Freescale, NVIDIA, Xilinx, and TI are creating an explosion of new devices targeted at HPC applications. Intel recently introduced its new multicore Sandy Bridge class of devices (2nd Generation iCore processors) with Advanced Vector (math) eXtensions called AVX. In the same timeframe, Intel has also introduced its new Many Integrated Cores (MIC) processor architecture. Code named “Knights Corner”, this architecture supports the interconnection of 50 Larrabee class cores. Freescale recently introduced a new generation of high-end multicore Power PC chips called the QorIQ AMP Series, with a re-introduction of an improved AltiVec vector processing accelerator. The new QorIQ architecture can support up to 24 virtual cores per chip.
Then there is the whole GPGPU phenomenon, courtesy of NVIDIA and AMD, that already delivers more than a teraflop of single precision floating point performance in a single chip. Xilinx and Altera are introducing devices that integrate FPGA logic with multicore CPUs. (For example, the Zynq-7000 from Xilinx has a dual-core ARM A9 processor with the Neon vector accelerator, plus an FPGA fabric.). Along those same lines are Texas Instruments’ Integra line, which integrates a C6x DSP with an ARM Cortex A8 CPU.
The downside to all these new architectural wonders is programming complexity. Decher says the difficulty of software development on heterogeneous platforms is high, noting that “typical embedded software development costs are exceeding well over 50% of the entire system cost.” A side-effect of this complexity is software portability, since, for example, programs developed for GPUs typically aren’t interchangeable with those developed for say DSPs.
Ideally, says Decher, you would have access to high-level libraries that were hardware-independent, enabling applications to be easily ported from one platform to another. But in the midst of all this microprocessor diversity, that’s probably not completely attainable.
Software issues aside, Decher sees an expansive future for HPC in the embedded space. As more and more flops becomes available in these chips, their use will penetrate into every imaginable device with a need for compute-intensive work.