The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
January 06, 2009
Do It Yourself Hardware Acceleration
For all the benefits claimed for hardware acceleration, from exponential performance improvements to massive power and space savings, most of the discussion focuses on what can be accomplished, with little detail on how to accomplish it. Hardware acceleration always seems to carry the implied acronym DIY (Do It Yourself).
Most of the time, this means one of two things. The first is purchasing someone else's proprietary hardware and software, implementing algorithms at the far end of the system bus, and hoping that the partner's roadmap aligns with your evolving goals. The second is developing your own boards, custom hardware, custom software, custom interfaces, and custom protocols while maintaining expertise in all of these fields. In the second scenario, designers are not just taking advantage of hardware to accelerate their software; they are doing full hardware design, plain and simple.
These have been the obstacles to hardware acceleration for more than a decade, and most software developers have found it best to ride Moore's Law of continual improvement, waiting for the next-generation processor rather than venturing into the realm of hardware acceleration.
As has been well publicized, the door to continual improvement has closed; in turn, however, this has opened the door to hardware acceleration. Hardware acceleration will look vastly different five years from now than it does today. For those who are not watching it closely, it will probably look different next month. Any acceleration path must not only be revolutionary in what it provides; it must also be evolutionary!
From Revolutionary...
Let's start with the revolution! At its forefront is the willingness of AMD and Intel to open their processor interconnects. This allows hardware to move from being an add-on isolated at the far end of a PCI bus, or some other distant extension, to sitting next to the CPU as an equal. With AMD's Torrenza Initiative and Intel's QuickAssist Technology, hardware accelerators now have a low-latency way to communicate with the processor, as well as direct access to system memory.
On the accelerator side, both AMD and Intel have embraced the idea of in-socket accelerators (ISAs). An FPGA is essentially configurable hardware that can be programmed to implement custom functions. Placing one on a board that plugs directly into a processor socket converts an existing multi-processor system into a processor-and-accelerator system without new board design. XtremeData's XD2000 modules, which use Altera's Stratix FPGAs, are an example of how users can leverage existing boards and systems, whether they are multi-CPU desktops, blade servers, or ATCA cards. Now the entire hardware acceleration platform can be developed with COTS components and boards.
When designing within this revolution, developers may choose to custom-build their own bridge for their specific software/hardware interface. Besides requiring extensive work and an understanding of software/hardware co-design, this approach pushes development down to the physical implementation, where the designer's work is locked to a specific processor and in-socket accelerator. This not only limits the designer's flexibility to try different existing architectures but, most importantly, makes it difficult to upgrade as new processors, bridges, and in-socket accelerators become available.
With any newly embraced technology, things change quickly. Intel's processor interconnect is moving from the Front Side Bus (FSB) to the QuickPath Interconnect (QPI). AMD is riding the evolution of the HyperTransport (HT) interconnect standard, which is continually being updated. FPGAs are increasing dramatically in density as they move to 40nm process technology and beyond. In such an environment, any development work risks being too specific to a given technology node and being left behind as new process technologies emerge.
To Evolutionary...
To be evolutionary, a hardware accelerator must not only provide a bridge to the CPU, but one that can evolve with new technology. At the most basic level, hardware accelerators need to cloak the underlying hardware, so the processor talks software, the accelerator talks hardware, and yet they can easily communicate with each other.
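The idea of cloaking the hardware can be illustrated with a small sketch. This is not any vendor's API: the names `AcceleratorBackend`, `SoftwareBackend`, `SimulatedInSocketBackend`, `KERNELS`, and `process` are hypothetical, and the "in-socket" backend only simulates an accelerator by staging data through a byte buffer that stands in for shared system memory. The point is that the application function is written once against the interface and does not change when the backend does.

```python
import struct
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence

# Functions the "hardware" is assumed to implement (hypothetical names).
KERNELS: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}


class AcceleratorBackend(ABC):
    """Hypothetical interface that hides how data reaches the accelerator."""

    @abstractmethod
    def run(self, kernel: str, data: Sequence[int]) -> List[int]:
        """Apply the named kernel to every element and return the results."""


class SoftwareBackend(AcceleratorBackend):
    """Plain CPU implementation, useful as a reference or fallback."""

    def run(self, kernel: str, data: Sequence[int]) -> List[int]:
        fn = KERNELS[kernel]
        return [fn(x) for x in data]


class SimulatedInSocketBackend(AcceleratorBackend):
    """Simulation only: mimics an accelerator that reads its input from,
    and writes its output to, a buffer standing in for system memory."""

    def run(self, kernel: str, data: Sequence[int]) -> List[int]:
        fn = KERNELS[kernel]
        fmt = f"<{len(data)}q"  # little-endian signed 64-bit words
        shared = bytearray(struct.pack(fmt, *data))    # host writes input
        values = struct.unpack(fmt, shared)            # "hardware" reads it
        results = [fn(v) for v in values]              # "hardware" computes
        struct.pack_into(fmt, shared, 0, *results)     # "hardware" writes back
        return list(struct.unpack(fmt, shared))        # host reads output


def process(backend: AcceleratorBackend, samples: Sequence[int]) -> List[int]:
    """Application code: no interconnect or device details appear here."""
    return backend.run("square", samples)
```

Calling `process(SoftwareBackend(), samples)` and `process(SimulatedInSocketBackend(), samples)` gives the same result for the same input. In a design along these lines, moving to a new interconnect or a denser FPGA would mean writing a new backend class rather than touching the application.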
