The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
October 08, 2008
In the not-too-distant past, ENIAC was programmed with switches and a plugboard. Stored program computers soon followed that allowed one to write a program, load it into the computer memory, and run it. Initially, those programs had to be written in or manually translated into binary machine code, but soon assembly languages and assemblers were developed to simplify the process.
Soon followed operating systems, multiprogramming, and the concept of an application binary interface (ABI). The ABI defines the interface between an application and the operating system, libraries and other components. One aspect of an ABI is to define a calling convention, including how arguments are passed to a function and where the return value can be retrieved. For instance, the x64 ABI defines that the first six integer or pointer arguments are passed in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9); the first eight floating point arguments (single or double precision) are passed in SSE registers (%xmm0 to %xmm7); and any remaining arguments are pushed on the stack (in right-to-left order). This allows up to 14 arguments to be passed in registers, which surely captures most function calls.
But not all; WRF, the Weather Research and Forecasting Model, is used for both atmospheric research and numerical weather prediction. A version of WRF is included in the SPEC CPU2006 suite. One routine (copying from the WRF source code) is "a mixed phase ice microphysics scheme" (WSM5), with 49 arguments; it calls a subroutine (WSM52D) to handle the two-dimensional physics with 47 arguments (19 integers, 18 floating point scalars, 10 floating point arrays). Imagine writing the routine call by hand in assembly language; it takes over 100 instructions just to marshall the arguments and put them in the right place.
Instead, the computing community created higher level programming languages. While the first compiler (for the A-0 system) was more like what we would today call a loader, programming languages and compilers have progressed to where we use many higher level languages (C, Java, Fortran, others too many to enumerate) with a great increase in productivity. Much programming is done without using a textual language at all; for instance, a spreadsheet is a form of a program, and various visual programming interfaces exist. Now, the routine call in WRF with 47 arguments takes one Fortran statement, much easier to write and maintain than the corresponding assembly code:
CALL wsm52D(t, q(ims,kms,j), qci, qrs, &
w(ims,kms,j), den(ims,kms,j), &
p(ims,kms,j), delz(ims,kms,j), rain(ims,j), &
rainncv(ims,j),delt,g, cpd, cpv, rd, rv, t0c, &
ep1, ep2, qmin, &
XLS, XLV0, XLF0, den0, denr, &
cliq,cice,psat, &
j, &
ids,ide, jds,jde, kds,kde, &
ims,ime, jms,jme, kms,kme, &
its,ite, jts,jte, kts,kte )
If programming in binary is akin to using fingers and teeth, and assembly language is like using sticks and stone knives, think of higher level languages as the power tools of programming.
Enter GPUs
The earliest GPUs were hardware graphics accelerators to handle line drawing, area fill, image transfer, and so on, offloading the CPU. The adoption of standardized libraries such as OpenGL and Direct3D drove the development of hardware 3D graphics accelerators, particularly with programmable shading capability. Since 2000, the programmability of the graphics accelerator chips has improved to the point where they can be used for nongraphics applications. This has been called GPGPU (General Purpose computation on GPUs). Early GPGPU programming used the existing graphics libraries, such as OpenGL, mapping between computing concepts (array, loop, execute) and graphics concepts (texture, kernel, draw). This is truly heroic programming. It's like using a chain saw to carve blocks of ice: in the right hands, it can produce something beautiful, but one wrong mistake and all you have is ice cubes (or worse).
More recently, the programming research and development community have tried to come up with programming models that would map well onto GPUs and similar parallel computers, particularly stream programming, evidenced in several projects: StreamIt at MIT, Sh at Waterloo (which led to RapidMind), Brook at Stanford (which spun out briefly as PeakStream, and which AMD has adopted and extended as Brook+), and others.
The GPU programming model that has caught the most attention is NVIDIA's CUDA. The language is an extension to C; the software includes compiler, libraries, and many examples. There is a large user community, including a few dozen universities that use it in course work. Moreover, the software is free, though it (obviously) only targets NVIDIA GPUs.
Page: 1 of 4(Digg, Technorati, more)
White Paper: HPC in a Green and Modular Solution Building Block
Learn how the Appro GreenBlade™ System helps consolidate server, storage, network, power and simplified management capabilities in a single package while providing the performance-density, energy-efficiency and best ROI for your business.
Jul 01 | GenomeWeb Daily News | The popularity of cloud computing in the life sciences community was on full display at April's Bio-IT World conference. Read more...
Jul 01 | Linux Magazine | How can getting to the ocean help with HPC computing? Read more...
Jun 29 | GCN.com | Agency issues RFI for "Ubiquitous High Performance Computing" systems. Read more...
Jun 29 | Computerworld | The bottom of the TOP500 reveals the coming revolution in truly accessible high-end computing. Read more...
Jun 18 | EE Times | Parallel software also takes spotlight at Stanford confab. Read more...
Apr 14 | | Many HPC IT departments are feeling the rising pressure to deliver more capacity computing and performance while trying to reduce the total cost of ownership. This white paper discusses how an environmentally-friendly and open-standards HPC building block based computing system using flexible interconnect options helps address capacity computing needs.
Source: Addison Snell, GM/VP, Tabor Research; sponsored by Dell
Many organizations that could benefit from the use of HPC clusters find that it is complicated to get the systems up and running because of limited IT resources or the complexities of the clusters themselves. Learn how the Intel Cluster Ready program, for which Dell was an original partner, seeks to address this challenge for entry level and mid-range HPC users.
BlueArc's Titan architecture represents an evolutionary step in file servers by creating a hardware-based file system that can scale bandwidth, IOPS, and overall data capacity well beyond conventional software-based devices. With its ability to virtualize a massive storage pool of up to four usable petabytes of tiered storage, Titan can scale with growing data requirements, offering a competitive advantage for businesses, researchers, or other enterprises seeking to better manage data growth while still ensuring optimal performance.
Sun Studio Compilers and Tools and Sun HPC ClusterTools allow you to create high performance parallel applications for OpenSolaris, Solaris and Linux. Sun Studio Express 11/08 includes MPI performance analysis capabilities and full OpenMP 3.0 compiler support. Learn about all this and the latest in Sun HPC ClusterTools 8.1.