October 07, 2005
The following is an excerpt of a keynote presentation by Michael Wolfe of The Portland Group, a subsidiary of STMicroelectronics Inc., at CASES 2005 in San Francisco. He has worked on compilers for the high performance and parallel computing market in the commercial and academic worlds for more than 30 years. He has published numerous technical articles and authored two books.
The study of compilers includes almost all of classical computer science: programming languages, formal languages, algorithms, data structures, instruction set design, computer architecture, implementation, and almost everything down to logic design. Parts of the compiler include parsing, flow analysis, code improvements, parallelism detection at the instruction level as well as for multiple processors, resource allocation (registers, functional units), scheduling, and so on. Associated tools include assemblers, linkers, disassemblers, profilers, optimized libraries, and more. All of these have a long and distinguished history, actively developed over the past half century. So, why is it so hard to create a satisfactory programming environment for today's embedded systems? Why and how must such an environment differ from those in use on general purpose computer systems?
I'm going to discuss this problem from the top down, starting at the marketplace for programming environments (read: compilers) for embedded systems versus the market in the general purpose workstation or cluster world. The Portland Group (PGI) develops very high performance C and Fortran compilers. Since 2000, as part of STMicroelectronics, we have also developed highly optimizing compilers for embedded processors. The PGI business includes a large and growing customer base in the Intel and AMD x86 Linux market. That market includes multiple competing compiler vendors, which is an unusual situation compared to the traditional RISC/UNIX workstation and server market. This situation developed largely because the primary CPU vendor, Intel, did not develop and market a compiler solution until the turn of the century. Now, of course, Intel has a highly visible compiler group, targeting the Pentium family, XScale embedded systems, and the Itanium.
At The Portland Group, our continuing work in high-performance computing, in addition to work on compilers for ST embedded systems, gives us an interesting perspective. We're not alone in developing compilers for both the high performance and embedded markets. Some independent compiler companies can say the same, as well as, of course, Intel.
So, we have this healthy business selling compilers for the high performance Linux applications business. Who are our customers? Let me generalize and break our customers into two categories. In one category is a set of users who develop programs for internal use. These include laboratories and corporations developing critical modeling applications, such as an oil company developing seismic signal processing codes, or an aircraft engine manufacturer developing heat analysis, stress analysis, and fluid flow codes. These developers are in a rapid compile-test-run cycle. With these customers, the number of end-user applications per compiler sale is relatively low, or, conversely, the number of compiler sales per end-user application is relatively high.
In another category are independent software vendors, or ISVs. These are developers of large commercial applications, such as fluid flow or crash test applications, that are purchased and used in binary form by large corporations or laboratories. Here, to the compiler vendor, the direct value of the sale is relatively low; there may be many end users of the application for each compiler sale. The importance of these customers comes from the marketing value of a large ISV using our software, and also because sales of these applications to a user will often indirectly result in additional compiler sales. These resulting sales would then fall into the first group.
The point here is that we have two types of external customers. The first type results in large numbers of compiler users, who are constantly beating on the compiler with a wide variety of programs, resulting in lots of feedback to the compiler developers (us) about problems with correctness, performance, and compatibility. The second type results in very widespread use of executables generated by our compilers, which results in extreme stress on those executables from a quality standpoint, and in addition often generates even more widespread use of our compilers.
This is in great contrast to the embedded systems marketplace. In the embedded systems market, there are very few (if any) compiler users who are developing applications just for themselves. As a hypothetical, suppose ST were to market a processor core for inclusion in a cell phone. You might think ST would look forward to the compiler sales that come with a design win, but you'd be wrong. The compiler and software development tools are often thrown in with the design win. Because software is not free, this changes the financial impact of compiler development; instead of being a profit center, it's a cost center. Money spent on compilers is money not spent on architectural improvements, manufacturing process, marketing, and more. Look at how the design win was made. Much as high performance system sales were made in the 1980s, the potential customer comes up with a benchmark program or programs that exemplify the types of computation for which the embedded system will be used. Here, the customer can be very precise, since they already have the application. A team of specialists inside the semiconductor company then uses whatever tools are available to tune the program for the platform, while their competitors will have another team doing the same for their platform.
Once a hardware design win is secured, how many applications (read: compiler sales and users) will this entail? Answer: few. In fact, since the design win probably depended on successful porting of the application (singular) to the platform, there's very little post-win application development, though there is some amount of tuning, as the final product design becomes more concrete.
In all, this results in little desire on the part of product managers to invest in compiler and tools development. They can invest in an applications engineer who will help port and tune the application and get this design win, or in a compiler engineer who will tune the compiler with improvements that will probably come in too late for this design win and may or may not help the next one. Even relatively simple and standard product features that appear in any successful workstation compiler may be missing in these products. As one example, we acquired the software development kit from one of our competitors. Our experience was that the kit was hard to buy, took a long time to get delivered, was hard to install, and once it was finally installed, it was hard to use. This kind of experience would never happen in an environment where the vendor is forced to compete for the business and loyalty of their users.