Here is a collection of highlights, selected totally subjectively, from this week’s HPC news stream as reported at insideHPC.com and HPCwire.
10 words and a link
Top chip engineer leaves Sun for Microsoft
Visual timeline of the rise and bankruptcy of Silicon Graphics
Nehalem memory cheat sheet
Sun revamps HPC line, new Nehalem, networking, storage
Intel announces Q1, posts profit
Cisco buys scheduling software maker
NCSA mentors students in high performance systems
IBM stream computing prototype achieves 21x in finance
NCAR Cheyenne moves forward
U Mich picks SiCortex for heart research
Software enables proteomics research on Amazon EC2
Recap of the German Windows-HPC user group
Argonne raises machine room temps, explores energy-aware job scheduling
DOE calls for INCITE proposals, 1.3 billion hours at stake
The right way to exascale
Dan Reed has reposted on his blog an essay that recently appeared on the CACM blog, in which he discusses the shortcuts (my word, not his) we took to get to petascale and his hope that we take a longer view on the way to exascale.
He writes (referring to some of the original petascale activities in the early 1990s):
At the time, most of us were convinced that achieving petascale performance within a decade would require some new architectural approaches and custom designs, along with radically new system software and programming tools. We were wrong, or at least so it superficially seems. We broke the petascale barrier in 2008 using commodity x86 microprocessors and GPUs, Infiniband interconnects, minimally modified Linux and the same message-based programming model we have been using for the past twenty years.
However, as peak system performance has risen, the number of users has declined. Programming massively parallel systems is not easy, and even terascale computing is not routine. Horst Simon explained this with an interesting analogy, which I have taken the liberty of elaborating slightly. The ascent of Mt. Everest by Edmund Hillary and Tenzing Norgay in 1953 was heroic. Today, amateurs still die each year attempting to replicate the feat. We may have scaled Mt. Petascale, but we are far from making it pleasant or even routine weekend hike.
This raises the real question, were we wrong in believing different hardware and software approaches were needed to make petascale computing a reality? I think we were absolutely right that new approaches were needed. However, our recommendations for a new research and development agenda were not realized. At least in part, I believe this is because we have been loathe to mount the integrated research and development needed to change our current hardware/software ecosystem and procurement models.
Reed’s suggested solution?
I believe it is time for us to move from our deus ex machina model of explicitly managed resources to a fully distributed, asynchronous model that embraces component failure as a standard occurrence. To draw a biological analogy, we must reason about systemic, organism health and behavior rather than cellular signaling and death, and not allow cell death (component failure) to trigger organism death (system failure). Such a shift in world view has profound implications for how we structure the future of international high-performance computing research, academic-government-industrial collaborations and system procurements.
I agree with this point of view, and it echoes some of the comments Thomas Sterling made at the HPCC conference a couple of weeks ago in Newport, in the sense that both advocate a revolutionary, rather than an evolutionary, approach to exascale. My own reason for agreeing is that while, yes, we can build petascale machines, we are getting between one and five percent of peak on general applications. This is what an evolutionary model gets you. We are well past the point when a flop is worth more than an hour of an application developer’s time. We need to encourage the development of integrated hardware/software systems that help programmers write correct, large-scale applications that get 15, 20, or even 30 percent of peak performance. To mangle Hamming, the purpose of supercomputing is discovery, not FLOPS.
Not that I think it will happen. The government has been stubbornly unwilling to coordinate its high-end computing activities around any of the several research agendas whose creation, but not implementation, it has funded (you could pick an arbitrary starting point with the PITAC reports, or move either way in time to find sad examples of neglect). My own observation from inside part of this system is that the government has largely begun to think of HPC as “plumbing” that should “just work” in support of R&D, not as an object of R&D itself. There are a few exceptions (mostly in parts of DOE), but without leadership that starts in the President’s office (probably with the science advisor pushing an effort to get POTUS to make his department secretaries fall in line), this is not likely to change on its own.
Our curse is that we have something that kind of works. One of my grad school professors used to say that the most dangerous computational answers are those that “look about right.” If we had a model that was totally broken, we’d be forced to invest in new models of computation and because of the scale of that investment we’d be encouraged to make a coordinated effort of it. But our model isn’t totally broken, and as long as it kind of works, I don’t see anyone willing to dump out the existing rice bowls and start over.
Leak in supercomputer building forces replacement of $4M in gear
Late last week Indystar.com reported that a steam leak in a building under construction to house Indiana University’s supercomputers forced the replacement of $4.2M in support gear.
The report cites IU architect Bob Meadows:
He says repairs could set back construction of the $32.7 million project by three months, with completion now scheduled in July.
No computers have been installed in the Data Center, but generators and battery backup systems in the building must be replaced.
New VP, new business for Cray
Cray has been formalizing a new line of business for at least a couple of months. Its “custom engineering” business will give customers with specialized requirements access to Cray’s engineering bench to build custom solutions:
“Our custom engineering efforts are focused on leveraging our supercomputing technology, experience and innovation and tailoring these into computing, storage and consulting solutions designed to meet very specific customer needs,” said Peter Ungaro, president and CEO of Cray. “We are very excited to have Skip help us continue to grow and expand in this important aspect of our business and add to the overall strength of our leadership team.”
From what I understand, the team will focus not just on tweaking Cray gear for special installations, but on addressing customer needs and building solutions out of whatever technologies make sense. I would expect, however, that the Cray stockroom would be the first stop.
This week the company announced the leadership for the new business:
Cray Inc. today announced the appointment of John “Skip” Richardson to the position of vice president of business development for the company’s custom engineering team. With more than 20 years of business development experience in the technology and aerospace industries, Richardson will be responsible for promoting Cray’s custom engineering solutions to government agencies, commercial customers and systems integrators.
…
Prior to joining Cray, Richardson served as vice president of corporate business development at Sarnoff, Inc., a subsidiary of SRI International, where he was responsible for managing and growing business-to-business and government contracts in research and development for U.S. Department of Defense and commercial clients. Prior to Sarnoff, Richardson held various business development roles at Digimarc, IBM, Halliburton and Honeywell.
—–
John West is part of the team that summarizes the headlines in HPC news every day at insideHPC.com. You can contact him at [email protected].