European HPC Industry in Need of Revitalization
The UK's Atomic Weapons Establishment (AWE) hosted an excellent HPC Europe Workshop at St Catherine's College, Oxford. This interactive workshop, which took place on September 25-26, discussed issues of concern to Europeans working in the HPC area. It brought together top experts in HPC (restricted to fewer than 10 from each European country) to share intelligence and develop common strategies, in the hope of collectively shaping the future direction of European HPC. Invitations to speak or attend were made by a country representative who also advised on the agenda. This year, about 80 delegates attended from both the customer and supplier segments. Non-European vendors were present only for the last sessions.
The previous workshop, held at Maffliers, Paris in 2004, focused on the strengths and weaknesses of high performance technical computing in Europe. This year the focus was on strengthening HPC in Europe.
The first day consisted of 17 European presentations to set the scene on the state of HPC in Europe. The second day consisted of 12 vendor presentations and a vendor panel discussion. To give the reader a flavour of the proceedings, below are a few examples of how the European user community is satisfying its computing needs.
Professor Richard Kenway of the University of Edinburgh said: “Some areas of science are limited by computer performance. They demand sustained speeds of petaflops or more, now. This presents a greater challenge than the recent step to teraflops, because massive parallelism must be delivered at an affordable price, and many codes running today on the UK HPCx teraflops system will not scale up to run efficiently on ten to hundred times more processors. Hardware that is tailored to the application, and/or serendipitous use of cheap commodity parts developed for some other purpose, will be needed to keep machine costs down, and software will need to be re-engineered to use it efficiently. These factors are driving us towards a diversity of architectures, international facilities and community code bases. There is scope for innovative solutions both from small players and from traditional vendors, as we have seen in the QCDOC, FPGA and Blue Gene/L projects at Edinburgh's Advanced Computing Facility. This growing diversity gives Europe the opportunity to re-enter the high-performance computing arena”.
Thomas Lippert, from the John von Neumann Institute for Computing (NIC), Germany, said: “Presently, we witness a rapid transition of cutting edge supercomputing towards highest scalability, utilizing up to hundreds of thousands of processors for cost-effective capability computing…. There is hot debate within the community as to whether the advent of highly scalable systems like Blue Gene/L and the emergence of a more heterogeneous hardware landscape, signal the onset of a paradigm shift in HPC. Still, there are HPC problems that are less scalable by nature and might require intricate communication capabilities or a large non-distributed memory space with extremely fast memory access. NIC has recently complemented its 9 teraflops general purpose SMP-cluster, with a 46 teraflops Blue Gene/L supercomputer, which is currently one of the fastest computers in Europe. Both systems share a huge global parallel file system, which is part of the file system of the European DEISA alliance. With this configuration, the NIC is able both to meet the requirements of a very broad range of projects and to support a selected number of high-end simulation projects.” Lippert then presented simulation examples from materials science that demonstrated the added value of this heterogeneous hardware approach, and outlined NIC's plans for joining the European e-science ecosystem.
Several specific applications were also presented by other speakers. Artem Oganov of ETH Zurich presented: “USPEX — an evolutionary algorithm for crystal structure prediction”. He described how their simulations found new phases of planetary materials with lowest energy levels at extreme pressures, identifying structures where experimental data are insufficient. This algorithm has the potential for designing new materials entirely on the computer. He discussed some of the applications of this method for a number of substances (C, N, O, S, H2O, MgSiO3, CaCO3, MgCO3) and possible industrial uses.
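For readers unfamiliar with the idea, the general shape of an evolutionary search for a lowest-energy configuration can be sketched in a few lines of Python. This is a toy illustration only and not the USPEX method: the one-dimensional `toy_energy` function, the `evolve` helper and all parameter values are hypothetical stand-ins for a real crystal-structure representation and a real energy calculation.

```python
import random


def toy_energy(x):
    # Hypothetical one-dimensional "energy landscape" with its minimum at x = 2.
    # A real search would evaluate the energy of a candidate crystal structure.
    return (x - 2.0) ** 2 + 1.0


def evolve(energy, population, generations=50, keep=4, sigma=0.3, rng=None):
    """Minimise `energy` by repeated selection and mutation.

    The `keep` lowest-energy candidates survive each generation unchanged
    (elitism); the rest of the population is refilled with mutated copies.
    """
    rng = rng or random.Random(0)
    size = len(population)
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        population = sorted(population, key=energy)
        history.append(energy(population[0]))
        survivors = population[:keep]
        children = [
            rng.choice(survivors) + rng.gauss(0.0, sigma)  # mutate a survivor
            for _ in range(size - keep)
        ]
        population = survivors + children
    best = min(population, key=energy)
    return best, history


if __name__ == "__main__":
    seed_rng = random.Random(1)
    start = [seed_rng.uniform(-10.0, 10.0) for _ in range(20)]
    best, history = evolve(toy_energy, start)
    print("best candidate:", best, "energy:", toy_energy(best))
```

Because the best candidates are carried over unchanged, the best energy can never get worse from one generation to the next, which is the property that makes this kind of search attractive when each energy evaluation is expensive.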
Mark Savill of Cranfield University, UK focused on the recent usage of the National Supercomputer by the UK Applied Aerodynamics HPC Consortium. Flagship computations of aircraft engine components and whole aircraft configurations were discussed — especially a project to simulate vertical descent of a Harrier model for hot-gas ingestion studies.
Reinhard Budich from the Max Planck Institute, Germany talked about the European Network for Earth System (ENES) Modelling and the current situation at the German climate computing centre. After describing their current activities, he went on to say that hardware is the cheapest component; software and data management are the real barriers to achieving goals. Climate is very high on the European political agenda, especially understanding the dynamics of the human impact of climate change. ENES is involved in discussions with 30 institutes worldwide to define the work, expected to start in 2009, for the IPCC AR5 planned for the 2012/13 timeframe.
In the hardware systems field, Piero Vicini of INFN, Rome described “The APE project”. Over the last 20 years, the INFN APE group (APE is an acronym for “Array Processor Experiment”) has been involved in the development of massively parallel supercomputers dedicated to LQCD (Lattice Quantum Chromodynamics), a typical “killer” application for a general purpose supercomputer. ApeNEXT is a fourth generation system, capable of a peak performance of 5 teraflops with a sustained efficiency of more than 50 percent for key applications. It shows impressive ratios of flops/watt and flops/volume at a cost-value ratio of half a Euro per megaflops. Vicini claimed APE is a highly proficient European HPC computer in the same application space as the Blue Gene/L, used in collaborative work by teams in the USA, UK and Japan for LQCD. The next APE system aims at petaflops performance for a larger class of scientific and engineering applications.
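To put those figures in perspective, a quick back-of-the-envelope calculation is sketched below. It assumes that the quoted half a Euro per megaflops refers to peak performance and that the 50 percent efficiency is a lower bound; neither assumption is confirmed by the presentation, and the function and constant names are purely illustrative.

```python
MEGAFLOPS_PER_TERAFLOPS = 1_000_000

PEAK_TERAFLOPS = 5.0          # quoted peak performance
EUR_PER_MEGAFLOPS = 0.5       # quoted cost ratio (assumed to be per peak megaflops)
SUSTAINED_EFFICIENCY = 0.5    # quoted as "more than 50 percent", so a lower bound


def implied_cost_eur(teraflops, eur_per_megaflops):
    """Cost implied by a performance figure and a cost-per-megaflops ratio."""
    return teraflops * MEGAFLOPS_PER_TERAFLOPS * eur_per_megaflops


peak_cost = implied_cost_eur(PEAK_TERAFLOPS, EUR_PER_MEGAFLOPS)
sustained_teraflops = PEAK_TERAFLOPS * SUSTAINED_EFFICIENCY

print(f"Implied cost at peak: {peak_cost:,.0f} EUR")
print(f"Sustained performance (lower bound): {sustained_teraflops} teraflops")
```

Under these assumptions a 5 teraflops system works out at roughly 2.5 million Euros, delivering at least 2.5 teraflops sustained on key applications.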
Claude Camozzi of Bull, France talked about: “FAME2: a Pôle de Compétitivité project towards petaflops computing”. This is a collaborative project from the French pôle de compétitivité named “System@tic”. Its ambition is to provide an emulation infrastructure that allows industrial and academic research laboratories to anticipate the availability of nodes based on COTS for petaflops systems. It should enable software developers to create and adapt tools and innovative applications for this new scale of computing power. The efficiency of some hardware accelerators (from European vendors, e.g., ClearSpeed and APE) will also be evaluated in order to provide guidelines on how to provide “capability system features” on “capacity systems”. The project is also looking at new database concepts using native XML and new multilingual research tools. The goal is to be able to retrieve any reference from a 50 terabyte database within a couple of seconds. Camozzi also described the federative aspect of this project at the French level and made suggestions on how to open and leverage this collaborative effort across Europe.
As stated above, the workshop's main focus was to exchange experiences and views on how to strengthen HPC in Europe. The questions raised included the following. What do Europeans do best in HPC: software development for applications specific to Europe, hardware components and integration, or total solution integration? What issues arise from using non-European HPC? For example, what would happen if there were trade restrictions on high technology exports from the USA to certain European countries? How do Europeans optimise their relationship with non-European vendors? And lastly, what HPC projects can best be done at a European level but cannot be done well at a national level?
To put this in context, a substantial number of people attending this workshop were either representatives or directors of national large-scale computing facilities currently delivering teraflops of sustained performance on Bull, Cray, IBM, and NEC systems. A number of these participants expressed strong concern that Europe is falling behind the USA, Japan and Asia in using HPC as a strategic resource to achieve economic competitiveness.
A glance at the Top500 list provides evidence that Europe is lagging far behind the United States and Japan in supercomputers. Other indicators cited include patents and published research papers. This is very alarming and is a direct consequence of setbacks in large 'computational projects' at the beginning of the 1990s when the European intensive computing industry collapsed. Today only a few small European computer businesses survive. For example, Meiko collapsed but was bought by the Italian firm Finmeccanica and renamed Quadrics. This company is today producing the 'Rolls Royce' of networks. ClearSpeed is also a spin-off of the failed UK INMOS Transputer effort of the late 1980s. In France, a revitalised Bull is coming back to the forefront with the TERA-10 machine. This system delivered 12.5 teraflops sustained performance on the CEA/DAM benchmark.
As Jean Gonnord, head of the Numerical Simulation Project and Computing at CEA/DAM, said: “With an almost non-existent industrial framework and lack of any real strategy, Europeans are using a 'cost base' policy in intensive computing. Laboratories are investing in HPC using their own research funding, so naturally the aim is to get the cheapest machines. This has some odd effects: users practise self-censorship and depend on the American and Japanese makers to define what tomorrow's computing will be like, and this makes Europe fall even farther behind”.
In other words, HPC is of the highest and most pervasive strategic importance. As I wrote in my book 20 years ago: “It enables scientists to solve today's problems and to develop the new technology for tomorrow's industry, affecting national employment patterns and national wealth”. It is also the main tool for the simulation and stewardship of nuclear weapons and delivery systems. Thus HPC is intertwined with national policies spanning the whole spectrum of national security, the armament industry and the whole industrial civilian economy. It would be perilous for Europe to ignore it.
Europe was very late compared to the USA and Asia in embracing HPC. To make up for lost ground, Europe should implement a more proactive policy in supercomputing, centred on a synergy between defence, industry and research.
There are, however, some positive signs on the horizon. For example, the success story of the TERA-10 project, at CEA, was based on having a real policy in high performance computing — grouping resources and using the defence industry research synergy — and according to Gonnord, shows the way for France to get back in the HPC race. Gonnord went on: “Times change — and mentalities too! Since the beginning of 2005 we have seen several changes. For example, the French National Research Agency (ANR) has included 'intensive computing' as an aspect in its program and launched a call for projects last July. Nearly fifty projects were submitted last September and have been evaluated. Another sign is that the System@tic competitiveness initiative, of which Ter@tec is one of the key elements, has just launched a project to develop the new generation of computers leading to petaflops. Of course, these efforts do not compare with those undertaken in the United States, but it's a good start”.
The other good news is that a similar initiative is to be launched at the European level. After a year of effort and persuasion, supercomputing is going to reappear in the budget of the 7th European RTD Framework Programme (2007-2013), which should include an industrial aspect. The beacon project in this initiative will be, if accepted, to set up three or four large computing centres in Europe whose mission is not just to provide computing for a given scientific theme, but to stay permanently in the top five of the Top500 list. This should allow major numerical challenges to be tackled in the majority of scientific disciplines, leading to major technological breakthroughs.
Existing large scale computing centres in Europe are already preparing the case for hosting a petaflops system. In this respect Jean Gonnord said: “The CEA/DAM-Île-de-France scientific computing complex is a natural candidate to host and organise such a structure. But one thing is sure — all of these projects will only make sense if they are based, like in the United States, Japan and now in China, on a solid local industrial network and a proactive policy of national states and the European Union”.
The invited vendors gave excellent presentations discussing their roadmaps. Cray talked about adaptive supercomputing. NEC talked about HPC solutions using hybrid systems for delivering sustained application performance. IBM talked about its commitment to HPC in Europe. Bull spoke of its plans to deliver petaflops systems. Chip manufacturers Intel, AMD and ClearSpeed presented their future product visions. Other vendors, including Quadrics, also gave talks.
Personally, I find the Cray concept of 'Adaptive Supercomputing' very attractive. It recognises that, although multi-core commodity processors will deliver some improvement, the greatest opportunity for application acceleration comes from exploiting parallelism through a variety of processor technologies, e.g., scalar, vector and multi-threaded processors and hardware accelerators (FPGAs or ClearSpeed).
Adaptive supercomputing combines multiple processing architectures into a single scalable system. From the user's perspective, there is the application program, followed by a transparent interface made up of libraries, tools, compilers, scheduling, system management and a runtime system. The adaptive software consists of a compiler that knows what types of processors are available on the heterogeneous system and targets code to the most appropriate processor. The result is to adapt the system to the application, not the application to the system. The “Grand Challenge” is to do this efficiently.
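The targeting step described above can be pictured with a minimal sketch. It is not Cray's software: the trait names, the `PREFERENCES` table and the `select_target` function are hypothetical, and the mapping of code traits to processor types is an illustrative assumption rather than anything stated in the vendor presentation.

```python
# Hypothetical preference table: for each code trait, processor types in
# order of preference. The mapping itself is an illustrative assumption.
PREFERENCES = {
    "long_vectors": ["vector", "scalar"],
    "bit_level": ["fpga", "scalar"],
    "irregular_memory": ["multithreaded", "scalar"],
}

DEFAULT_TARGET = "scalar"  # assumed to be present on every system


def select_target(trait, available):
    """Return the most preferred processor type that the system actually has."""
    for candidate in PREFERENCES.get(trait, []):
        if candidate in available:
            return candidate
    return DEFAULT_TARGET


if __name__ == "__main__":
    system = {"scalar", "vector"}  # hypothetical machine without FPGAs
    for trait in ("long_vectors", "bit_level", "unknown_trait"):
        print(trait, "->", select_target(trait, system))
```

The point of the sketch is that the application only declares what kind of work it does; the choice of hardware is made by the system software from whatever mix of processors is installed.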
The beauty of this concept is that once the workload profile is known, a user buys a system with the right mix of hardware to match that workload profile and the onus is then on the vendor's system software to deliver high productivity. One assumes that the above vision played a part in Cray's recent sales successes at AWE, CSCS, EPSRC, NERSC and ORNL. Interestingly, these sites are mainly replacing IBM hardware with the Cray XT3.
The HPC community has accepted that some kind of adaptive supercomputing is necessary to support the future needs of HPC users as their need for higher performance on more complex applications outpaces Moore's Law. In fact, the key players in the petaflops initiatives are broadly adopting the concept, despite using distinct heterogeneous hardware paths. The IBM hybrid Opteron-Cell Roadrunner system, to be installed at LANL, is the latest example.
In addition to the above vendor presentations, there was a panel discussion. The panel represented a broad spectrum of the industry: chip vendors AMD, Intel and ClearSpeed; computer vendors Bull, NEC, IBM, Cray and Linux Networx; and service provider T-Systems. All explained their position in the European landscape.
To conclude, there was a strong feeling at this workshop that Europeans should get their act together and find the political will to put in place funding structures that encourage a home-grown HPC industry if they wish to remain competitive players in this pervasive strategic field. For Europeans to carry on as in the recent past would be unwise and perilous in the long term.