UK Creates Massive 200,000-Core ‘HPC Service’
The United Kingdom is rapidly ramping up its HPC capabilities. The nation just launched its third HPC service in the last 12 months, a 200,000-core powerhouse designed to accommodate a wide range of academic and industry workloads.
“Accelerator,” as it’s known, was formed by taking the UK national high performance computing service HECToR, a Cray XE6 platform, and augmenting it with two new machines: an IBM BlueGene/Q and an AMD-based cluster that can run either Linux or Windows, nicknamed “Indy” for its industry affiliation.
With no fancy middleware or interconnects to bridge them, Accelerator is essentially just three separate machines housed under one roof at the Edinburgh Parallel Computing Centre (EPCC). By combining the resources, they can claim, as the press material states, “the largest on-demand supercomputing resource in Europe.”
Accelerator does indeed surpass the capabilities of other UK services like the OCF’s enCORE service (8,000 cores) and the CORE HPC Service, managed by the University of Cambridge and Imperial College London (22,000 cores).
George Graham, Business Development Manager for EPCC, highlights the specifications of the Accelerator machines. HECToR, the foundation of the service, is a Cray machine with 90,112 cores and a peak performance of 827 teraflops. The slightly better-equipped BlueGene/Q sports 98,304 cores and boasts a peak performance of 1.26 petaflops. Indy, by comparison, is a small cluster by today’s standards, with 1,576 cores, but, as Graham emphasizes, its main strength is not peak performance but openness and ease of use.
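As a quick sanity check on the headline figure, the quoted per-machine numbers can simply be added up. The short Python sketch below uses only the core counts and peak figures given above (the dictionary layout and function names are illustrative, not anything EPCC publishes). Indy’s peak performance is not stated, so it is excluded from the peak total. The cores sum to just under 190,000, so “200,000 cores” is best read as a rounded-up headline number.

```python
# Core counts and peak performance as quoted for the three Accelerator machines.
# Indy's peak performance is not given in the article, so it is recorded as None.
machines = {
    "HECToR (Cray XE6)": {"cores": 90_112, "peak_tflops": 827.0},
    "BlueGene/Q": {"cores": 98_304, "peak_tflops": 1_260.0},
    "Indy": {"cores": 1_576, "peak_tflops": None},
}


def total_cores(systems):
    """Sum the core counts of all machines."""
    return sum(s["cores"] for s in systems.values())


def known_peak_tflops(systems):
    """Sum peak teraflops, skipping machines whose peak is not known."""
    return sum(
        s["peak_tflops"] for s in systems.values() if s["peak_tflops"] is not None
    )


if __name__ == "__main__":
    print(f"Total cores: {total_cores(machines):,}")  # Total cores: 189,992
    print(f"Peak (HECToR + BlueGene/Q): {known_peak_tflops(machines):,.0f} TF")  # 2,087 TF
```

The combined peak of the two leadership machines alone is therefore a little over 2 petaflops.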
Indy supports standard installations of Linux and Windows, which allows it to accommodate a wide variety of industry workloads. In contrast, the Cray and IBM supers are more specialized; they require non-standard Linux distros in order to leverage the proprietary system interconnects.
The majority of Accelerator’s funding comes from the Engineering and Physical Sciences Research Council (EPSRC), a UK government funding body, but the Indy cluster was financed out of the University of Edinburgh’s budget and supplied by a local HPC provider, Viglen.
All three machines, along with petabyte-scale data storage, are housed inside EPCC’s purpose-built Advanced Computing Facility. The large site has been active for many years, notes Graham, but has undergone continuous renovations to accommodate HECToR and meet the university’s growing computational demands.
The HPC service can cater to nearly every kind of research: life and earth sciences, pharmaceuticals, energy, and all manner of engineering and product development workloads.
“There is no limit to the application domain capability of our service machines,” says Graham. But he adds that ideal use cases of each system do vary.
As best-in-class leadership supercomputing systems, HECToR and the BlueGene/Q are targeted at very large-scale, high-resolution simulation and modeling challenges, for example simulating an entire nuclear reactor rather than a single fuel rod.
The ability to perform complete system simulations is a truly defining breakthrough, speaking both to how far we’ve come and the awesome potential that lies ahead. “This is the path to exascale,” observes Graham.
Indy is targeted at somewhat more constrained challenges, with computational fluid dynamics (CFD) and finite element analysis (FEA) among its common workloads, but it is nevertheless the type of solution that can be a transformative digital tool for small-to-medium-sized enterprises.
The machines can be accessed from anywhere in the world over an Internet connection, but it’s not a cloud in the usual sense, says Graham. He describes the setup as very simple remote access over secure SSH connections.
“Users have batch-based queueing access for jobs, however we wrap that with user authentication, security and privacy,” he says. “So it provides for a very healthy service.”
Asked whether the machines could be configured in a way that would make it possible to harness all 200,000 cores, Graham considers the idea before responding. “That’s not there now,” he says, “but it’s not unreasonable to think that our research could enable that kind of setup.”
“At EPCC we undertake a lot of research work,” he continues, “and some of it is in the domain of grid and cloud computing, so it’s not unfeasible to think that we would apply some of the lessons learned in order to provide mechanisms through which our independent architecture can be accessed via a holistic service.”
When it comes to data transfer constraints – a common roadblock to remote computing – Graham notes this is not generally an issue on the input side; the challenge is dealing with the large data files that are generated by the compute. However, he is quick to point out several solutions:
The first is that our systems are at an advanced computing facility that is connected to the UK-wide SuperJANET network, which provides high data transfer performance across all UK higher education establishments. Any user who can get to a local campus can benefit from the very high data transfer rates.
Second, we remove the need for users to pull back the large amount of data that has been generated. Through on-demand serial queues, or by working from the login nodes, users can post-process the data while it is still in situ on our systems. So they can do post-processing, visualization and so on while the data is on our system, which drastically reduces the need for data transfer.
And third, because we are talking about batch-based, queue-based technology and large data storage, we can always load the data onto a secure portable disk and courier it between us and the user’s establishment.
Although the expanded service is new, HECToR has been up and running for five years now. It has quite a wide community of users, as it meets the needs of UK and European researchers and also serves a body of industry users. From a commercial business-development point of view, Graham notes, their resources are predominantly focused on the UK, but they have had users from the United States and across continental Europe. There really are no geographical barriers, since the service can be accessed over a standard Internet connection.
It’s apparent from the heightened level of activity over the last few years that the UK government is serious about using HPC to improve UK competitiveness, and these types of public-private collaborations are part of its strategy. It is investing millions of pounds and expects to see a return on that investment in terms of innovation as well as real economic stimulus. As industry users pay to rent time on these big systems, they are in effect underwriting the cost of the systems. That’s true, says Graham, but he returns to the collaborative nature of the arrangement: “Think of it as a three-way partnership between government and industry and the higher-education establishments,” he says.