Dutch HPC Cloud Running at Full Throttle

By Jose Luis Vazquez-Poletti

October 19, 2011

Earlier this month, a special event took place at Amsterdam’s Science Park. After two years of hard work, the first user-friendly HPC cloud infrastructure in Europe is now running at full throttle. The collaboration between the supercomputing center SARA and the grid computing project BiG Grid lies behind this epic milestone.

The result of this collaboration is an HPC Cloud infrastructure designed with usability for the scientific community and performance in mind. On the one hand, scientists can use a computing environment that is virtually identical to the one they are used to. On the other hand, they get access to self-service, dynamically scalable high performance computing resources that can be configured in great detail.

What’s under the hood of the HPC Cloud system? In total, 608 cores and 4.75 TB of RAM, distributed across 19 physical nodes, each with 32 Intel 2.13 GHz CPU cores and 256 GB of RAM. Each node also has a 10 TB “local disk”.
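As a quick sanity check, the headline totals follow directly from the per-node figures. The short Python sketch below reproduces them. Note that the 4.75 TB figure only works out if 1 TB is counted as 1024 GB, and that the aggregate local disk (190 TB) and the RAM per core (8 GB) are derived here by simple multiplication and division; they are not quoted in the original specification.

```python
# Worked check of the HPC Cloud capacity figures quoted above.
NODES = 19
CORES_PER_NODE = 32
RAM_PER_NODE_GB = 256
DISK_PER_NODE_TB = 10

total_cores = NODES * CORES_PER_NODE                 # 19 * 32 cores
total_ram_gb = NODES * RAM_PER_NODE_GB               # 19 * 256 GB
total_ram_tb = total_ram_gb / 1024                   # binary units: 1 TB = 1024 GB
total_disk_tb = NODES * DISK_PER_NODE_TB             # derived: aggregate node-local disk
ram_per_core_gb = RAM_PER_NODE_GB / CORES_PER_NODE   # derived: memory available per core

print(total_cores, total_ram_gb, total_ram_tb, total_disk_tb, ram_per_core_gb)
# prints: 608 4864 4.75 190 8.0
```

With decimal units (1 TB = 1000 GB) the same memory would be reported as 4.864 TB, which is why the binary convention is the one that matches the published figure.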

Virtualization is done with KVM. The latest version of OpenNebula, considered a de facto standard among virtual infrastructure managers, was chosen as the engine inside SARA’s HPC Cloud. In fact, the OpenNebula developers were contacted in the early stages of the setup to discuss how to get the most out of OpenNebula and make the final infrastructure meet the demanding needs of the HPC community. Moreover, from the beginning the project team involved users in testing the platform, which resulted in active contributions to the OpenNebula ecosystem.
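To give a flavour of what configuring a virtual machine “in great detail” looks like with OpenNebula, here is a minimal sketch that renders a VM description in OpenNebula’s template syntax: `KEY = value` pairs, with vector attributes such as `DISK` and `NIC` written in square brackets. The helper functions, the image name and the network name are hypothetical; this is not SARA’s actual configuration, and the attributes shown (`NAME`, `CPU`, `VCPU`, `MEMORY` in MB, `DISK`, `NIC`) are only a small subset of what OpenNebula templates accept.

```python
# Hypothetical sketch: render a VM description as OpenNebula template text.
# Attribute names follow OpenNebula's template syntax; the image and network
# names are made-up placeholders, not SARA's actual configuration.

def render_value(value):
    """Quote a scalar value (template values may be written as quoted strings)."""
    return f'"{value}"'


def render_vector(key, attrs):
    """Render a vector attribute, e.g. DISK = [ IMAGE = "..." ]."""
    inner = ", ".join(f"{k} = {render_value(v)}" for k, v in attrs.items())
    return f"{key} = [ {inner} ]"


def render_template(spec):
    """Turn a dict into template lines.

    Scalars become KEY = "value"; dicts become vector attributes; a list of
    dicts repeats the vector attribute (e.g. several DISK entries).
    """
    lines = []
    for key, value in spec.items():
        if isinstance(value, dict):
            lines.append(render_vector(key, value))
        elif isinstance(value, list):
            lines.extend(render_vector(key, item) for item in value)
        else:
            lines.append(f"{key} = {render_value(value)}")
    return "\n".join(lines)


# A hypothetical 8-core, 64 GB analysis VM (MEMORY is given in MB).
vm = {
    "NAME": "analysis-node",
    "CPU": 8,
    "VCPU": 8,
    "MEMORY": 64 * 1024,
    "DISK": {"IMAGE": "my-analysis-image"},
    "NIC": {"NETWORK": "my-private-net"},
}

print(render_template(vm))
```

Saved to a file, text of this form can be passed to the OpenNebula command-line tool with `onevm create <file>` to instantiate a VM; growing a cluster then largely means creating more instances from the same kind of description.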

The platform has attracted scientists from a wide range of fields, such as bioinformatics, ecology, geography and computer science. Several of these users presented their work at the HPC Cloud Day, held in Amsterdam on 4 October. Among them were some applications I found particularly interesting for this article.

The first one comes from the University of Amsterdam’s Microarray Department/Integrative Bioinformatics Unit (MAD/IBU). Their research ranges from seed breeding to DNA damage, but always involves studying gene transcription across the entire genome. Comparing strings in such large databases is a challenging task that demands huge computational power.

The Biomedical Imaging Group Rotterdam (BIGR) at Erasmus MC pursues two main lines of research. The first is population imaging: developing robust, accurate and fully automated tools that help explain diseases through changes in the brain. The second aims to provide earlier and more accurate diagnoses through computer-assisted tools.

The Netherlands Institute of Ecology (NIOO-KNAW) works at the genome level and does not hesitate to use tools from the computing portal paradigm to perform basic research. In fact, the institute made some of these tools (Galaxy and CloudMan) available in the HPC Cloud environment. As a result, the platform is useful not only for its own analysis of high-throughput community sequencing data, but also for other research groups that will benefit from the ported tools.

The Koninklijke Bibliotheek (Royal Library) works in the field known as humanities computing, and I have to say that its use of the HPC Cloud surprised me. The goal is to give everyone Internet access to everything published in and about the Netherlands between 1618 and 1995. Optical Character Recognition on modern documents is relatively easy, but the task becomes complicated with very old books because of their old lettering styles and physical damage. Even so, the library expects to digitize 10% of these publications by 2013.

Given the current economic crisis, the Rotterdam School of Management at Erasmus University focuses on finance and, in particular, on liquidity: how to trade large volumes quickly and at low cost. The world market sets the ceiling on the amount of data that needs to be processed. Their framework covers more than 400 exchanges worldwide, 45 million different instruments and 350 fields of historical data going back to 1996.

These are only some examples but, again, remember that the HPC Cloud infrastructure was built with users in mind. A user who already knows what they want and has an existing environment can rebuild it in the HPC Cloud in an afternoon. To support this, 90-minute courses are offered to get users up and running, ready to deploy large clusters on the infrastructure. As a result, porting software to this infrastructure usually takes only days instead of months or years.

The only drawback (for me) is that the platform is funded by a national project, so it can only accommodate Dutch researchers and their affiliates. However, SARA does not rule out opening its doors to foreign researchers in the future, should the funding change. Moreover, several Dutch researchers participate in international projects, which gives international project members access to the infrastructure.

I would like to express my gratitude to the HPC Cloud project leader, Drs. Floris Sluiter from SARA, who very kindly provided me with the information needed to write this article.

Links of Note

The presentations of the HPC Cloud Day on 4th October in Amsterdam:

program and presentations:


Video recordings: http://ftp.sara.nl/pub/cave/outgoing/clouddag/

SARA is a national supercomputing center, originally founded by the University of Amsterdam, the Vrije Universiteit and the Stichting Mathematisch Centrum (now Centrum Wiskunde & Informatica). Forty years after it was first set up to process data for its three founders, it now provides HPC services at the national level.

BiG Grid is a project led by NCF, Nikhef and NBIC that aims to set up the national grid infrastructure for scientific research.

About the Author

Dr. Jose Luis Vazquez-Poletti is Assistant Professor in Computer Architecture at Complutense University of Madrid (Spain), and a Cloud Computing Researcher at the Distributed Systems Architecture Research Group (http://dsa-research.org/).

He is, and has been, directly involved in EU-funded projects such as EGEE (Grid Computing) and 4CaaSt (PaaS Cloud), as well as in many Spanish national initiatives.

From 2005 to 2009 his research focused on porting applications to Grid Computing infrastructures, an activity that put him “where the real action was”. These applications came from a wide range of areas, from Fusion Physics to Bioinformatics. During this period he acquired the skills needed to profile applications and help them benefit from distributed computing infrastructures. He also shared these skills at many training events organized within the EGEE Project and similar initiatives.

Since 2010 his research interests have covered different aspects of Cloud Computing, always with real-life applications in mind, especially those from the High Performance Computing domain.

Website: http://dsa-research.org/jlvazquez/
Linkedin: http://es.linkedin.com/in/jlvazquezpoletti/

