A Toolkit for Materials Scientific Cloud Computing
There are two elements in improving high performance computing with regard to scientific computing. The first is well covered over on HPCwire, and examines how the latest advancements in the greatest supercomputers push the boundaries of modeling and computation. The second component deals with making the applications that run those models and simulations more accessible to scientists who may not have access to those top-end supercomputers.
Recognizing this, researchers from the physics department at the University of Washington at Seattle, through a grant from the National Science Foundation, created what they call a ‘virtual platform’ for scientific cloud computing, or SC2VP, which they simply named “SC2IT” for scientific cloud computing interface tools.
“The main elements of our new platform include a virtual cloud computer blueprint or AMI, which contains preinstalled and optimized scientific codes and utilities.”
The platform, according to the researchers, is meant to simulate the parallelism and extensive data storage capabilities of a large supercomputer in a cloud environment, with an emphasis on serving materials science. “This blueprint,” the researchers noted, “contains libraries, compilers, a parallel computing environment, and preconfigured applications typically useful for materials scientists. E.g. these applications can calculate structural and electronic properties of materials.”
What is important to note here is that the physicists built their toolkit with creating an HPC cluster as their top priority. Building a virtual machine, according to the physicists, is not as difficult as making that cluster run high performance scientific applications. As they noted, “launching a set of virtual machines from a cloud provider is easy but does not produce a fully functional HPC cluster… To truly bring advanced science to a broad class of end users, another step is necessary beyond launching a parallel MS program on a cloud cluster.”
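To illustrate the kind of extra configuration that separates a set of freshly launched virtual machines from a working cluster, here is a minimal sketch of one such step: writing a host list that a parallel launcher can read. The article does not say which MPI implementation or file format the researchers used, so the Open MPI-style `slots=` syntax, the function name `build_hostfile`, and the example addresses are all assumptions made for illustration.

```python
from typing import List


def build_hostfile(private_ips: List[str], slots_per_node: int) -> str:
    """Return hostfile text with one line per node.

    Assumes the Open MPI hostfile convention "<host> slots=<n>";
    the toolkit described in the article may do this differently.
    """
    if slots_per_node < 1:
        raise ValueError("slots_per_node must be at least 1")
    lines = [f"{ip} slots={slots_per_node}" for ip in private_ips]
    return "\n".join(lines) + "\n"


# Hypothetical private addresses of two launched instances.
hostfile_text = build_hostfile(["10.0.0.11", "10.0.0.12"], 4)
print(hostfile_text, end="")
```

A real setup would also need shared storage, passwordless access between nodes, and the scientific codes themselves, which is the gap the preconfigured blueprint is meant to close.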
That next level, according to the researchers, involves congregating various scientific computing codes that optimize certain types of problems. These codes already exist through previous computational research, but the trick was incorporating them en masse in a toolset that would allow them to be deployed on a virtual cluster.
“The development of novel scientific software is often modular: computational scientists link existing codes together and combine them with new developments to produce state-of-the-art results.”
The particular existing codes they used included a Density Functional Theory code, whose purpose is to assess and order the dynamic motion relationships among the coordinates in a material, essentially building a model of how a substance moves. With that, they added two codes to calculate a material’s vibrational tendencies, including “a new module to next derive vibrational properties; and thirdly, an existing spectroscopy code to finally calculate an X-ray spectrum incorporating the vibrational information.”
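The modular chaining described above, in which each code consumes the output of the previous one, can be sketched as a simple pipeline. This is only an illustration of the pattern, not the researchers' implementation: the stage functions are hypothetical placeholders that return strings instead of running real scientific codes, and names such as `run_pipeline` and the dictionary keys are invented for the example.

```python
from typing import Callable, Dict, List

# A stage reads the accumulated results and returns new entries to add.
Stage = Callable[[Dict[str, str]], Dict[str, str]]


def run_pipeline(stages: List[Stage], initial: Dict[str, str]) -> Dict[str, str]:
    """Run each stage in order, merging its output into a shared results dict."""
    results = dict(initial)
    for stage in stages:
        results.update(stage(results))
    return results


def dft_stage(data: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for the existing Density Functional Theory code.
    return {"dft_output": f"dft({data['structure']})"}


def vibrational_stage(data: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for the new module that derives vibrational properties.
    return {"vibrations": f"vib({data['dft_output']})"}


def spectroscopy_stage(data: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for the existing spectroscopy code that uses the vibrational data.
    return {"xray_spectrum": f"spectrum({data['structure']}, {data['vibrations']})"}


result = run_pipeline(
    [dft_stage, vibrational_stage, spectroscopy_stage],
    {"structure": "Cu"},
)
print(result["xray_spectrum"])
```

Keeping each stage behind the same small interface is what lets an existing code be swapped in or out without touching the rest of the chain.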
Below is a screenshot of how the researchers implemented the spectroscopy code through the Graphical User Interface (GUI) they set up. The red arrow denotes where the user can select the resources on which the calculation would ideally run.
Materials science has grown rapidly over the last decade, largely because the resolution and precision with which one can observe and test substances have increased markedly. However, as in genomics, another popular scientific cloud computing use case, those materials and their associated tests generate a great deal of information to be stored and processed. Being able to run those computations and store the necessary data in the cloud would be cost-effective, giving access to researchers without extensive in-house HPC resources.
It also incidentally fosters collaboration, as retrieving data from a virtual cluster in the cloud is simpler than transmitting entire datasets over a user’s limited bandwidth or (still fairly common today) sending physical hard drives with copies of the large datasets in the mail.
While the focus is on materials science, it is the hope of the University of Washington physicists that this approach of aggregating optimized codes and tools can be applied to other fields of study. “We embedded the interface and blueprint in a GUI environment that enables MS end users to perform specific SCC calculations with a few mouse clicks. The same approach can be followed for SCC enhancement of GUIs for other fields of research.”
Again, this approach could prove useful to fields like genomics, where datasets are exploding and the incentive to run experiments quickly is high. The researchers designed their toolset to run on the Amazon Elastic Compute Cloud, with an eye toward running high performance applications there. “We tested the performance of this setup to prove that HPC calculations for materials science can be done efficiently in a cloud environment.”