Cryo-electron microscopy (Cryo-EM) allows biomedical researchers to image frozen biological molecules, such as proteins, viruses, and nucleic acids, and to obtain structures that could not be determined with previous methods. Cryo-EM requires large, expensive electron microscopes as well as substantial high performance computing (HPC) resources to process microscope imagery and extract three-dimensional structures from it. The compute and storage infrastructure needed to support these workloads is often prohibitively expensive for individual researchers and small research labs that process only a few Cryo-EM projects per year, with compute and storage hardware, GPUs, and staffing costs amounting to $500,000 or more. This is where cloud services from Amazon Web Services (AWS) can come in handy.
To overcome these challenges, the NY Structural Biology Center built Stion, a web application that gives biomedical researchers on-demand access to GPU instances on AWS for processing Cryo-EM data. The main objective of Stion is to reduce the infrastructure overhead for researchers so that they can focus on the science. Stion provides the building blocks for end-to-end data processing, including educational tutorials to improve skills and access to the scalable computational resources that Cryo-EM processing requires. Taken together, this open-source platform lowers the barrier to entry for new users by giving them a pathway to become familiar with Cryo-EM cloud computing, and it gives biomedical researchers a framework for learning how to process their own Cryo-EM data.
Why AWS?
We chose AWS for a few reasons. First, the AWS global infrastructure allows researchers all over the globe to launch AWS instances within minutes without having to maintain any physical infrastructure. They can run complex workloads through the browser from anywhere in the world, provided they have a stable internet connection.
Second, we analyzed both short-term and long-term pricing across different cloud providers for compute, storage, and networking, which are the backbone of our application. After a detailed comparative analysis and benchmarking of our ideal pricing models, we concluded that AWS offers the lowest all-in pricing for on-demand compute instances, object-based storage, and file-based storage.
Finally, one of the major reasons for selecting AWS as our go-to cloud vendor is the extensive support it provides to customers. AWS has plenty of online resources, such as technical guides and in-depth documentation for every product and service it offers, but AWS support really stands out because of its well-trained personnel in every domain.
Solution overview
Stion provides a dedicated GPU sandbox pre-loaded with software packages such as CryoSPARC, RELION, Appion-Protomo, and EMAN2. Each software package comes with preloaded datasets so that researchers can get started with data processing immediately.
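To give a feel for what launching an on-demand GPU instance involves, here is a minimal Python sketch using the boto3 EC2 client. It is not Stion's actual provisioning code, which this post does not show. The helper names, the `g4dn.xlarge` instance type, the `Project` tag, and the default region are all assumptions chosen for illustration, and the AMI ID must be replaced with a real image that has the Cryo-EM software installed.

```python
from typing import Any, Dict


def build_gpu_sandbox_request(
    ami_id: str,
    instance_type: str = "g4dn.xlarge",  # assumed GPU instance type, not from the post
    project_tag: str = "cryoem-sandbox",  # hypothetical tag value
) -> Dict[str, Any]:
    """Build the keyword arguments for an EC2 RunInstances call
    that requests exactly one on-demand GPU instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Project", "Value": project_tag}],
            }
        ],
    }


def launch_gpu_sandbox(ami_id: str, region: str = "us-east-1") -> str:
    """Launch the instance and return its ID.
    Requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_gpu_sandbox_request(ami_id))
    return response["Instances"][0]["InstanceId"]
```

Separating the request builder from the launch call keeps the parameters easy to inspect and review before any billable resource is created.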
Architecture design is a key element in building a data processing pipeline on the cloud. We built a hybrid architecture in which some resources are hosted on-premises and most are hosted on the AWS Cloud. Figure 1 shows the on-premises and AWS resources, and Table 1 provides an accompanying description of the data analysis workflow.
Figure 1. Stion architecture diagram, including the workflow for data transfer and processing, as described in Table 1.
Read the full blog here to learn how to use Stion to run your Cryo-EM workloads on AWS.
Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel, and following the AWS HPC Blog channel.