Migrating Scientific Experiments to the Cloud

By Daniel de Oliveira, Fernanda Araújo Baião and Marta Mattoso

March 4, 2011

The most important advantage of cloud computing for scientific experiments is that the average scientist can access many types of resources without having to buy or configure the whole infrastructure.

This addresses a fundamental need of scientists and scientific applications: scientists should be isolated from the complexity of configuring and instantiating the whole environment, so they can focus only on the development of the in silico experiment.

The number of published scientific and industrial papers provides evidence that cloud computing is being considered a lasting paradigm and is already being adopted by many scientific projects.

However, many issues have to be analyzed when scientists decide to migrate a scientific experiment to a cloud environment. The article “Azure Use Case Highlights Challenges for HPC Applications in the Cloud” presents several challenges in HPC support, specifically for the Windows Azure platform. In this article we discuss some important topics on cloud computing support from a scientific perspective. Some of these topics were organized as a taxonomy in our chapter “Towards a Taxonomy for Cloud Computing from an e-Science Perspective” of the book “Cloud Computing: Principles, Systems and Applications” [11].

Background on e-Science and Scientific Workflows

Over the last decades, the effective use of computational scientific experiments has evolved at a fast pace, leading to what is being called e-Science. e-Science experiments are also known as in silico experiments [12]. In silico experiments are commonly found in many domains, such as bioinformatics [13] and deep water oil exploitation [14]. An in silico experiment is conducted by a scientist, who is responsible for managing the entire experiment, which comprises composing, executing and analyzing it. Most in silico experiments are composed of a set of programs chained in a coherent flow. This flow of programs aiming at a final scientific goal is commonly named a scientific workflow [12,15].

A scientific workflow may be defined as an abstraction that allows the structured, controlled composition of programs and data as a sequence of operations aiming at a desired result. Scientific workflows represent an attractive alternative to model pipelines or script-based flows of programs or services that implement solid algorithms and computational methods. Scientific Workflow Management Systems (SWfMS) are responsible for workflow execution, coordinating the invocation of programs either locally or in remote environments. SWfMS need to offer support throughout the whole experiment life cycle, including: (i) designing the workflow through a guided interface (to follow a specific scientific method [16]); (ii) controlling several variations of workflow executions [15]; (iii) executing the workflow in an efficient way (often in parallel); (iv) handling failures; and (v) accessing, storing and managing data.
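To make the flow-of-programs idea concrete, the sketch below models a workflow as an ordered chain of command-line activities, each consuming its predecessor's output. It is a minimal illustration, not any particular SWfMS, and the program names are hypothetical stand-ins for real scientific codes:

```python
import subprocess

# A scientific workflow sketched as a chain of program invocations;
# "extract", "align" and "analyze" are hypothetical command-line tools.
workflow = [
    ["extract", "--in", "raw_data.dat", "--out", "sequences.fasta"],
    ["align", "--in", "sequences.fasta", "--out", "alignment.aln"],
    ["analyze", "--in", "alignment.aln", "--out", "results.csv"],
]

for step in workflow:
    # A real SWfMS would additionally record provenance, retry on
    # failure, and dispatch steps to parallel or remote resources.
    print("running:", " ".join(step))
    subprocess.run(step, check=True)
```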

Combining this life cycle support with an HPC environment poses many challenges to SWfMS, due to the heterogeneous execution environments of the workflow. When the HPC platform is a cloud, more issues arise, as discussed next.

Cloud checklist before migrating a scientific experiment

We discuss scientific workflow issues related to cloud computing in terms of architectural characteristics, business model, technology infrastructure, privacy, pricing, orientation and access, as shown in Figure 1.

Figure 1: Main issues in clouds for scientific applications

Pricing

Cost is one of the most important characteristics in both scientific and business domains. Since most public clouds adopt the pay-per-use model, it is important to estimate the final price to be paid and to determine how the financial resources available for a scientific experiment are used. In general, the price to be paid for using clouds follows three main models (that have to be analyzed by scientists): free (normally when scientists have their own cloud), pay-per-use (the scientist pays a specific value tied to resource utilization, normally per hour) and bill broken (where scientists pay for each component used, independent of usage time). However, this evaluation is far from simple, since the costs saved by the cloud, such as acquiring equipment and hiring support staff, are difficult to calculate.
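In the pay-per-use case, a back-of-the-envelope cost preview can be scripted before the experiment runs. The rates below are hypothetical placeholders, not quotes from any provider:

```python
# Hypothetical rates: replace with the provider's actual price list.
HOURLY_RATE_PER_INSTANCE = 0.085    # USD per instance-hour (assumed)
STORAGE_RATE_PER_GB_MONTH = 0.10    # USD per GB-month (assumed)

def estimate_cost(instances, hours_per_instance, storage_gb, months):
    """Preview the bill for a pay-per-use scientific experiment."""
    compute = instances * hours_per_instance * HOURLY_RATE_PER_INSTANCE
    storage = storage_gb * months * STORAGE_RATE_PER_GB_MONTH
    return compute + storage

# e.g., 16 instances for 48 hours each, keeping 200 GB for 2 months
print("estimated cost: $%.2f" % estimate_cost(16, 48, 200, 2))
```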

Business Model

Clouds may be classified into three main categories [17]: Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), forming a model named SPI [17]. The evaluation of a cloud environment must consider the business model, particularly with respect to scientific data support. In the e-Science field, the generated data is one of the most valuable resources, yet the SPI model does not cover services based on storage or databases. Thus, it is important to check models that provide Storage as a Service and Database as a Service. Storage as a Service provides access to several storage facilities that are remotely located. Database as a Service provides the operations and functions of a remotely hosted database management system. Database services are particularly important in scientific experiments to store provenance data [18], so it can be queried with controlled access, which is not supported by storage services.
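The practical difference is queryability: a database service can answer structured questions about the experiment, while a plain storage service can only return files. The sketch below uses SQLite as a local stand-in for a hosted database service, with a deliberately simplified provenance schema of our own invention:

```python
import sqlite3

# Simplified, assumed provenance schema; a Database-as-a-Service
# offering would host this remotely, with controlled access.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE provenance (
    activity TEXT, input_file TEXT, output_file TEXT,
    started TEXT, finished TEXT, status TEXT)""")
conn.execute("INSERT INTO provenance VALUES "
             "('align', 'sequences.fasta', 'alignment.aln', "
             "'2011-03-01 10:00', '2011-03-01 11:30', 'OK')")

# A query like this is what Storage as a Service cannot answer:
for row in conn.execute(
        "SELECT activity, finished FROM provenance WHERE status = 'OK'"):
    print(row)
```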

Architectural Characteristics

When analyzing the main architectural characteristics of clouds, it is important to check and analyze the support for virtualization, security, resource sharing and scalability. For example, clouds can occasionally relocate applications among hosts and allocate multiple applications on the same host, according to resource availability. Such moves and instabilities can degrade workflow performance, since workflows depend on the flow of activity executions and on the data transfers between activities. Ideally, the cloud scheduler should be in sync with the SWfMS so that it is aware of this flow.
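One way a scheduler could be made flow-aware is to prefer hosts that already hold an activity's input data, avoiding transfers when the cloud relocates applications. The following is a hypothetical illustration of that placement policy, not an interface of any real cloud scheduler:

```python
# Hypothetical flow-aware placement: prefer a host that already holds
# the activity's input data; otherwise pick the least-loaded host.
def place_activity(activity, hosts):
    local = [h for h in hosts if activity["input"] in h["cached_files"]]
    candidates = local or hosts
    return min(candidates, key=lambda h: h["load"])

hosts = [
    {"name": "vm-1", "load": 0.7, "cached_files": {"alignment.aln"}},
    {"name": "vm-2", "load": 0.2, "cached_files": set()},
]
activity = {"name": "analyze", "input": "alignment.aln"}
print(place_activity(activity, hosts)["name"])  # vm-1: data already there
```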

Privacy

Privacy is a fundamental issue in scientific experiments. Many unpublished experiments and results have to remain private during the course of the experiment. We may classify cloud approaches as private, public and hybrid. From the scientist's point of view, in terms of privacy, the most “secure” approach is to use private clouds. In private clouds, all security control is defined by the scientist (or a team of computer specialists), which means that external access is more tightly controlled by the scientist. However, hybrid and public clouds usually provide advanced security mechanisms (such as security policies in Amazon EC2) that guarantee the privacy of data and applications. Scientists have to analyze whether the provided mechanisms meet their expectations.

Access

Several types of access are provided, including (a non-exhaustive list) browsers, thin clients, mobile clients and APIs. Analyzing the access types provided is important for scientists when choosing a cloud environment to run their experiments. Scientific experiments should be accessible in different ways, such as web pages and mobile devices; the effective use of different technologies in scientific experiments leads to the need for different types of access. Web browsers are commonly used for accessing cloud services, an intuitive choice since almost every computer has at least one browser installed and may access cloud services. In addition, many Web browsers are focused on cloud computing, such as Google Chrome. Thin clients and mobile clients are other important ways to access clouds away from a desktop, via handhelds or mobile phones. Finally, an API is a fundamental way of accessing clouds through programming language commands (in languages such as Java, Python or C). Complex scientific applications usually make use of APIs to access the cloud infrastructure in a native form. In this case, scientists have to analyze the access methods already used by their application and verify whether these methods can be used or adapted when migrating to a cloud environment.
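As an example of API-based access, the boto library exposes Amazon EC2 to Python programs. In the sketch below, credentials are assumed to be configured in the environment, and the AMI identifier is a placeholder rather than a real image:

```python
import boto.ec2

# Connect using credentials taken from the environment/boto config.
conn = boto.ec2.connect_to_region("us-east-1")

# "ami-xxxxxxxx" is a placeholder image id, not a real AMI.
reservation = conn.run_instances("ami-xxxxxxxx",
                                 min_count=1, max_count=1,
                                 instance_type="m1.small")
instance = reservation.instances[0]
print("launched instance:", instance.id)
```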

Cloud Orientation

The cloud orientation differs according to the business model used. In the SaaS model, applications are deployed on the cloud and can only be invoked, i.e., all execution control lies with the deployed application. We consider this approach task centric: scientists need to transfer control to the application owners instead of keeping it during the course of the experiment. On the other hand, when the infrastructure is provided as a service (IaaS, where virtualized hardware is provided to be configured and controlled), the scientist has full control of the actions: the programs that will execute and the environment configuration are chosen by the scientists. We consider this approach user centric. Scientists have to analyze which approach is more suitable for their needs. If they want to execute only one application, such as the bioinformatics tool BLAST, they can choose a task-centric approach. However, if they want to try several programs or change environment configurations, the user-centric approach is more suitable.
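The contrast can be summarized in code: in the task-centric case the scientist only submits inputs to a fixed, provider-controlled application, while in the user-centric case the scientist first shapes the environment and then drives the execution. Both the service endpoint and the `vm` interface below are hypothetical:

```python
import urllib.request

# Task centric (SaaS-like): submit input to a fixed application the
# provider controls; the endpoint is a hypothetical example.
def run_task_centric(sequence):
    req = urllib.request.Request("http://cloud.example.org/blast",
                                 data=sequence.encode())
    return urllib.request.urlopen(req).read()

# User centric (IaaS-like): configure the environment first, then run;
# the vm object and its commands are hypothetical placeholders.
def run_user_centric(vm):
    vm.run("apt-get install -y blast")   # choose and install programs
    vm.run("blast -query input.fasta")   # then control the execution
```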

Technology Infrastructure

The technological infrastructure defines how a specific cloud approach is implemented. It can be based on grids [19], peer-to-peer networks [20], PC clouds, cluster clouds, or a combination of them. This evaluation may be compromised in public clouds, such as Amazon EC2 [21], because we are not able to know which kind of technology is used to implement the cloud. In private clouds, however, it is possible to obtain this information. It is quite useful, because many experiments need a computational cluster or a grid to execute in parallel and produce results in a feasible time.

Conclusions

This article highlighted that, despite the high interest in cloud computing from the scientific community (especially those who need to execute HPC scientific applications), it is still a wide open field. Choosing the best cloud support is a step forward, but there is still a need for services focused on scientific workflow execution to bridge the gap between the cloud and the SWfMS. SciCumulus [22] is an initiative in this direction. Some SWfMS, such as Swift [23] and Pegasus [24], are also incorporating cloud support in their systems.

About the Authors

Daniel de Oliveira is a Ph.D. student at the Department of Computer Science at the COPPE Institute from Federal University of Rio de Janeiro. He received a B.Sc. degree in 2005 and M.Sc. degree in 2008, both from Federal University of Rio de Janeiro, Brazil. He is currently working on his Ph.D. thesis in Computer Science in the same institution. His interests include Cloud Computing, e-Science, workflow management, data mining, text mining and ontologies. He is also member of IEEE, ACM and of the Brazilian Computer Society.

Fernanda Baião is a Professor of the Department of Applied Informatics of the Federal University of the State of Rio de Janeiro (UNIRIO) since 2004, where she leads the Distributed Databases Research Group. She received the Doctor of Science degree from the Federal University of Rio de Janeiro (UFRJ) in 2001. During the year 2000 she worked as a visiting student at the University of Wisconsin, Madison (USA). Her current research interests include distributed and parallel databases, data management in scientific workflows, conceptual data modeling and machine learning techniques. She participates in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES and FAPERJ. She participates in several program committees of national and international conferences and workshops, and is a member of ACM and of the Brazilian Computer Society.

Marta Mattoso is a Professor of the Department of Computer Science at the COPPE Institute from Federal University of Rio de Janeiro (UFRJ) since 1994, where she leads the Distributed Database Research Group. She received the Doctor of Science degree from UFRJ. Dr. Mattoso has been active in the database research community for more than ten years, and her current research interests include distributed and parallel databases and data management aspects of scientific workflows. She is the principal investigator in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES, FINEP and FAPERJ. She has published over 200 refereed international journal articles and conference papers. She has served in program committees of international conferences, and is a regular reviewer of several international journals.

References

[1] N. Antonopoulos and L. Gillam, 2010, Cloud Computing: Principles, Systems and Applications. 1 ed. Springer.

[2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, et al., 2010, A view of cloud computing, Commun. ACM, v. 53, n. 4, p. 50-58.

[3] R. Buyya, C.S. Yeo, and S. Venugopal, 2008, Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities, In: Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications, p. 5-13.

[4] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, 2008, The cost of doing science on the cloud: the Montage example, In: SC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, p. 1-12, Austin, Texas.

[5] Y. El-Khamra, H. Kim, S. Jha, and M. Parashar, 2010, Exploring the Performance Fluctuations of HPC Workloads on Clouds, In: Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, p. 383–387, Washington, DC, USA.

[6] C. Evangelinos and C. Hill, 2008, Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon’s EC2, Chicago, IL.

[7] I. Foster, Y. Zhao, I. Raicu, and S. Lu, 2008, Cloud Computing and Grid Computing 360-Degree Compared, In: Grid Computing Environments Workshop, 2008. GCE ’08, p. 1-10.

[8] T. Hey, S. Tansley, and K. Tolle, 2009, The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.

[9] A. Matsunaga, M. Tsugawa, and J. Fortes, 2008, CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, IEEE eScience 2008, p. 222-229.

[10] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, 2008, On the use of cloud computing for scientific workflows, In: IEEE Fourth International Conference on eScience (eScience 2008), Indianapolis, USA, p. 7–12

[11] D. Oliveira, F. Baião, and M. Mattoso, 2010, “Towards a Taxonomy for Cloud Computing from an e-Science Perspective”, In: Cloud Computing: Principles, Systems and Applications, Heidelberg: Springer-Verlag.

[12] I.J. Taylor, E. Deelman, D.B. Gannon, and M. Shields (Eds.), 2007, Workflows for e-Science: Scientific Workflows for Grids. 1 ed. Springer.

[13] M. Addis, J. Ferris, M. Greenwood, P. Li, D. Marvin, T. Oinn, and A. Wipat, 2003, Experiences with e-Science workflow specification and enactment in bioinformatics, Proceedings of UK e-Science All Hands Meeting, p. 459–467.
 
[14] W. Martinho, E. Ogasawara, D. Oliveira, F. Chirigati, I. Santos, G. Travassos, and M. Mattoso, 2009, A Conception Process for Abstract Workflows: An Example on Deep Water Oil Exploitation Domain, In: 5th IEEE International Conference on e-Science, Oxford, UK.

[15] M. Mattoso, C. Werner, G.H. Travassos, V. Braganholo, L. Murta, E. Ogasawara, D. Oliveira, S.M.S.D. Cruz, and W. Martinho, 2010, Towards Supporting the Life Cycle of Large Scale Scientific Experiments, International Journal of Business Process Integration and Management, v. 5, n. 1, p. 79–92.

[16] R.D. Jarrard, 2001, Scientific Methods. Online book, Url.: http://emotionalcompetency.com/sci/booktoc.html.

[17] L. Youseff, M. Butrico, and D. Da Silva, 2008, Toward a Unified Ontology of Cloud Computing, In: Grid Computing Environments Workshop, 2008. GCE ’08, p. 1-10.

[18] J. Freire, D. Koop, E. Santos, and C.T. Silva, 2008, Provenance for Computational Tasks: A Survey, Computing in Science and Engineering, v.10, n. 3, p. 11-21.

[19] I. Foster and C. Kesselman, 2004, The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann.

[20] E. Pacitti, P. Valduriez, and M. Mattoso, 2007, Grid Data Management: Open Problems and New Issues, Journal of Grid Computing, v. 5, n. 3, p. 273-281.

[21] Amazon EC2, 2010. Amazon Elastic Compute Cloud (Amazon EC2). Available at: http://aws.amazon.com/ec2/. Accessed: 5 Mar 2010.

[22] D. Oliveira, E. Ogasawara, F. Baião, and M. Mattoso, 2010, SciCumulus: A Lightweight Cloud Middleware to Explore Many Task Computing Paradigm in Scientific Workflows, In: Proc. 3rd IEEE International Conference on Cloud Computing, Miami, FL.

[23] Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. von Laszewski, V. Nefedova, I. Raicu, T. Stef-Praun, and M. Wilde, 2007, Swift: Fast, Reliable, Loosely Coupled Parallel Computation, In: Services 2007, p. 199-206, Salt Lake City, UT, USA.

[24] E. Deelman, G. Mehta, G. Singh, M. Su, and K. Vahi, 2007, “Pegasus: Mapping Large-Scale Workflows to Distributed Resources”, Workflows for e-Science, Springer, p. 376-394.
 
