In this week’s hand-picked assortment, researchers explore the path to more energy-efficient cloud datacenters, investigate new frameworks and runtime environments that are compatible with Windows Azure, and design a unified programming model for diverse data-intensive cloud computing paradigms.
Cloud and datacenter are converging terms, and as “the cloud” grows so does the datacenter. The move to centralized pools of computing resources has fast-tracked the creation of mega-sized datacenters with the energy appetites to match. It’s an issue that grows more critical as resources tighten and power requirements climb. The subject area will require significant research and development, and computer scientists are stepping up to the plate to identify innovative solutions. One of these researchers is Jing SiYuan from the School of Computer Science at Leshan Normal University in China.
In a new research paper, SiYuan observes that the most common way of saving energy is dynamic on-demand resource provisioning, which shuts off idle servers. But it’s also important to maximize resource utilization by minimizing the number of servers required at a given time. The paper lays out a method to both minimize energy consumption and optimize VM migration. The proposed solution uses an approximate algorithm based on network-flow theory.
The findings show that, compared to existing approaches, the algorithm can slightly decrease energy consumption while greatly decreasing the number of VM placement changes (by almost 75 percent). By limiting VM migrations and starts/stops, resource overhead is reduced and system performance is improved.
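To make the trade-off concrete, here is a minimal sketch of server consolidation that also counts placement changes against the current assignment. It uses a simple first-fit-decreasing heuristic, not the paper’s network-flow algorithm; the function name and data shapes are illustrative assumptions only.

```python
# Illustrative sketch (NOT SiYuan's network-flow algorithm): pack VMs
# onto as few servers as possible, then report how many VMs had to
# migrate away from their current server -- the overhead the paper
# tries to minimize jointly with energy.

def consolidate(vms, capacity, current):
    """vms: {vm: demand}; capacity: uniform per-server capacity;
    current: {vm: server} existing placement.
    Returns (new_placement, active_servers, migrations)."""
    placement, load = {}, {}
    # First-fit decreasing: place the largest VMs first.
    for vm in sorted(vms, key=vms.get, reverse=True):
        demand = vms[vm]
        fresh = max(load, default=-1) + 1  # id for a newly powered-on server
        for server in sorted(load) + [fresh]:
            if load.get(server, 0) + demand <= capacity:
                placement[vm] = server
                load[server] = load.get(server, 0) + demand
                break
    migrations = sum(1 for vm in vms if placement[vm] != current.get(vm))
    return placement, len(load), migrations

# Four VMs spread over four servers can be packed onto two,
# at the cost of migrating some of them.
placement, servers, migrations = consolidate(
    {"a": 6, "b": 5, "c": 3, "d": 2},
    capacity=8,
    current={"a": 0, "b": 1, "c": 2, "d": 3},
)
```

A greedy packing like this minimizes active servers but ignores migration cost; the paper’s contribution is an algorithm that keeps both low at once.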
HPC on Azure
Georgia State University researcher Dinesh Agarwal is adding to the growing body of information on HPC cloud with his dissertation on the “Scientific High Performance Computing (HPC) Applications On The Azure Cloud Platform.”
The accessibility of cloud computing resources like Amazon Web Services and Windows Azure makes them an attractive option for researchers across a wide range of disciplines. Elasticity, pay-per-use, and on-demand provisioning are desirable traits, but when it comes to performance, these offerings come with no guarantees. There’s also a lack of development tools, which hampers their use for HPC purposes. Insufficient portability is another roadblock.
“Among all clouds,” writes Agarwal, “the emerging Azure cloud from Microsoft in particular remains a challenge for HPC program development both due to lack of its support for traditional parallel programming support such as Message Passing Interface (MPI) and map-reduce and due to its evolving application programming interfaces (APIs).”
In light of this, Agarwal and his team created new frameworks and runtime environments to assist HPC application developers. The idea is to provide developers with tools like the ones from traditional parallel and distributed computing environments, such as MPI, to use for scientific application development on the Azure cloud platform. Agarwal notes that creating an efficient framework for any cloud platform is a challenging problem because the services are offered as black boxes accessible only via application programming interfaces (APIs).
The main components of this PhD thesis are: “(i) creating a generic framework for bag-of-tasks HPC applications to serve as the basic building block for application development on the Azure cloud platform, (ii) creating a set of APIs for HPC application development over the Azure cloud platform, which is similar to message passing interface (MPI) from traditional parallel and distributed setting, and (iii) implementing Crayons using the proposed APIs as the first end-to-end parallel scientific application to parallelize the fundamental GIS operations.”
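To suggest what an MPI-like facade over cloud primitives might look like, here is a small illustrative sketch. It is not Agarwal’s actual API: the `CloudComm` class and its methods are hypothetical, and in-memory queues stand in for the Azure Queue storage that a real implementation would call through Azure’s APIs.

```python
# Hypothetical sketch of MPI-style point-to-point messaging layered on
# per-rank message queues. On Azure, each inbox would be a named cloud
# queue; here plain in-memory queues keep the example self-contained.
import queue

class CloudComm:
    def __init__(self, world_size):
        self.world_size = world_size
        # One inbox per rank, mimicking MPI's addressed delivery.
        self._inbox = [queue.Queue() for _ in range(world_size)]

    def send(self, payload, dest, source):
        """Analogue of MPI_Send: enqueue a message on the destination's inbox."""
        self._inbox[dest].put((source, payload))

    def recv(self, rank):
        """Analogue of MPI_Recv: block until a message arrives for this rank."""
        return self._inbox[rank].get()

comm = CloudComm(world_size=2)
comm.send({"rows": [1, 2, 3]}, dest=1, source=0)
source, payload = comm.recv(rank=1)
```

The point of such a layer is that application code written against `send`/`recv` need not change as the underlying cloud APIs evolve, which is exactly the portability gap the thesis identifies.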
Data-Focused Unified Programming Model
Faced with the need to process large volumes of data, researchers have several computational paradigms to select from, including batch processing, iterative, interactive, memory-based, data flow oriented, relational, structured, among others. These different techniques are mostly incompatible with each other, but what if there was a unified framework that could support these different approaches? That’s exactly what research duo Maneesh Varshney and Vishwa Goudar from the Computer Science Department of the University of California, Los Angeles, had in mind when they developed Blue.
Figure 1: The Blue framework provides a generic programming model for developing diverse cluster computing paradigms.
The researchers lay out their findings in a new technical report, “Blue: A Unified Programming Model for Diverse Data-intensive Cloud Computing Paradigms.”
They write: “The motivation for this paper is to ease the development of new cluster applications, by introducing an intermediate layer (Figure 1) between resource management and applications. This layer [serves as] a generic programming model upon which any arbitrary cluster application can be built. Not only will this significantly diminish the cost of developing applications, the users will be able to easily select the computation paradigm that best meets their needs.”
In developing the Blue framework and programming model, the researchers aimed for a solution that was neither too low-level and difficult to implement, nor too high-level and inflexible. The paper includes an outline of the implementation strategy and points out the framework’s key strengths (notably efficiency and fault tolerance for cluster programs) and limitations (while it targets data-intensive computational problems, it is not the best choice for task parallelism).
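The intermediate-layer idea can be sketched in miniature: a single generic primitive that applies a step function across data partitions and merges the results until a termination test passes, on top of which different paradigms become parameter choices. This is a hypothetical illustration, not Blue’s actual programming model.

```python
# Hypothetical intermediate layer: one generic driver on which distinct
# paradigms can be expressed. A batch paradigm runs a single pass; an
# iterative paradigm would instead supply a convergence test as `done`.
def run(partitions, step, merge, done, state=None):
    while not done(state):
        state = merge([step(p, state) for p in partitions], state)
    return state

# Batch word-count-style computation expressed on the generic layer.
counts = run(
    partitions=[["a", "b"], ["b", "c"]],
    step=lambda part, _: {w: part.count(w) for w in set(part)},
    merge=lambda results, _: {w: sum(r.get(w, 0) for r in results)
                              for r in results for w in r},
    done=lambda s: s is not None,  # one pass, then stop
)
```

Swapping only the `step`, `merge`, and `done` parameters changes the computational paradigm without touching the layer beneath, which is the cost-saving the authors describe.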