April 26, 2013

Research Roundup: Prepping Clouds for Data Influx

Tiffany Trader

The top research stories of the week have been hand-selected from leading scientific centers, prominent journals and relevant conference proceedings. In this week’s assortment, researchers explore the path to data-intensive cloud computing, investigate extreme heterogeneous architectures for a truly multi-purpose cloud, and set out to create a more predictable programming model for cloud computing.

Towards Data-Intensive Cloud Computing

Cloud computing has many advantages, such as scalability, elasticity and high availability. While the cloud can be a suitable paradigm for large computations, there are often I/O restrictions that create barriers to running data-intensive workloads. A team of researchers from the FAST-National University of Computer and Emerging Sciences in Karachi, Pakistan, addresses this topic in a new paper.

The authors observe that datasets in the terabyte to petabyte range are par for the course in HPC. In order to execute complex queries in a timely manner, systems need intensive computational power and massive storage capabilities. Data is also being generated at a fast pace, which creates additional challenges for storage, linking and processing.

By hiding some of these complexities in an abstraction layer, cloud computing is a promising approach, but only if it can handle ever-larger datasets effectively and efficiently.

“Data-intensive cloud provides an abstraction of high availability, usability, and efficiency to users,” the researchers assert. “However, underlying this abstraction, there are stringent requirements and challenges to facilitate scalable and resourceful services through effective physical infrastructure, smart networking solutions, intelligent software tools, and useful software approaches.”

According to the team, “data-intensive cloud computing involves study of both programming techniques and platforms to solve data-intensive tasks and management and administration of hardware and software which can facilitate these solutions.”

The paper provides a detailed survey of the numerous challenges and a discussion of possible solutions.

Extreme Heterogeneity

The last decade has seen a continuing push toward heterogeneous architectures, but is there a more extreme form of heterogeneity still to come? There is, according to one group of computer scientists. The diverse research team, with affiliations that include Microsoft as well as US, Mexican, European and Asian universities, presented a paper on the subject at the International Symposium on Pervasive Systems, Algorithms and Networks (I-SPAN 2012) in San Marcos, Texas, December 13–15, 2012.

In “Introducing the Extreme Heterogeneous Architecture,” they write:

“The computer industry is moving towards two extremes: extremely high-performance high-throughput cloud computing, and low-power mobile computing. Cloud computing, while providing high performance, is very costly. Google and Microsoft Bing spend billions of dollars each year to maintain their server farms, mainly due to the high power bills. On the other hand, mobile computing is under a very tight energy budget, but yet the end users demand ever increasing performance on these devices.”

Conventional architectures have diverged to meet the needs of multiple user groups. But wouldn’t it be ideal if there were a way to deliver high performance and low power consumption at the same time? The authors set out to explore a novel architecture model that addresses both these extremes, setting the stage for the Extremely Heterogeneous Architecture (EHA) project.

“EHA is a novel architecture that incorporates both general-purpose and specialized cores on the same chip,” the authors explain. “The general-purpose cores take care of generic control and computation. On the other hand, the specialized cores, including GPU, hard accelerators (ASIC accelerators), and soft accelerators (FPGAs), are designed for accelerating frequently used or heavy weight applications. When acceleration is not needed, the specialized cores are turned off to reduce power consumption. We demonstrate that EHA is able to improve performance through acceleration, and at the same time reduce power consumption.”

As a heterogeneous architecture, EHA is capable of accelerating heterogeneous workloads on the same chip. This is useful because it is often the case that datacenters (either in-house or in “the cloud”) provide many services – media streaming, searching, indexing, scientific computations, and so on.

The EHA project has two main goals. The first is to design a chip that is suitable for many different cloud services, thereby greatly reducing both the recurring and non-recurring costs of datacenters or clouds. The second is to implement a lightweight EHA for use with mobile devices, with the aim of optimizing user experience under tight power constraints.

Cloud Programming for Predictable Performance

The International Journal of Grid and Distributed Computing includes an interesting study, titled “BSPCloud: A Hybrid Distributed-memory and Shared-memory Programming Model.”

A group of researchers from Shanghai University and China Telecom Corporation Ltd. write that “current programming models for cloud computing mainly focus on improving the efficiency of the cloud computing platforms but little has been done on the performance predictability of models.” In light of this, they are investigating a new programming model for cloud computing, called BSPCloud, that leverages multicore architectures while also providing predictable performance.

The team explains that “BSPCloud uses a hybrid of distributed-memory and shared-memory bulk synchronous parallel (BSP) programming model. Computing tasks are first divided into a set of coarse granularity bulks which are computed by the distributed-memory BSP model, and each coarse granularity bulk is further divided into a set of bulk threads which are computed by the shared-memory BSP model.”

The paper presents a proof-of-concept BSPCloud parallel programming library implemented in Java. The researchers apply the BSPCloud library to matrix multiplication and evaluate its performance predictability and speedup on a cloud platform. The results show the speedup and scalability of BSPCloud in different configurations.
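
To make the two-level idea concrete, here is a minimal Java sketch of a matrix multiplication split first into coarse bulks and then into bulk threads, with a barrier closing each level. It is an illustration only and not the BSPCloud library or its API: the class and method names (HybridBspSketch, multiply, computeBulk, multiplyRows, barrier) are hypothetical, and the outer, distributed-memory level is simulated with a thread pool on a single machine, on the assumption that each bulk would run on a separate node in a real deployment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a two-level BSP-style decomposition.
// This is NOT the BSPCloud API; all names here are made up for the sketch.
final class HybridBspSketch {

    // Multiplies a (n x m) by b (m x p), splitting the rows of a into coarse bulks.
    static double[][] multiply(double[][] a, double[][] b, int bulks, int threadsPerBulk) {
        int n = a.length;
        double[][] c = new double[n][b[0].length];
        // Outer level: one task per coarse bulk. A thread pool stands in for
        // the separate nodes of a distributed-memory deployment (assumption).
        ExecutorService outer = Executors.newFixedThreadPool(bulks);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int k = 0; k < bulks; k++) {
                int lo = k * n / bulks;
                int hi = (k + 1) * n / bulks;
                Runnable bulk = () -> computeBulk(a, b, c, lo, hi, threadsPerBulk);
                pending.add(outer.submit(bulk));
            }
            barrier(pending); // all bulks finish before the result is returned
        } finally {
            outer.shutdown();
        }
        return c;
    }

    // Inner level: the rows [lo, hi) of one bulk are split among bulk threads
    // that share the same arrays, as in a shared-memory model.
    static void computeBulk(double[][] a, double[][] b, double[][] c,
                            int lo, int hi, int threads) {
        int rows = hi - lo;
        ExecutorService inner = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                int from = lo + t * rows / threads;
                int to = lo + (t + 1) * rows / threads;
                Runnable work = () -> multiplyRows(a, b, c, from, to);
                pending.add(inner.submit(work));
            }
            barrier(pending); // all bulk threads finish before the bulk is done
        } finally {
            inner.shutdown();
        }
    }

    // Plain triple loop over the rows [from, to) owned by one bulk thread.
    static void multiplyRows(double[][] a, double[][] b, double[][] c, int from, int to) {
        int m = b.length;
        int p = b[0].length;
        for (int i = from; i < to; i++) {
            for (int j = 0; j < p; j++) {
                double sum = 0;
                for (int k = 0; k < m; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
    }

    // Waits for every submitted task, playing the role of a BSP barrier.
    private static void barrier(List<Future<?>> pending) {
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[][] b = {{7, 8}, {9, 10}};
        // Two coarse bulks, each with two bulk threads.
        System.out.println(Arrays.deepToString(multiply(a, b, 2, 2)));
    }
}
```

Each thread writes only to its own rows of the result, so the only synchronization points are the barriers. That structure, local computation followed by a global wait, is what makes BSP-style programs comparatively easy to reason about when estimating running time.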