February 19, 2014

Scheduling HPC as a Service

Carlo del Mundo

HPC has long been the go-to field for solving large-scale scientific and engineering problems. However, running applications effectively on HPC systems requires significant technical knowledge of the underlying systems software. To address the tedium of setting up HPC environments, HPC as a Service (HPCaaS) has recently been proposed to move HPC into the cloud.

Borrowing from the success of Software as a Service (SaaS), HPCaaS aims to do the same for HPC: simplify and commoditize it for the masses via an automated cloud delivery system. The emphasis is on making the process of scheduling jobs on HPC resources as transparent as possible. A user need not know how many processors to use, only that a job gets executed with a specified amount of parallelism.

So, what’s limiting the adoption of HPC as a service? For one, typical HPC jobs are rigid in nature; they must execute on a fixed set of resources. For instance, a user must explicitly specify the exact number of processors before submitting a parallel job. If enough processors are free, the job runs. If not, the job stalls until enough processors become available. This rigid way of scheduling works well when there are enough resources to fulfill the job. However, when HPC is offered as a service, resource contention causes stalls, resulting in poor utilization.

To alleviate poor utilization when there aren’t enough resources available, Kuo-Chan Huang, associate professor in the Department of Computer Science at National Taichung University, applies the concept of moldable jobs (borrowed from MPI) to HPC. He notes that, “a moldable job approach can automatically select a most appropriate amount of processors for a job’s execution based on application speedup models and workload conditions at the moment.”

The workload management system then adapts to the needs of the application, throttling down as resources become scarce. Such moldable properties allow the job management and scheduling system to allocate resources based on the needs of the job. This flexibility is critical in executing jobs at an efficient rate.
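To make the contrast between rigid and moldable jobs concrete, here is a minimal sketch in Python. It is not the scheduling technique proposed by Huang's team; it assumes an Amdahl's-law speedup model and a hypothetical efficiency threshold (`min_efficiency`) as the selection rule, and all function names are illustrative only.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Assumed speedup model (Amdahl's law); real applications may differ."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def rigid_allocation(requested: int, free: int) -> int:
    """Rigid job: gets exactly what it asked for, or nothing (it waits)."""
    return requested if free >= requested else 0


def moldable_allocation(max_processors: int, free: int,
                        serial_fraction: float,
                        min_efficiency: float = 0.6) -> int:
    """Moldable job (hypothetical rule): pick the largest processor count
    that fits in the free pool and keeps parallel efficiency
    (speedup / processors) at or above min_efficiency."""
    best = 0
    for p in range(1, min(max_processors, free) + 1):
        efficiency = amdahl_speedup(p, serial_fraction) / p
        if efficiency >= min_efficiency:
            best = p
    return best


if __name__ == "__main__":
    free = 12  # processors currently idle
    # A rigid job asking for 16 processors cannot start; 12 sit idle.
    print("rigid:", rigid_allocation(16, free))
    # A moldable job with a 10% serial fraction starts right away on fewer.
    print("moldable:", moldable_allocation(16, free, serial_fraction=0.1))
```

With 12 free processors, the rigid request for 16 is granted nothing and must wait, while the moldable job starts immediately on a smaller allocation. When the free pool shrinks further, the moldable job simply shrinks with it, which is the "throttling down" behavior described above.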

Huang’s team proposes two new moldable scheduling techniques that improve average turnaround time by up to 78% and 89%, respectively.

