April 10, 2013

Coursera Offers HPC Techniques to Scientific Computing

Ian Armas Foster

Data scientists and those adept at scientific computing are numerous, but not quite numerous enough to meet the demands of the computing marketplace.

Further, as science progresses to more complex and data-intensive questions, such as researching the beginning of the universe or getting more in-depth genomic results, it becomes imperative for more scientists and researchers to learn these HPC techniques to cut down on query times.

Randall J. LeVeque, Professor of Applied Mathematics and Adjunct Professor of Mathematics at the University of Washington in Seattle, will teach a free course that brings the principles of parallelism in high-performance computing to people who run applications on multi-processor laptops and desktops or on cloud services.

LeVeque’s guiding principle is that a person’s time is more valuable than a computer’s time. As such, any research query or scientific computation that can be parallelized across a computer’s multiple processors or on a cloud server saves the researcher time. A fast program is useless, however, if it produces inaccurate results.

“The goal is not to teach the most advanced techniques with supercomputers, but rather techniques that you can use immediately on your own laptop, desktop, cluster, or even in the cloud,” LeVeque remarked in his introductory video.

The ten-week course, which will require ten to twelve hours of work per week, will cover both serial and parallel computing and the languages and tools used for them, such as Fortran 90, OpenMP, MPI, and Python. The full list of topics to be covered is below:

  • Working at the command line in Unix-like shells (e.g. Linux or a Mac OS X terminal).
  • Version control systems, particularly git, and the use of GitHub and Bitbucket repositories.
  • Work habits for documentation of your code and reproducibility of your results.
  • Interactive Python using IPython, and the IPython Notebook.
  • Python scripting and its uses in scientific computing.
  • Subtleties of computer arithmetic that can affect program correctness.
  • How numbers are stored: binary vs. ASCII representations, efficient I/O.
  • Fortran 90, a compiled language that is widely used in scientific computing.
  • Makefiles for building software and checking dependencies.
  • The high cost of data communication. Registers, cache, main memory, and how this memory hierarchy affects code performance.
  • OpenMP on top of Fortran for parallel programming of shared memory computers, such as a multicore laptop.
  • MPI on top of Fortran for distributed memory parallel programming, such as on a cluster.
  • Parallel computing in IPython.
  • Debuggers, unit tests, regression tests, verification and validation of computer codes.
  • Graphics and visualization of computational results using Python.
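
As an illustration of the "subtleties of computer arithmetic" item in the list above, the following minimal Python sketch shows how a decimal value such as 0.1, which has no exact binary representation, can make a seemingly obvious equality test fail. It is not taken from the course materials; the variable names and the tolerance-based helper `sums_match` are assumptions made for this example.

```python
import math

# 0.1 cannot be stored exactly in binary floating point,
# so adding it ten times accumulates a small rounding error.
values = [0.1] * 10
naive_sum = sum(values)

# math.fsum tracks the lost low-order bits and returns a
# correctly rounded result for the same inputs.
careful_sum = math.fsum(values)


def sums_match(a, b, rel_tol=1e-9):
    """Hypothetical helper: compare floats with a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


print(naive_sum == 1.0)            # False: exact comparison fails
print(careful_sum == 1.0)          # True: compensated summation
print(sums_match(naive_sum, 1.0))  # True: tolerance-based comparison
```

Comparing floating-point results with a tolerance, rather than with `==`, is one common way to keep a correctness check from failing on rounding error alone.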

The course, which is free to take, is scheduled to start on May 1.