March 23 — The Blue Waters project at the University of Illinois is pleased to announce the offering of a graduate course Introduction to High Performance Computing that will be offered as a collaborative, online course for multiple participating institutions. We are seeking other university partners that are interested in offering the course for credit to their students. The course includes online video lectures, quizzes, and homework assignments with access to free accounts on the Blue Waters system.
Participating institutions will need to provide a local instructor who will be responsible for advising the local students and officially assigning grades. Students will complete the online course exams and exercises as part of their grade.
The instructor for the course is Dr. David E. Keyes, Director of the Extreme Computing Research Center and Founding Dean of the Mathematical and Computer Sciences and Engineering Division at the King Abdullah University of Science and Technology (KAUST).
Prerequisites for the graduate students include:
- Experience working in a Unix environment
- Experience developing and running scientific codes written in C or C++
- Familiarity with basic numerical algorithms and basic computer architecture
The expectations for students, faculty, and the instruction team are noted below. Interested faculty should contact Steve Gordon, organizer of the Blue Waters course program, at [email protected] or by phone at 614-292-4132.
Expectations for Participants
The expectations of the “collaborating faculty” are that they will:
- Establish a “collaborating course” (possibly a special topics course) on the autumn course catalog
- Promote this course to students on their own campus
- View the recorded lectures together with their local enrolled students
- Provide office hours to advise the students on the course content
- Proctor the course exam
- Provide regular feedback on behalf of the students to Dr. Keyes on the course throughout the semester
The expectations of Dr. Keyes and the O2PEP team are that they will:
- Provide an initial live web-cast to introduce the instructor, TAs, and support staff, and to introduce remote participants and faculty to one another
- Provide two recorded lectures per week
- Provide exercises and activities for the students
- Provide a web space for all course related materials
- Provide regular quizzes to allow the students to assess their own progress
- Provide a mid-term exam and a final exam
- Grade all the quizzes and exams
- Provide TAs to assist all students with questions about the course content, exercises, quizzes, and other materials covered during the semester
- Conduct an evaluation of the course with the participants and collaborating faculty
Expectations of the Students
- Students must register in a “collaborating course” on their own campus
- Students will need their own laptop or desktop system
- Students are expected to view the recorded lectures as a group with their local “collaborating faculty” to learn and discuss the content together
- Students are expected to contact the TAs at the University of Illinois for in-depth questions about the content, exercises, or other materials
- Students will be asked to submit quizzes for self-assessment purposes
- Students will be asked to submit a mid-term and a final exam for determining a grade, with a scale applied according to their own campus grading methods
Course Description
High performance computing algorithms and software technology, with an emphasis on using distributed memory systems for scientific computing. Theoretical and practically achievable performance for processors, memory system, and network, for large-scale scientific applications. The state-of-the-art and promise of predictive computational science and engineering. Algorithmic kernels common to linear and nonlinear algebraic systems, partial differential equations, integral equations, particle methods, optimization, and statistics. Computer architecture and the stresses put on scientific applications and their underlying mathematical algorithms by emerging architecture. State-of-the-art discretization techniques, solver libraries, and execution frameworks.
Prerequisites
Experience using C/C++ in a Unix environment, familiarity with basic numerical algorithms, and familiarity with computer architecture.
Course Flavor
A good subtitle for this course would be “Algorithms as if architecture mattered.” Architecture increasingly does matter today. During decades of progress using the paradigm of bulk synchronous processing on systems that were small enough to be considered “flat” and tightly coupled, architecture could largely be abstracted away through the Message Passing Interface (MPI), an excellent example of “separation of concerns” in computer science. One could write in a high-level language without concern about where the compiler and runtime stashed the operands, because flops were relatively slow, which made everything else, including the physical layout of the architecture, appear nearly flat. One could count flops for serial complexity estimation, and determine how many could be done concurrently (between synchronization events) for parallel complexity estimation.
Today, however, flops are cheap compared to the cost of moving data, in both time and energy expenditure. Therefore, we must worry about the topology of the network and the latencies and bandwidths of every part of the memory system and network in getting the operands to the floating-point units (FPUs). This gives high performance computing an emphasis different from some other types of computing. The same architecture advances that make it frustrating also make it exciting!
What new high performance science and engineering computing users need is an introduction to the concepts, the hardware and software environments, and selected algorithms and applications of parallel scientific computing, with an emphasis on tightly coupled computations that are capable of scaling to thousands of processors and well beyond. The course material ranges (selectively) from high-level descriptions of motivating applications to low-level details of implementation, in order to expose the algorithmic kernels and the shifting balances of computation and communication between them.
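The claim that flops are cheap compared to moving data can be made concrete with a back-of-the-envelope estimate. The following is a minimal sketch, not taken from the course materials: it bounds the run time of a vector update y = a*x + y by the slower of its arithmetic time and its memory traffic time. The machine parameters (1 Tflop/s peak, 100 GB/s memory bandwidth) and the function names are hypothetical, and the byte count assumes 8-byte doubles with no cache reuse.

```python
def kernel_time(flops, bytes_moved, peak_flops_per_s, bytes_per_s):
    """Return (bound, compute_time, memory_time) in seconds.

    The bound is the larger of the two times, i.e. it assumes computation
    and data movement overlap perfectly (an optimistic assumption).
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / bytes_per_s
    return max(compute_time, memory_time), compute_time, memory_time


def axpy_counts(n):
    """Flop and byte counts for y[i] = a * x[i] + y[i] over n doubles."""
    flops = 2 * n           # one multiply and one add per element
    bytes_moved = 3 * 8 * n  # read x, read y, write y, 8 bytes each
    return flops, bytes_moved


# Hypothetical machine parameters, chosen only for illustration.
PEAK_FLOPS_PER_S = 1.0e12  # 1 Tflop/s
MEM_BYTES_PER_S = 1.0e11   # 100 GB/s

n = 10**8
flops, nbytes = axpy_counts(n)
total, t_compute, t_memory = kernel_time(
    flops, nbytes, PEAK_FLOPS_PER_S, MEM_BYTES_PER_S
)

print(f"compute time: {t_compute:.6f} s")
print(f"memory time:  {t_memory:.6f} s")
print(f"bound:        {total:.6f} s")
```

Under these assumed numbers the memory traffic takes about 120 times longer than the arithmetic, so counting flops alone would badly underestimate the run time of this kernel.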
The homework assignments range from simple theoretical studies to running and modifying demonstration codes. Modest programming assignments using MPI and PETSc culminate in an independent project leading to an in-class report.
Instructors
The principal lecturer will be David Keyes, Professor of Applied Mathematics and Computational Science, KAUST. Guest lecturers will be invited to speak on their specialties. Lectures from Extreme Computing Research Center staff members highlighting open source scientific software will be incorporated into the course.
Goals and Syllabus
The overall goal is to acquaint students who anticipate doing independent work that may benefit from large-scale simulation with current hardware, software tools, practices, and trends in parallel scientific computing, and to provide an opportunity to build and execute sample parallel codes. The software employed in course examples is freely available. The course is also designed to make students intelligent consumers and critics of parallel scientific computing literature and conferences.
Much of the motivation for parallel scientific computing comes from simulations based on discretizations of partial differential equations (PDEs, typically described with sparse matrices), or integral equations (IEs, typically described with dense matrices), or based on interacting particles (unstructured interaction lists, often embedded in octrees). Of course, many applications are nonlinear, but these are typically approached as a series of linearized analyses. An understanding of the underlying equations, their physical meaning, and their mathematical analysis is important in some parts of the course and opens up many possibilities for independent projects. Other material is easily abstracted away from its underlying operator equation context to that of a generic bulk-synchronous computation that interleaves flows of data with operations on that data. The intention is to provide a course of benefit to a broad clientele of graduate researchers. In addition to computer scientists and applied mathematicians, students from mechanical engineering, electrical engineering, chemical engineering, materials science, and geophysics should find it of interest and approachable if they already have sufficient background in computing to be motivated towards the high end.
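One common way to reason about a generic bulk-synchronous computation is the BSP cost model, in which each superstep costs w + h*g + l: the largest local work w on any processor, plus the largest number of words h sent or received by any processor times a per-word cost g, plus a synchronization latency l. The sketch below is an illustration under that model, not the course's own formulation; the function names and all numbers are hypothetical, and every cost is expressed in units of one flop time.

```python
def superstep_cost(work_per_proc, words_per_proc, g, l):
    """Cost of one BSP superstep: w + h * g + l.

    work_per_proc  -- local flop counts, one entry per processor
    words_per_proc -- words sent or received, one entry per processor
    g              -- cost per communicated word, in flop times
    l              -- barrier synchronization cost, in flop times
    """
    w = max(work_per_proc)   # the slowest processor sets the pace
    h = max(words_per_proc)  # the busiest link sets the pace
    return w + h * g + l


def bsp_cost(supersteps, g, l):
    """Total cost of a sequence of (work, words) supersteps."""
    return sum(superstep_cost(work, words, g, l) for work, words in supersteps)


# Hypothetical two-superstep computation on four processors.
supersteps = [
    ([100, 120, 90, 110], [10, 10, 20, 10]),  # compute, then exchange data
    ([50, 50, 50, 50], [0, 0, 0, 0]),         # purely local update
]
total_cost = bsp_cost(supersteps, g=4, l=30)
print("total cost in flop times:", total_cost)
```

The model makes load imbalance and communication visible in the same units as arithmetic: in the first superstep above, the 20-word exchange at g = 4 costs 80 flop times against 120 flop times of work.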
Thirteen algorithmic prototypes that occur regularly in scientific computing have been identified in a famous 2006 Berkeley technical report, “The Landscape of Parallel Computing Research: The View from Berkeley” (UCB/EECS-2006-183). Though the report is ten years old, students may want to download and devour it as representative of the motivation and flavor of the course. The Berkeley prototypes are: dense direct solvers, sparse direct solvers, spectral methods, N-body methods, structured grids / iterative solvers, unstructured grids / iterative solvers, Monte Carlo (including “MapReduce”), combinatorial logic, graph traversal, graphical models, finite state machines, dynamic programming, backtrack/branch-and-bound. The first seven are essential floating point kernels and the last six essential integer kernels. The course examines several of these kernels in detail.
Lecture Coverage Includes:
- Introduction to large-scale predictive simulations: the combined culture of CS&E and HPC
- Introduction to parallel architecture and programming models
- Introduction to MPI, PETSc, and other software frameworks for HPC
- Parallel algorithms for the solution of large, sparse linear systems and nonlinear systems with large, sparse Jacobians
- Parallel algorithms for partial differential equations
- Parallel algorithms for N-body particle dynamics
Evaluation and Grading
Evaluation consists of four components: problem sets, project, final exam, and class participation at the flipped local site. Problem sets may be undertaken cooperatively (and this is encouraged), but each student must submit the homework separately under their own name, vouching for their own responsibility for the answers. The quality of the write-up is part of the grade. It is intended that all students should be able to score well on the problem sets, because they will be announced well in advance of their due dates and students have unlimited time for their own reading and research of the topics and for consultations with one another. The problem sets should create an extended ongoing discussion for the class community. The project is intended to be individual. If students want to team up to undertake a “bigger” project and earn the same grade for it, this should be negotiated when projects are launched in mid-course. Projects will be submitted in report form, and each project will be featured in a short presentation to the class at the end of the semester. The final exam is, of course, individual.
Frequently Asked Questions
Must I understand PDEs and Linear Algebra well to take this course?
Algorithms for partial differential equation and linear algebraic computations motivate this course, and knowledge of their mathematics adds substance to the parallel applications. However, the aspects of these subjects that are important to success in this course have to do with understanding the choreography of data and hardware. If you are comfortable with following the data in these algorithms without a theoretical understanding of how they approximate the real world (modeling) or how rapidly they converge to it (analysis), you can survive this course and even excel in it. Mathematical theorems, e.g., those tying convergence of an iterative method to the condition number of a matrix, have the quality of subroutines: if the upstream hypotheses (inputs) are verified, the consequences (outputs) may be chained into downstream uses in this course, e.g., complexity analyses.
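To illustrate the "theorem as subroutine" idea, here is a minimal sketch that wraps one standard result as a function. It is an illustration, not part of the course materials. For the conjugate gradient method applied to a symmetric positive definite matrix with condition number kappa, the A-norm of the error after k iterations is at most 2 * rho**k times the initial error, where rho = (sqrt(kappa) - 1) / (sqrt(kappa) + 1). The function names and the per-iteration flop count are hypothetical.

```python
import math


def cg_iterations_bound(kappa, eps):
    """Smallest k for which the bound 2 * rho**k <= eps holds.

    Inputs (hypotheses): kappa >= 1 is the condition number of a symmetric
    positive definite matrix, and 0 < eps < 1 is the target reduction of
    the A-norm of the error. This is a worst-case bound; conjugate
    gradients often converges faster in practice.
    """
    if kappa < 1 or not (0 < eps < 1):
        raise ValueError("need kappa >= 1 and 0 < eps < 1")
    if kappa == 1:
        return 1  # rho is zero, so the bound holds after one iteration
    root = math.sqrt(kappa)
    rho = (root - 1) / (root + 1)
    return math.ceil(math.log(eps / 2) / math.log(rho))


def cg_flops_estimate(kappa, eps, flops_per_iteration):
    """Chain the iteration bound into a downstream complexity estimate."""
    return cg_iterations_bound(kappa, eps) * flops_per_iteration


# Hypothetical problem: kappa = 100, reduce the error by a factor of 1e6,
# with an assumed cost of 1e7 flops per iteration.
k = cg_iterations_bound(100.0, 1e-6)
work = cg_flops_estimate(100.0, 1e-6, 1e7)
print("iteration bound:", k)
print("flop estimate:  ", work)
```

The caller needs only to verify the hypotheses (symmetry, positive definiteness, a bound on kappa) and can then use the output in a complexity analysis without re-deriving the theorem.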
Must I be facile in Unix and C/C++ to take this course?
In this course, you will work with sample applications written in C and you will build and execute on Linux-based distributed systems. One can pick up what one needs without being an expert in the tools applied.
Do you have a motto for success in difficult endeavors like high performance computing?
Actually, this is not a frequently asked question, but it should be. I do have a motto, taken from the most successful college football coach in history, Bear Bryant (1913–1983), as measured by the number of career wins amassed: “It’s not the will to win, but the will to prepare to win that makes the difference.”
—
Source: Ohio Supercomputer Center