July 29, 2019 — “Hello, world!” Beginning coders have probably seen this message, the results of a simple program that introduces the basic syntax of a computer language by printing words to the screen. Students even at the university level often don’t go much beyond the programming basics like “hello, world.” This lack of coding skills can hold back researchers who find themselves needing practical methods to deal with an increasingly complex wealth of big data.
“Students were graduating from degree programs in engineering, mathematics, computer science, and not having good, hands-on programming skills,” said Charlie Dey, director of Training and Professional Development at the Texas Advanced Computing Center (TACC), part of The University of Texas at Austin.
The focus at TACC is supercomputing: put simply, computing beyond what’s possible with just one laptop or workstation. TACC gives researchers access to some of the world’s most powerful computers. What’s more, explained Dey, TACC has an educational component that in some ways makes work a little easier for its consultants in high performance computing (HPC).
“Our consultants here at TACC said, maybe we can help this current generation of scientists understand how the Linux operating system works; how to submit jobs; how to recall their output; how to view their output; and how to analyze their output. These skills were missing,” said Dey.
For several years, TACC had been teaching various scientific computing classes, mainly in parallel computing, which divides a program’s work into tasks that run simultaneously on multiple computer processors. “At some point, we realized that in order to do that, students need to be on top of just plain programming, before they can start the parallel part,” said Victor Eijkhout, a research scientist in the HPC Software Tools Group at TACC.
“Students were coming in with some exposure to programming, but it would be in languages like Python, R, or MATLAB,” said Eijkhout. These languages are more intuitive with a higher level of abstraction, but they just don’t run efficiently on computer clusters at large scale, Eijkhout explained. “For getting high performance on one of the big machines, you really want one of the traditional languages, such as C++ and Fortran.” That’s because C++ and Fortran have fewer translation steps between the software and the underlying hardware, and they also have extensive software libraries built up over decades that deal specifically with tough science and engineering problems.
TACC recognized this need for a new kind of computer training to address these challenges. In 2015 it created the Introduction to Scientific Programming class, available to all students. “What this course is about is to introduce programming concepts and some of the more mainstream languages such as C++ and Fortran used in scientific programming to a variety of students,” Dey said. “The idea is to let them learn how to use algorithms; how to use a command line; how to program scientific jobs; how to apply algorithms to scientific jobs; and in a nutshell understanding how those algorithms fit the programs and what the best practices are for developing these algorithms.”
Two years later, in 2017, Eijkhout revamped the course material. “One of the things that I wanted to do was to have the students do a final project that would be a scientific exploration of something interesting,” he explained. “I came up, initially, with a project for simulating the spread of infectious diseases and the role of vaccination. That is a current topic of discussion. I was hoping that this would be an engaging project for the students, and it has turned out that way,” Eijkhout said.
“I thought we were just going to be doing ‘Hello world’ type things and that we weren’t really going to be going in-depth into anything that really mattered too much,” said undergrad Eric Gagliano, a computational engineering senior at UT Austin who took the Introduction to Scientific Programming class. “We did learn the basics, but we also got to apply them to problems that are real-world relevant, in a sense, because even though the coding may be simple, it has far-reaching consequences for even low-level coding models.”
Gagliano’s class project investigated the effect of herd immunity on a population. Herd immunity is a level of protection enjoyed by the unimmunized because the spread of contagious disease is blocked by those who are immunized. “One thing that we did was we built a model of SIR (Susceptible, Infected, Recovered),” Gagliano explained. He focused mainly on parameters such as the percentage of people vaccinated in a population.
“What we would try and do was find what percent of people that are vaccinated would it take to have a herd immunity response, where some people wouldn’t get sick,” Gagliano said. Another parameter he adjusted was the degree of contagiousness of the modeled disease. “Since we had the percent of the population vaccinated, we messed around with these two parameters to see how to get a herd immunity response from these parameters,” Gagliano explained.
His results showed dramatic effects of vaccination in the simplified computer experiment, said Gagliano. “Small swings in the percentage of people who are vaccinated end up having huge effects. If there is a ten percent swing in the people vaccinated in the population, you could go from having one person who dies or recovers in a population to almost everyone, which I thought was pretty insane,” he said.
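To make the idea concrete, here is a minimal C++ sketch of an outbreak simulation with susceptible, infected, recovered, and vaccinated people. It is not the code from the class. The `Params` struct, the `simulate_outbreak` function, the default values, and the rule that each sick person meets a fixed number of random people per day are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Health states of one simulated person.
enum class State { Susceptible, Infected, Recovered, Vaccinated };

// Hypothetical parameter set; names and defaults are illustrative only.
struct Params {
    std::size_t population = 1000;     // number of people
    double vaccinated_fraction = 0.0;  // chance each person is immunized beforehand
    double transmission_prob = 0.1;    // chance one contact infects a susceptible person
    int contacts_per_day = 5;          // random contacts made by each sick person per day
    int days_sick = 5;                 // days a person stays infectious
    unsigned seed = 42;                // fixed seed so a run is repeatable
};

// Runs one outbreak that starts from a single infected person and returns
// how many people were ever infected, including that first person.
std::size_t simulate_outbreak(const Params& p) {
    assert(p.population > 0);
    assert(p.vaccinated_fraction >= 0.0 && p.vaccinated_fraction <= 1.0);
    assert(p.transmission_prob >= 0.0 && p.transmission_prob <= 1.0);
    assert(p.contacts_per_day >= 0 && p.days_sick >= 1);

    std::mt19937 rng(p.seed);
    std::bernoulli_distribution is_vaccinated(p.vaccinated_fraction);
    std::bernoulli_distribution transmits(p.transmission_prob);
    std::uniform_int_distribution<std::size_t> pick(0, p.population - 1);

    // Vaccinate each person independently with the given probability.
    std::vector<State> state(p.population, State::Susceptible);
    std::vector<int> days_left(p.population, 0);
    for (auto& s : state)
        if (is_vaccinated(rng)) s = State::Vaccinated;

    // Person 0 starts out infected regardless of vaccination status.
    state[0] = State::Infected;
    days_left[0] = p.days_sick;
    std::size_t ever_infected = 1;

    std::vector<std::size_t> sick = {0};  // indices of people infectious today
    while (!sick.empty()) {
        std::vector<std::size_t> next_sick;
        for (std::size_t i : sick) {
            for (int c = 0; c < p.contacts_per_day; ++c) {
                std::size_t other = pick(rng);
                if (state[other] == State::Susceptible && transmits(rng)) {
                    // Newly infected people start spreading the next day.
                    state[other] = State::Infected;
                    days_left[other] = p.days_sick;
                    next_sick.push_back(other);
                    ++ever_infected;
                }
            }
            if (--days_left[i] > 0) next_sick.push_back(i);
            else state[i] = State::Recovered;
        }
        sick = std::move(next_sick);
    }
    return ever_infected;
}
```

Calling `simulate_outbreak` in a loop over `vaccinated_fraction` (say 0.0 to 1.0 in steps of 0.1) and over `transmission_prob` is one way to look for the kind of threshold Gagliano describes. A single run depends on the random seed, so averaging over several seeds would give a steadier picture.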
“We learned how to program in C++. That was really useful,” said Beatriz Oregui, a third-year graduate student in Physics at UT Austin. Oregui saw value in taking the intro class.
“I think that programming is like a human language. I’ve learned several other languages, and I think every time you learn a language, you learn how to express things in a way you didn’t know before. It’s the same with programming C++ or Fortran. You learn how to think in a way that you hadn’t thought of before. That’s amazing,” Oregui said.
“Students might think that they are just learning a computer language, but I hope that we also teach them programming as a discipline,” Eijkhout added. He emphasized that programming is not just about carelessly coding formulas that churn results. “Programs quickly get so complex that programming becomes a discipline, or a science in its own right. We try to teach them how to program correctly and how to structure their programs. One thing I like to say is that your program should read like a story about your science,” Eijkhout said.
Besides disease spread, Eijkhout has added other project topics, such as Google’s algorithm for determining web page rank and congressional redistricting, where the students write code that intentionally tries to assemble voting districts so that the minority party wins a majority of districts.
“This is pretty sophisticated programming,” Eijkhout said. “I’ve been pleasantly surprised at how well some of the students have been doing on this. You have to realize this comes at the end of one semester of programming. Students can start from absolute zero to doing something that is fairly substantial.”
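As an illustration of the page rank topic, the following C++ sketch computes ranks by repeated redistribution of rank along links (power iteration). It is an assumption about how such a project might be approached, not the course’s assignment or Google’s production method. The function name `page_rank`, the damping value of 0.85, and the fixed iteration count are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// links[i] lists the indices of the pages that page i links to.
// Returns one rank per page; the ranks sum to 1.
std::vector<double> page_rank(const std::vector<std::vector<std::size_t>>& links,
                              double damping = 0.85, int iterations = 100) {
    const std::size_t n = links.size();
    assert(n > 0);

    // Start with every page equally important.
    std::vector<double> rank(n, 1.0 / n);

    for (int it = 0; it < iterations; ++it) {
        // Every page receives a small baseline share of rank.
        std::vector<double> next(n, (1.0 - damping) / n);
        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) {
                // A page with no outgoing links spreads its rank over all pages.
                for (std::size_t j = 0; j < n; ++j)
                    next[j] += damping * rank[i] / n;
            } else {
                // Otherwise it splits its rank evenly among the pages it links to.
                for (std::size_t j : links[i]) {
                    assert(j < n);
                    next[j] += damping * rank[i] / links[i].size();
                }
            }
        }
        rank = next;
    }
    return rank;
}
```

For a three-page web where pages 0 and 1 both link to page 2 and page 2 links back to page 0, the call `page_rank({{2}, {2}, {0}})` ranks page 2 highest and page 1, which nobody links to, lowest.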
The students don’t quite have the full access to TACC systems that professional researchers are typically awarded through the UT System or through the Extreme Science and Engineering Discovery Environment (XSEDE). But they do get exposure and some access to TACC systems such as Jetstream, a production cloud environment funded by the National Science Foundation. Jetstream hosts the virtual machines used by the students for the class. What’s more, they can access the Lonestar supercomputer, which is one of the fastest supercomputers in the world.
These are great tools to get experience on immediately, said Dey. “One reason you’re learning these skills, one reason you’re programming in a Linux environment and using the command-line interface and using the Intel compiler is because this is what you’ll be using when you become an actual scientist or data analyst. We want them to get a hands-on experience of that now.”
Oregui agreed. “I think it was really hands-on. TACC staff were happy to answer any questions. They lectured for 10-15 minutes, then we had to do a problem during five minutes and work in groups. That was really fun,” she said.
“One thing that was great,” added Gagliano, “was that the TACC instructors would walk around and help us. If we ran into an error with our specific program, they’d be able to sit down and help us understand why it happened and how to not have it happen in the future. I thought it was really cool to have an interactive classroom. All in all, I really liked the TACC staff and I thought they did an amazing job.”
TACC offers additional classes through UT Austin that build on each other. After the intro class, there’s a scientific and technical computing course that delves more into useful tools, code optimization, and hands-on work on high-performance computers. The parallel programming class teaches the ins and outs of distributed computing that can scale up to large systems. A software engineering and design class teaches students how to build the software stack. Dey described the teaching process as structuring continuity into the instruction, which builds up from beginning to end like a story arc to engage students.
“If we can engage the students and have them actively participate in class, have them discover things that they’re proud of and have them get skills that they can take to their next level of their career, then we’ve got it made,” Dey said.
Source: Jorge Salazar, TACC