Oct. 4, 2022 — As part of the ALCF’s summer student program, over 30 undergraduate and graduate students worked alongside staff mentors to gain real-world experience with supercomputing, data science, and AI projects.
Every summer, the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory, hosts a new group of students to take on real-world scientific computing projects, providing valuable opportunities to work with research teams and learn new skills.
“It’s important to provide educational and career opportunities for students to take their next steps, gain confidence, and have new experiences working on impactful research projects outside of the classroom,” says Michael Papka, ALCF director and professor of computer science at the University of Illinois Chicago. “Our summer student program gives them the chance to see possibilities of what their careers could look like.”
This year’s class of ALCF summer students, which included more than 30 students ranging from undergraduates to doctoral candidates, tackled projects aimed at using artificial intelligence (AI) to analyze bird songs, visualizing large scientific datasets, advancing high energy physics research, and more. In the summaries below, five of the students spoke about what they worked on this summer and where they think the experience will take them next.
AI Analysis of Bird Audio
Saumya Singh, a graduate student studying AI at Northwestern University, is interested in researching self-supervised learning and reinforcement learning in AI and natural language processing. This summer she worked with mentors Michael Papka and Argonne computer scientist Nicola Ferrier on a project that used AI to analyze bird song audio collected from microphones placed in forests, providing insights into forest ecosystems.
Singh was drawn to this project because of its significance to the environment and what it can reveal about forest ecosystems. “Birds or animals are a great predictor of the environment that they’re living in,” she says.
Her project used a new algorithm released by Facebook AI Research that employs self-supervised learning, meaning the algorithm does not require researchers to provide labels for the data.
“The main thing that I feel is going to help me is self-supervised learning because the main problem that we have for any of the data science projects is the pre-processing data labeling, so it will be great if we can solve the problem,” Singh says. “I can apply it to several other projects.”
Having previously worked with images and text, this project provided the opportunity to work with sound, large datasets, and new algorithms. “All these new techniques that I worked on,” Singh says, “seemed to be really fruitful for me to continue ahead in this data science-machine learning career path.”
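The self-supervised idea Singh describes can be illustrated with a toy example: the training signal comes from the data itself rather than from human annotations. The sketch below masks samples of a synthetic signal and scores a predictor that reconstructs each masked sample from its neighbors; the "label" is simply the hidden sample. This is an illustrative stand-in, not the Facebook AI Research algorithm the project actually used.

```python
# Minimal sketch of self-supervised learning on an audio-like signal:
# no human labels -- the masked sample itself is the prediction target.
import math
import random

def make_signal(n=1000):
    """Synthetic stand-in for a bird-song waveform (sine plus noise)."""
    return [math.sin(0.05 * t) + 0.1 * random.random() for t in range(n)]

def masked_prediction_error(signal, mask_every=10):
    """Hide every k-th sample and predict it from its immediate
    neighbors; return the mean absolute reconstruction error."""
    errors = []
    for i in range(1, len(signal) - 1):
        if i % mask_every == 0:
            prediction = 0.5 * (signal[i - 1] + signal[i + 1])  # neighbor average
            errors.append(abs(prediction - signal[i]))
    return sum(errors) / len(errors)

random.seed(0)
err = masked_prediction_error(make_signal())
print(f"mean reconstruction error: {err:.4f}")
```

A real self-supervised model would replace the neighbor-average predictor with a trained network, but the key property is the same: the pre-processing data labeling Singh mentions is no longer needed.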
Command-Line Interface, Python Concurrency, and AI Models
Alan Wang, a computer science student at the University of Illinois, was interested in working at the ALCF because of the powerful supercomputers and software tools it makes available for research. Though mostly interested in system security, Wang’s research at the ALCF has spanned facility operations, the Python programming language, and AI.
This summer he worked on three projects with ALCF mentors Paul Rich, George Rojas, and Bill Allcock: a command-line interface project aimed at making it easier for system administrators to carry out searches of home directories across the ALCF; a Python concurrency project comparing the speeds and performances of different concurrency libraries; and a project running AI models that use the open-source machine learning frameworks PyTorch and TensorFlow on the ALCF AI Testbed’s Cerebras and SambaNova systems.
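The kind of comparison the concurrency project involved can be sketched with a small benchmark: run the same I/O-bound workload serially and with a thread pool and compare wall-clock times. The workload here is a hypothetical stand-in; the actual project compared several libraries on ALCF systems.

```python
# Minimal sketch: timing an I/O-bound workload serially vs. with threads.
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.05)  # stand-in for a filesystem or network call
    return True

N = 20

start = time.perf_counter()
serial = [io_task(i) for i in range(N)]
serial_s = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N) as pool:
    threaded = list(pool.map(io_task, range(N)))
threaded_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f}s  threaded: {threaded_s:.2f}s")
```

For I/O-bound work like this, threads overlap the waiting and finish far sooner; for CPU-bound work, a fair comparison would also include multiprocessing, since Python threads share one interpreter lock.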
Wang says that one of the most significant things he got out of this summer was learning more about using Python. He began the internship with around five years of Python experience, saying “I thought I had everything down but not even close. So, I learned a lot of Python and got exposed to using it in a lot of different environments.” Wang was also introduced to new software tools, such as the Emacs text editor, and worked with AI for the first time.
“I was surprised how interconnected AI was with systems, so knowing both sides and having an AI background will also be extremely helpful for me in the future,” Wang says.
Benchmarking Graph Neural Networks for Science on AI Accelerators
Ryien Hosseini’s work with the ALCF team was at the intersection of neural network algorithms and high performance computing. “My projects used computing resources in order to see how far we can push these algorithms known as graph neural networks for various scientific applications,” he says.
Hosseini, a graduate student in electrical and computer engineering at the University of Michigan, was interested in working at the ALCF due to the research-oriented nature of the internship, and to have access to the facility’s powerful computational resources. This summer, with ALCF mentors Filippo Simini and Venkat Vishwanath, he co-authored a workshop paper that assessed the performance of graph neural networks on NVIDIA GPUs (graphics processing units) and worked on another project that looked at the performance of graph neural networks on specialized hardware platforms.
In addition, Hosseini contributed to an effort that uses chemical docking for drug discovery. The project builds on previous work: instead of using neural networks alone to select molecules, the team now uses “the neural network as a pre-filter in order to choose a top percentage of candidates, and then those will go into a classical non-machine learning based algorithm, which is better at arriving at those final numerical estimates,” says Hosseini.
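The pre-filter pipeline Hosseini describes can be sketched as follows: a cheap learned score ranks all candidates, the top fraction survives, and only those candidates pay for the expensive classical docking step. Both scoring functions below are hypothetical stand-ins for the real models.

```python
# Minimal sketch of a learned pre-filter ahead of classical docking.
import random

def surrogate_score(mol):
    """Stand-in for a fast graph-neural-network score."""
    return random.random()

def classical_docking_score(mol):
    """Stand-in for the slower, more accurate classical estimate."""
    return random.random()

def prefilter_pipeline(molecules, keep_fraction=0.1):
    # Rank every candidate with the cheap surrogate model.
    ranked = sorted(molecules, key=surrogate_score, reverse=True)
    top = ranked[: max(1, int(len(ranked) * keep_fraction))]
    # Only the pre-filtered top fraction runs the expensive step.
    return {mol: classical_docking_score(mol) for mol in top}

random.seed(0)
candidates = [f"mol_{i}" for i in range(1000)]
results = prefilter_pipeline(candidates)
print(f"docked {len(results)} of {len(candidates)} candidates")
```

The design choice is a cost trade-off: the surrogate is cheap enough to score everything, while the classical algorithm, which is better at the final numerical estimates, runs only on the shortlist.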
“I feel like I learned a lot both from thinking about high-level research ideas, high-level algorithms, and then really getting into the nitty gritty and doing the programming in order to implement those algorithms,” says Hosseini, who will be applying to PhD programs in the fall. “Having this structured, rigorous research background has really been helpful.”
High-Quality Visualizations for Large Scientific Datasets
Alina Kanayinkal is interested in computer graphics, particularly the computational side of animation. In her summer at the ALCF, she worked with Message Passing Interface, or MPI (a communication protocol for programming parallel computers), and image rendering, continuing the work she began as a student assistant to Tommy Marrinan, an Argonne scientist teaching at the University of St. Thomas.
For her summer project, Kanayinkal’s work focused on creating a workflow for rendering high-quality visualizations of large-scale datasets. Her research aims to leverage cinematic rendering tools (similar to those used by Pixar and DreamWorks) to create visualizations of scientific datasets that are too large or too time-consuming to render on a single computer. While the workflow is generic enough for many types of scientific data, Kanayinkal worked with data from a coupled fluid flow and particle simulation to investigate cancer cell transport as well as a molecular dynamics simulation to investigate material friction. The ultimate goal of her studies is to develop an easier and less time-consuming way to create these visualizations.
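The basic work-division idea behind this kind of MPI-based parallel rendering can be sketched without MPI itself: each rank renders only the frames assigned to it, and the results are gathered at the end. Real code would use mpi4py or C MPI; the ranks below are simulated so the partitioning logic is easy to see, and the render call is a hypothetical stand-in.

```python
# Simplified sketch of dividing a rendering job across MPI-style ranks.
def frames_for_rank(rank, size, n_frames):
    """Round-robin assignment: rank r renders frames r, r+size, r+2*size, ..."""
    return list(range(rank, n_frames, size))

def render(frame):
    """Stand-in for an expensive cinematic render of one frame."""
    return f"frame_{frame:04d}.exr"  # OpenEXR is a common output format

n_frames, size = 10, 4  # 10 frames split across 4 simulated ranks
gathered = []
for rank in range(size):  # in real MPI, each rank runs concurrently
    gathered.extend(render(f) for f in frames_for_rank(rank, size, n_frames))

print(f"rendered {len(gathered)} frames across {size} ranks")
```

Because each frame renders independently, the work parallelizes cleanly: no rank touches another rank's frames, and a final gather assembles the sequence.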
Kanayinkal says one of her major takeaways from this summer at ALCF was realizing that research is “not a huge, scary thing. It is a big thing, but it’s not so big that it’s overwhelming.” She also has become more comfortable with learning on the fly, for instance learning MPI and the OpenEXR format for imaging applications.
Moving forward she is continuing to work with Marrinan and choosing projects that she enjoys working on, saying “if it’s something that you like, and you get frustrated, you’re just going to take a five-minute break and then come back and continue working on it rather than just being like ‘Forget it. I’m going to do something else.’”
Hyperparameter Optimization and Scaling Studies for ML Models in Physics Research
As a student at the University of Notre Dame, Sirak Negash worked with machine learning (ML) to help analyze data from particle physics experiments. This inspired him to continue pursuing machine learning studies, especially for high energy physics. He initially applied for a position as a summer research aide to gain more experience in physics research. “I was pleasantly surprised when I was contacted for a role at ALCF that involved working with an ML model in physics,” he says.
Collaborating with ALCF mentor Sam Foreman, Negash worked on determining the impact of different hyperparameter configurations on model performance and training cost for simulations of lattice quantum chromodynamics (the theory of the strong interaction between quarks and gluons).
“I was able to complete a detailed set of studies on how scaling the lattice volume impacted the training cost when run on the ALCF’s Theta supercomputer,” Negash says.
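The shape of such a scaling study can be sketched as a simple loop: run the same short training step at several lattice sizes and record the cost of each. The "training step" below is a hypothetical stand-in that just touches every lattice site once; the actual studies ran ML-based lattice QCD simulations on Theta.

```python
# Minimal sketch of a lattice-volume scaling study.
import time

def training_step(volume):
    """Hypothetical stand-in whose cost grows with the number of sites."""
    total = 0.0
    for site in range(volume):
        total += (site % 7) * 0.5
    return total

def scaling_study(lattice_sizes):
    """Time one training step at each lattice size L (4D lattice: L^4 sites)."""
    costs = {}
    for L in lattice_sizes:
        volume = L ** 4
        start = time.perf_counter()
        training_step(volume)
        costs[L] = time.perf_counter() - start
    return costs

costs = scaling_study([4, 8, 16])
for L, seconds in costs.items():
    print(f"L={L:2d}  volume={L**4:6d}  cost={seconds:.4f}s")
```

Doubling L multiplies the number of sites by sixteen, which is exactly why understanding how training cost scales with lattice volume matters before committing supercomputer time to larger runs.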
The effort has been helpful to the ALCF because future research on quantum chromodynamics “can greatly benefit from an understanding of how the performance of these simulations is scaling with larger and larger lattice size,” he says.
After spending his summer at the ALCF, Negash says he “developed a new appreciation for science beyond the classroom and even beyond a physical lab, and the lessons and skills I have learned through this opportunity in ML research have kindled in me the desire to pursue a career in data analytics.”
Source: Emily Stevens, ALCF