Industry Program Director Brendan McGinty welcomed guests to the annual National Center for Supercomputing Applications (NCSA) Industry Conference, held October 8-10 on the University of Illinois at Urbana-Champaign (UIUC) campus. One hundred seventy attendees from 40 organizations came to the invitation-only, two-day event.
The program opened with a keynote address by Zhanna Golodryga, senior vice president and chief digital and administrative officer at Phillips 66. With 18 years of experience in the oil and gas (O&G) field, Golodryga had a ringside seat as technology revolutionized the industry that powers our world.
Phillips 66 is a leader in digital O&G field operations, and Golodryga has led that effort since 2017. Smart trucks, sensors and drones optimize machine health; simulations informed by real-time data improve the safety and quality of field staff training. Supervisory control and data acquisition, cybersecurity and pipeline integrity are constantly being improved.
Golodryga believes that data is the oil of the future. “Whoever figures it out first and best will win.” Because of this, Phillips’ workforce favors digital natives. They employ more than 200 PhD-level scientists and engineers whose research involves energy optimization. Forty percent of their team are millennials, or younger. Because they are learning, the company doesn’t subscribe to a ‘fail fast’ philosophy. “Instead, we fail, learn and move on,” she said.
Because Phillips’ compute needs are largely satisfied in the cloud (they utilize AWS, Azure and Google), they seek help from NCSA experts to guide them through the process of refining business analytics and data science for end-to-end value chain optimization. In particular, NCSA Technical Program Manager Dora Cai’s team optimized an algorithm for their Bayway Refinery in New Jersey. “Moving from a linear program to a statistical model improved the accuracy and reduced the volatility of the algorithm,” said Phillips 66 Business Transformation Manager Shawn Behounek.
The showcase continued with the following highlights:
Data Analytics: Neural Network Model Explainability was presented by Capital One Senior Software Engineer Austin Walters who described how his company works with NCSA Research Scientists Eliu Huerta and Aiman Soliman, and Data Engineer Aaron Saxton. While Capital One is based in Virginia, Walters works out of the UIUC Research Park Center for Machine Learning in Urbana.
“Capital One is a highly regulated environment that is constantly under audit,” said Walters. “It is important to identify data which could, potentially, cause our models to become biased,” he added.
To accomplish this, they use a “Smart Text Profiler” for deep learning models based on Google’s open-sourced, neural network-based technique for natural language processing pre-training called BERT (Bidirectional Encoder Representations from Transformers). This provides a level of granularity that mitigates risk. Walters said, “It quickly and reliably identifies and removes sensitive data while providing auditors with documentation for why and how such decisions are made.”
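Capital One's profiler itself isn't public, but the pattern Walters describes (classify each token, mask anything flagged as sensitive, and record the decision for auditors) can be sketched with a stand-in classifier. The `looks_sensitive` regex below is a hypothetical placeholder for the BERT-based model that would score tokens in the real pipeline:

```python
import re

# Hypothetical stand-in for a BERT-based token classifier: in the real
# pipeline, a fine-tuned model would score each token instead of a regex.
def looks_sensitive(token: str) -> bool:
    return bool(re.fullmatch(r"\d{3}-\d{2}-\d{4}", token))  # SSN-like pattern

def profile_text(text: str):
    """Mask sensitive tokens and keep an audit trail of each decision."""
    masked, audit_log = [], []
    for token in text.split():
        if looks_sensitive(token):
            masked.append("[REDACTED]")
            audit_log.append({"token": token, "reason": "matches SSN pattern"})
        else:
            masked.append(token)
    return " ".join(masked), audit_log

clean, log = profile_text("Customer 123-45-6789 opened an account")
# clean == "Customer [REDACTED] opened an account"
```

The audit log is the key design point: regulators get documentation of why each redaction happened, not just the redacted output.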
Workflow Optimization: From Concept to Clinical Setting at Mayo Clinic. This collaboration is led by Liudmila Mainzer (NCSA Technical Program Manager, and Research Assistant Professor in the Institute for Genomic Biology).
The talk was presented by Mayo Clinic IT Lead Analyst Nate Mattson and UIUC Bioinformatics/Crop Sciences Professor Matt Hudson. As partners in the “Mayo Illinois Grand Challenge Program,” NCSA provides specialized support and guidance with modularity and tools assessment, as well as production development, workflow management and HPC optimization.
Genomics research is compute-, data- and storage-intensive, which makes workflows extremely slow and expensive. The volume also presents data management and testing challenges.
To develop the “Mayomics Pipeline,” they used Sentieon tools and the Workflow Description Language (WDL). The combination made trio analyses possible, in which a child's genome is analyzed jointly with both parents' genomes to identify genetic illnesses more accurately and efficiently.
Mattson summarized the outcome: “precision of variant calling increased (fewer false positives); recall improved (fewer false negatives); and the level of detection for indels almost doubled.” What does that mean to the rest of us? Mayo and NCSA are making precision medicine more accurate and affordable to the masses!
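Those metrics are straightforward to compute once variant calls are compared against a truth set: precision asks how many calls were real, recall asks how many real variants were found. A minimal sketch, with made-up variant positions:

```python
# Toy truth set and call set of variant positions (hypothetical data).
truth = {"chr1:1000", "chr1:2500", "chr2:300", "chr2:900"}
calls = {"chr1:1000", "chr1:2500", "chr2:300", "chr3:777"}

true_pos = len(truth & calls)   # correctly called variants
false_pos = len(calls - truth)  # called, but not real
false_neg = len(truth - calls)  # real, but missed

precision = true_pos / (true_pos + false_pos)  # fewer false positives -> higher
recall = true_pos / (true_pos + false_neg)     # fewer false negatives -> higher
print(precision, recall)  # 0.75 0.75
```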
Scaling Up Implicit Finite Element Analysis. This project is led at NCSA by Seid Koric, NCSA technical assistant director and research professor in the UIUC Mechanical Science & Engineering/MechSE department. The talk was presented jointly by NCSA Research Scientist Erman Guleryuz and Rolls Royce Computational Sciences Manager Todd Simons, who attended remotely from his office in Indianapolis, Indiana, where his company employs around 4,000.
According to Guleryuz, “The collaboration with NCSA Legacy Partner Rolls Royce, NCSA, Cray Computing and Livermore Software Technology Corporation (LSTC, developer of LS-DYNA) set out to explore the future of implicit computations as both the scale of finite element models and the systems they run on increase.”
Rolls Royce presented a challenge associated with marine and aeronautic gas turbine engines; specifically, tip clearance, which affects engine efficiency. To study it, Rolls Royce created a pair of dummy engine models at 105 million and 200 million degrees of freedom (DOF), the largest implicit models known to LS-DYNA, “a general-purpose, finite element program capable of simulating complex real world problems used by the automobile, aerospace, construction, military, manufacturing and bioengineering industries.”
The models were designed to run on two Cray systems: Blue Waters (NCSA) and Titan (Oak Ridge National Laboratory). High-fidelity design analysis of the new models exercised roughly 50 million lines of Fortran code that took years to write, but the team ultimately reduced the time-to-solution from 1,000 hours to 10. In addition to optimizing gas turbine engines for the marine and aeronautical industries, the work will benefit many others. The four-way collaboration won a 2018 HPCwire Editors’ Choice Award.
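The reason scale matters so much here is that implicit analysis reduces each load step to a large sparse linear solve, K u = f, for the displacement field; at 105M-200M unknowns, solver scaling dominates time-to-solution. A tiny stand-in, a 1D bar discretized into a handful of elements, shows the shape of the computation:

```python
import numpy as np

# Implicit FEA: each load step solves K u = f for the displacements u.
# Toy example: a 1D fixed-end bar discretized into n nodes (tridiagonal K).
n = 6
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # stiffness matrix
f = np.zeros(n)
f[-1] = 1.0  # unit load applied at the last node

u = np.linalg.solve(K, f)  # displacement field
# The real Rolls Royce models have 105M-200M unknowns, hence Blue Waters/Titan.
print(u)
```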
Following lunch sponsored by Intel, NCSA Integrated Cyberinfrastructure Director Amy Schuele presented an update about computational resources recently added to the NCSA portfolio. The IBM Hardware Accelerated Learning cluster, “HAL,” was launched March 25, 2019, at the NCSA Innovative Systems Lab. HAL is a deep-learning environment suitable for artificial intelligence (AI) workflows in a variety of research domains.
New cybersecurity provisions were introduced, including a leading-edge process for identity and access management in the high-throughput Condor pool called SciTokens, “a capabilities-based authorization infrastructure for distributed scientific computing which helps scientists manage their security credentials more reliably and securely.” Additionally, security data-sharing was improved with SDAIA, a secure, fast and decentralized model for disseminating threat intelligence data.
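SciTokens builds on JSON Web Tokens: a token carries a `scope` claim naming the capabilities it grants, and a service checks that claim rather than the bearer's identity. A simplified, library-free sketch of that authorization check follows; the claim names mirror the SciTokens convention, but this omits the cryptographic signature verification a real deployment requires:

```python
import time

def authorize(token_claims: dict, required_scope: str) -> bool:
    """Grant access only if the token is unexpired and carries the scope."""
    if token_claims.get("exp", 0) < time.time():
        return False  # token has expired
    scopes = token_claims.get("scope", "").split()
    return required_scope in scopes

claims = {
    "iss": "https://demo.scitokens.org",      # issuing authority
    "scope": "read:/data write:/data/user",   # capabilities, not identity
    "exp": time.time() + 3600,                # valid for one hour
}
print(authorize(claims, "read:/data"))    # True
print(authorize(claims, "write:/admin"))  # False
```

Because authorization travels with the token, a distributed Condor pool can honor it without a central identity lookup at every site.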
The iForge cluster features four hardware platforms, each configured for different compute needs and available to industry partners at varied prices per core hour. In 2020, NCSA will add support for Docker and Open Container Initiative (OCI) containers. If a workflow outgrows NCSA resources, it can burst from iForge into any commercial cloud. A small amount of storage accompanies each compute allocation, and more is available for purchase. For workflows where shared infrastructure isn’t appropriate, NCSA also offers dedicated nodes, head nodes and storage.
Principal Research Scientist Jong Lee talked about NCSA’s new software directorate. More than 30 developers skilled in many programming languages support a range of industry partner projects; anywhere from 0.25 to 6 FTEs are hired on contracts ranging from six months to several years in length.
NCSA’s cloud computing and container environments make it easier to share with others and are a good fit for a variety of platforms, languages, system requirements and scalability. Many industry partners are moving their workflows into the cloud, and Docker enables that transition.
Lead Research Programmer Rob Kooper described NCSA’s OpenStack cluster in the Innovative Systems Laboratory. This is a condo model where industry partners can invest in dedicated hardware and storage. Envisioned by NCSA Director Rob Pennington in 2015, it originally held 20 compute nodes, 24 cores, 256 GB memory and 200 TB of storage. Today it has 43 nodes and 627 running instances, with 2,858 cores in use. “Its popularity demonstrated the need for a cost-recovery model,” said Kooper.
To conclude day one of the conference, McGinty introduced five data-intensive projects that showcase how UIUC uses NCSA resources:
● Director of Analytics and Football Technology Kingsley Osei-Asibey and NCSA Senior Project Coordinator Loretta Auvil explained how technology is used for virtual reality training, and to help coaches with prospect assessment and recruitment.
● Assistant Professor Joseph Yun (Illinois College of Media Research) described the “Social Media Macroscope: Accessible Data Science on Social Data.”
● NCSA Faculty Fellow Shelly Zhang and MechSE Scholar Diab Abueidda presented, “Machine Learning Accelerated Structural Design Optimization.”
Abueidda will soon join NCSA Industry as a postdoc. His work with Dr. Koric and Nahil Sobh (Beckman Institute) has drawn particular interest because it demonstrates the confluence of modeling, simulation and AI.
Koric explained, “Both the data generation of 15,000 variations of the topological design of a nonlinear structure, and training with a special ‘Deep’ Convolutional Neural Network (CNN), are conducted on iForge (CPUs and GPUs respectively).” The trained, learnable parameters (weights and biases) can be transferred to any low-end computing platform, such as a laptop, and the optimal topological solutions can be found there instantly for any variation of input parameters. “We believe that similar AI-driven models will pave the way for remarkably efficient, high-fidelity design and modeling; particularly for architectured and bio-inspired 3D materials,” Koric added.
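The payoff Koric describes, training on iForge but running inference anywhere, can be illustrated with a tiny stand-in network: once the weights are fixed, evaluating a new design variation is just a few matrix multiplies, cheap enough for a laptop. The two-layer net and random weights below are hypothetical and vastly smaller than the actual CNN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were learned on iForge and shipped to a laptop.
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)

def predict(design_params: np.ndarray) -> float:
    """Forward pass of a tiny MLP surrogate: design inputs -> predicted response."""
    hidden = np.maximum(0.0, W1 @ design_params + b1)  # ReLU hidden layer
    return float(W2 @ hidden + b2)                     # scalar prediction

# Evaluating a new design variation costs microseconds, not cluster hours.
print(predict(np.array([0.2, 1.0, -0.5, 0.3])))
```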
● NCSA Research Scientists Jeff Terstriep and Aiman Soliman presented, “Keras Spatial: Enabling Fast AI Experimentation for the Geospatial Community.”
● NCSA Senior Research Scientist Matias Carrasco Kind is the Data Release Scientist with the Dark Energy Survey (DES). His presentation was titled, “Scientific Platforms for Image Analysis in the Era of Cloud Computing.”
Thank you, sponsors!
The NCSA Industry Conference was sponsored by Amazon Web Services, Cray, DataDirect Networks, Dell, Intel, Indigo Ag, and Panasas.
Check out part two of the 2019 NCSA Industry Conference recap, including highlights from NCSA Director William Gropp’s keynote and details about NCSA’s new AI initiative!
Photos by Leake and Darrell Hoemann
About the Author
HPCwire Contributing Editor Elizabeth Leake is a consultant, correspondent and advocate who serves the global high performance computing (HPC) and data science industries. In 2012, she founded STEM-Trek, a global, grassroots nonprofit organization that supports workforce development opportunities for science, technology, engineering and mathematics (STEM) scholars from underserved regions and underrepresented groups.
As a program director, Leake has mentored hundreds of early-career professionals who are breaking cultural barriers in an effort to accelerate scientific and engineering discoveries. Her multinational programs have specific themes that resonate with global stakeholders, such as food security data science, blockchain for social good, cybersecurity/risk mitigation, and more. As a conference blogger and communicator, her work drew recognition when STEM-Trek received the 2016 and 2017 HPCwire Editors’ Choice Awards for Workforce Diversity Leadership.