ORNL Researchers Bridge the Gap Between R, HPC Communities

April 20, 2017

OAK RIDGE, Tenn., April 20, 2017 — The ability to realistically simulate a range of scientific phenomena, such as supernova explosions and the behavior of materials at the nanoscale, has proven a boon to researchers across the scientific spectrum.

Many now consider simulation the third pillar of scientific inquiry, alongside the centuries-old pillars of theory and experiment.

Yet in some areas of science, parallel computing’s promise remains untapped, notably statistics, genomics, finance, economics, sociology, and the environmental sciences, all of which rely heavily on the R programming language. That’s a shame, says Oak Ridge National Laboratory’s George Ostrouchov, who heads the Programming with Big Data in R (pbdR) project to bring these untapped domains into the high-performance computing fold.

These “untapped domains” represent a huge potential user base for world-class computers such as those owned by the Department of Energy, and an enormous opportunity for HPC to accelerate research breakthroughs across the statistical sciences.

Ostrouchov and his colleagues have started the ball rolling with a paper in the journal Big Data Research that serves as a tutorial on how to achieve scalable performance with R on leadership computing resources such as ORNL’s Titan, currently the fastest computer in the country for open science. “70-80 percent of statisticians use R,” said Ostrouchov, “and we want to make HPC tools usable for the statistics community.”

The goal of pbdR is to make the tools that R-based communities already know compatible with HPC, rather than asking those communities to take on the far more taxing job of changing how they do research. Whereas traditional simulation science produces data, R-based research areas seek to use and understand data.

“These communities don’t know HPC, so by providing these tools at least part of their workflow is in a familiar environment,” said Ostrouchov. “We want to make it easier for these communities to accelerate their science.”

Ostrouchov is a statistician by training, but his work at ORNL has brought him into contact with the most powerful machines and some of the brightest minds in the HPC community. His previous experience with R, and his more recent experience with HPC, gave him some ideas on what might work and what wouldn’t, and which pieces were most likely to fit together.

After exploring the potential of R on world-class resources such as Titan for the Department of Energy’s Office of Science and the now retired Kraken for the National Science Foundation, Ostrouchov and his colleagues Wei-Chen Chen, Drew Schmidt and Pragneshkumar Patel have made great strides in merging the two seemingly disparate platforms, and by extension two very different cultures.

The evolution of R

R’s real strength lies in data exploration and the creation of graphics to explain complex datasets, supported by an unmatched variety of transparent and understandable machine learning tools. “It’s probably the gold standard for graphics in data exploration,” said Ostrouchov. Like other popular languages such as Python and MATLAB, R is a scripted language, as opposed to a compiled one such as C or Fortran.

This presents a unique set of challenges for running effectively on HPC platforms, particularly because scripted languages typically load libraries dynamically at runtime, a process that can bog down a shared file system when thousands of parallel processes request libraries at the same time.
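A rough count shows why this matters. The sketch below assumes, hypothetically, that every process performs a fixed number of file-system lookups for each library it loads; the process count, library count, and lookups-per-library figures are placeholders, not numbers from ORNL. It contrasts that with a generic pattern in which one process per group reads the files and shares them with the rest of its group. That pattern is a common HPC mitigation in general; the article does not describe how Matheson’s solutions actually work.

```python
def metadata_ops(processes: int, libraries: int, lookups_per_library: int) -> int:
    """Total file-system requests when every process loads every library itself."""
    return processes * libraries * lookups_per_library


def metadata_ops_with_broadcast(processes: int, libraries: int,
                                lookups_per_library: int, group_size: int) -> int:
    """Hypothetical model: one process per group touches the file system and
    shares what it read with the rest of its group (e.g., via a broadcast)."""
    readers = -(-processes // group_size)  # ceiling division: one reader per group
    return readers * libraries * lookups_per_library


if __name__ == "__main__":
    # Placeholder workload: 10,000 processes, 50 libraries, 4 lookups each.
    naive = metadata_ops(10_000, 50, 4)
    shared = metadata_ops_with_broadcast(10_000, 50, 4, group_size=16)
    print(naive, shared)
```

Under these made-up numbers the request count falls in proportion to the group size, which is the intuition behind funneling file access through a few processes.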

Fortunately, Mike Matheson, Ostrouchov’s co-author on the Big Data Research paper, has developed a set of partial solutions that let libraries load almost seamlessly on up to 10,000 cores of Titan so far. These solutions are still being optimized, so the 10,000-core mark will almost certainly rise in the future.

So far, the overhead of using a scripted language to drive these libraries has proven remarkably small, with performance approaching that of ScaLAPACK, the underlying linear algebra library that other codes use to perform distributed matrix calculations. “In theory,” said Ostrouchov, “there’s no reason that R couldn’t match the performance of the leading science codes on Titan.”
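One simple way to see why a scripted driver can approach compiled-library speed: assume the interpreter adds a roughly fixed cost per call, while the library performs about 2n³ floating-point operations for a dense n×n matrix multiply. The sketch below computes the fraction of runtime spent in that fixed overhead. The 1 ms overhead and 1 TFLOP/s throughput are illustrative placeholders, not measurements from Titan or the pbdR paper.

```python
def interpreter_overhead_fraction(n: int, call_overhead_s: float,
                                  flops_per_s: float) -> float:
    """Fraction of total time spent in a fixed per-call interpreter overhead
    when the library performs a dense n x n matrix multiply (~2*n**3 flops)."""
    compute_s = 2 * n**3 / flops_per_s
    return call_overhead_s / (call_overhead_s + compute_s)


if __name__ == "__main__":
    # Placeholder values: 1 ms interpreter overhead, 1 TFLOP/s library throughput.
    for n in (100, 1_000, 10_000, 50_000):
        frac = interpreter_overhead_fraction(n, call_overhead_s=1e-3, flops_per_s=1e12)
        print(f"n={n:>6}: overhead {frac:.2%}")
```

Because the library’s work grows cubically while the per-call cost stays flat, the overhead share shrinks quickly as matrices grow, which is the regime HPC workloads live in.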

Equally important, the pbdR team has made it possible to run R matrix computations on HPC systems without changing the serial code, which means much less work for programmers looking to make the jump. The same code does the same thing whether the matrix sits on a single processor or is distributed across many processors, as on Titan or any other world-class HPC resource.
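The article does not spell out the mechanism. In R, pbdR’s pbdDMAT package provides a distributed matrix class (ddmatrix) so that familiar matrix code can run unchanged. The Python sketch below illustrates the same general idea, operator overloading, using two hypothetical classes, DenseMatrix and BlockRowMatrix, with rows split across simulated processes. No MPI is involved, and this is not pbdR’s implementation; ScaLAPACK-style libraries typically use a block-cyclic layout, and the simple block-row split here is a simplification.

```python
from typing import List

Rows = List[List[float]]


def _matmul_rows(a: Rows, b: Rows) -> Rows:
    """Serial product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class DenseMatrix:
    """A matrix held entirely by one process."""

    def __init__(self, rows: Rows):
        self.rows = rows

    def __matmul__(self, other: "DenseMatrix") -> "DenseMatrix":
        return DenseMatrix(_matmul_rows(self.rows, other.rows))

    def to_rows(self) -> Rows:
        return self.rows


class BlockRowMatrix:
    """Hypothetical distributed matrix: rows are split into contiguous blocks,
    one block per simulated process."""

    def __init__(self, blocks: List[Rows]):
        self.blocks = blocks

    @classmethod
    def distribute(cls, rows: Rows, n_procs: int) -> "BlockRowMatrix":
        size = max(1, -(-len(rows) // n_procs))  # ceiling division, at least 1
        return cls([rows[i:i + size] for i in range(0, len(rows), size)])

    def __matmul__(self, other: DenseMatrix) -> "BlockRowMatrix":
        # Multiplying by a small, replicated right-hand matrix needs no
        # communication: each "process" multiplies only the rows it owns.
        return BlockRowMatrix([_matmul_rows(block, other.rows) for block in self.blocks])

    def to_rows(self) -> Rows:
        # Stand-in for gathering every block back to one process.
        return [row for block in self.blocks for row in block]


def analysis(x, weights: DenseMatrix):
    """User-level code written once; it never checks how x is stored."""
    return x @ weights


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    w = DenseMatrix([[1.0], [0.5]])
    serial = analysis(DenseMatrix(data), w).to_rows()
    distributed = analysis(BlockRowMatrix.distribute(data, n_procs=2), w).to_rows()
    print(serial == distributed)  # True: same code, same answer
```

The point of the design is that `analysis` is written once; only the object passed in decides whether the work happens in one place or is spread across processes.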

Portability was always a top priority, said Ostrouchov, adding that the same code will work on nearly any HPC resource, no matter the architecture; one need only swap out the libraries.

The pbdR team’s achievements bode well for the future of R and HPC, but bringing together these two very different communities will take time, as well as a few pioneers such as ORNL computational biologist Dan Jacobson. Along with a team that includes Piet Jones, a graduate research assistant and PhD student at the University of Tennessee’s Bredesen Center, Jacobson is using R on Titan to advance the state of the art in genomics and bioenergy.

Jacobson’s team has used pbdR’s streamlined R bindings for MPI, a message-passing framework that lets the many compute nodes in a parallel machine such as Titan communicate with one another, to distribute gene expression data across multiple nodes for rapid analysis. This technique will enable a better understanding of the biological functions assigned to individual genes and help discover which metabolites are driving certain observations.
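As a hedged illustration of this pattern, the sketch below splits the rows of a gene expression matrix (one row per gene) into contiguous blocks, has each simulated process correlate only its own genes with a trait, and then merges the results as a gather to a single process would. The ranks are simulated with a plain loop rather than real MPI; in an R program using pbdMPI, the rank and process count would come from functions such as comm.rank() and comm.size(). Pearson correlation is a stand-in chosen for illustration only; the article does not say which analyses the team runs, and any gene names or values are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def block_range(n_items: int, rank: int, size: int) -> Tuple[int, int]:
    """Start/stop indices of the contiguous block owned by `rank` out of `size`
    processes; the first n_items % size ranks receive one extra item."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def local_work(genes: List[str], expr: List[List[float]], trait: List[float],
               rank: int, size: int) -> Dict[str, float]:
    """What one process would do: correlate only its own genes with the trait."""
    start, stop = block_range(len(genes), rank, size)
    return {genes[i]: pearson(expr[i], trait) for i in range(start, stop)}


def run_simulated(genes: List[str], expr: List[List[float]], trait: List[float],
                  size: int) -> Dict[str, float]:
    """Stand-in for an MPI job: run each rank's work in turn, then merge the
    per-rank results the way a gather to one process would."""
    merged: Dict[str, float] = {}
    for rank in range(size):
        merged.update(local_work(genes, expr, trait, rank, size))
    return merged


if __name__ == "__main__":
    genes = ["g1", "g2", "g3"]  # hypothetical gene labels
    expr = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0]]
    trait = [1.0, 2.0, 3.0, 4.0]
    for gene, r in sorted(run_simulated(genes, expr, trait, size=2).items()):
        print(gene, round(r, 3))
```

Because each gene’s statistic depends only on its own row and the shared trait vector, the work splits cleanly across processes, and the merged result is the same no matter how many processes are used.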

“We need to know what is influencing a biological function, whether this be a gene, regulatory element, metabolite or something else,” said Jones, adding that these analyses help researchers better understand pleiotropy, the idea that a single gene can have multiple functions, and epistasis, in which the interactions of multiple genes result in a certain characteristic.

Their various projects allow for multiple comparisons using different techniques, which in turn lets them tackle ever bigger problems in genomics.

Jacobson is also now collaborating with other institutions to use R to study plant-microbe interfaces for bioenergy applications, work that he can later apply to clinical datasets for a scientific win-win across very different domains.

Such cross-domain successes will no doubt be the first of many as the R programming community becomes more comfortable with this whole new world of massive computing capability.

Titan is part of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility located at ORNL.

About Oak Ridge National Laboratory

Oak Ridge National Laboratory is supported by the DOE’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.


Source: Scott Jones, ORNL Communications
