XSEDE14 Workshop Wrestles with Reproducibility

By Faith Singer-Villalobos

August 19, 2014

Imagine that you are trying to create a new sauce for a special dish, or developing the perfect adhesive for a new aircraft, or flying a helicopter looking for victims of a natural disaster, and you succeed at each of these. This is wonderful news for your dinner guests, for the company that will use the new adhesive, and especially for the victims of the natural disaster. But the question is: could you do it again and get the same results? Or did you just get lucky the first time?

At the XSEDE14 conference in Atlanta, a roomful of computational veterans from inside and outside the NSF Extreme Science and Engineering Discovery Environment (XSEDE) participated in a full-day workshop on the topic of reproducibility, and clearly, there is a lot at stake.

“There is a growing awareness in the computational research community that this question of ‘can we do it again’ is becoming important for us in new ways, and the stakes are high — computational research is helping to save lives, answering policy questions, and making an impact on the world,” said Doug James, an HPC researcher at the Texas Advanced Computing Center, in his opening remarks for the workshop.

People have been thinking about reproducibility for a long time. It is one thing to reproduce a small-scale lab experiment or a computation on your desktop; it is an entirely different matter to reproduce something that the Hubble Space Telescope did over five years at a cost of hundreds of millions of dollars.

So, what is reproducibility? One working definition might resemble this: the ability to repeat an experiment to the degree necessary to assess the correctness and importance of the results. Practices that promote reproducibility include anything that makes a researcher more organized, provides a better audit trail, or lets a researcher track source code and know which data sources were used.
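To make the idea of an audit trail concrete, here is a minimal sketch in Python of how a researcher might record what went into a single computational run. It is an illustration only, not a tool discussed at the workshop: the function names (`sha256_of`, `build_run_record`, `demo`), the record fields, and the file names are all hypothetical, and a real workflow would likely capture more, such as compiler and library versions.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(source_files, data_files, parameters):
    """Collect details a later reader would need to repeat this run.

    The field names are hypothetical; they are not taken from any
    standard or from the tools mentioned in the article.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command_line": sys.argv,
        "parameters": parameters,
        # Hashes pin down exactly which code and data were used.
        "source_files": {str(p): sha256_of(p) for p in source_files},
        "data_files": {str(p): sha256_of(p) for p in data_files},
    }


def demo():
    """Write a tiny input and script, record the run, and reload the record."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "input.csv"
        data.write_bytes(b"x,y\n1,2\n")
        code = Path(tmp) / "analysis.py"
        code.write_bytes(b"print('hello')\n")
        record = build_run_record([code], [data], {"seed": 42})
        out = Path(tmp) / "run_record.json"
        out.write_text(json.dumps(record, indent=2))
        return json.loads(out.read_text())


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Storing a record like this alongside each result gives a later reader a way to check that the same source and the same data are being used before comparing outputs.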

Victoria Stodden of Columbia University, who led a roundtable on the topic of reproducibility in 2009 and an ICERM workshop on Reproducibility in Computational and Experimental Mathematics in 2012, gave the keynote address at the XSEDE14 workshop. She raised the issue of a credibility crisis.

“Reproducibility has hit the popular press over the last several months,” Stodden said, citing recent coverage by The Economist (October 2013) and editorials in Nature and Science. Concern about reproducibility was catalyzed by the clinical trials scandal in computational genomics at Duke University, where mistakes in the research were uncovered in The Cancer Letter in 2010.

“This really goes to the heart of how important reproducibility issues are, and how we need to reconstruct the pipeline of thinking, reasoning and observation that a scientist does, but for the computational aspects, too, where many of these decisions are being manifest.”

Stodden also touched on separate discussions going on regarding different aspects of reproducibility such as statistical reproducibility, which questions the research decisions about the statistics and data analysis, and empirical reproducibility, which focuses on the reporting standards for the physical experiment, but does not focus on the computational steps.

Everyone in the room agreed that computational research is now in a position where complexity and mission criticality take on new import, and the community needs to develop confidence in the results of that research. But what should our priorities be? Training? Better tools? New steps in proposals and submissions?

NCSA Director Ed Seidel shared his view that there are three levels where things have to happen to get momentum moving in the right direction: 1) campus level; 2) national level; and 3) publisher level.

Seidel said that local campuses have to think about how they can begin to support local data services, not just repositories, so there is a local structure. “This is a policy issue that vice chancellors for research and provosts need to take seriously…and there are organizations in place like Internet2 and Educause that span the research universities across the country that can help,” Seidel said. “It’s important to frame it not just as data but more around reproducibility; scope the problem beyond data and the data infrastructure.”

In addition, Seidel cited XSEDE as a good organization for aiding the reproducibility process. XSEDE was instrumental in starting the National Data Service Consortium, which aims to organize a number of individual data-service efforts around tools that create data collections, associate Digital Object Identifiers (DOIs) with them, and provide linking services to publishers. While typically thought of as pointers to data collections, DOIs can also attach to code. This is a crucial part of reproducibility.

Professional societies and journals can play a part as well. Many are starting to require links to the data referenced in a publication. But reproducible practices must start in the research group.

Victoria Stodden, Assistant Professor, Department of Statistics, Columbia University and Lorena Barba, Assistant Professor, California Institute of Technology

Lorena Barba of George Washington University, a leading advocate of reproducible science, said, “Conducting research reproducibly doesn’t mean someone else will reproduce the results, but that you are doing it as if someone would do this. By providing full documentation, access to input data and source code, the community will have confidence in your results and will label them as reproducible even if they are, in fact, not reproduced.”

Many other people added to the conversation, including Mark Fahey of the National Institute for Computational Sciences. According to Fahey, the centers need to step up and take some responsibility for providing documentation about how users build and run their codes. Fahey said, “Centers can automatically collect information for each code built and each run of the code, and this information can be made available back to the researcher for publications if desired. There are already two prototypes (ALTD and Lariat) at a variety of computing centers around the world that collect a good portion of this information, and a new improved infrastructure is in development called XALT funded by NSF.”

Recommendations

At the outset of the workshop, the group committed to a key deliverable: recommendations in the form of priorities and initiatives for organizations and communities.

“It’s been implicit that ‘Of course, this is what people do, system administrators and researchers check to ensure that codes get the same results after systems upgrades and when porting to new platforms’ but reproducibility has never been a formal enterprise,” said Nancy Wilkins-Diehr of the San Diego Supercomputer Center, who summarized the workshop and helped facilitate suggestions for moving forward.

“This is a good time to do this. Computational science is a respected contributor of the scientific knowledge base. Important decisions are now based on simulation. While this is gratifying, it has very real implications for our responsibilities as well,” she said.

The participants intend to move forward with humility, however. “The vision for the recommendations is to honor the reality of a diverse set of viewpoints and include ideas that might be outside of the box,” James concluded. Everyone agrees that there is a need to promote confidence-building tools and methodologies that do not adversely affect performance.

Recommendations will be ready in the September 2014 timeframe — please refer to xsede.org/reproducibility to read them. In addition, you can send comments and suggestions to help@xsede.org. The Help Desk will send any and all inquiries to the XSEDE team working on this initiative.
