NERSC Readying for Delivery of Cori Phase 2 Knights Landing-Based System in July

June 21, 2016

June 21 - For the past year, staff at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) have been preparing users of 20 leading science applications for the arrival of the second phase of its newest supercomputer, Cori. Cori Phase 2 will consist of more than 9,300 nodes containing Intel’s Xeon Phi Knights Landing processor, which was officially unveiled on June 20 at the International Supercomputing Conference in Germany. The first compute cabinets are scheduled to arrive in July.

When fully installed, Cori will be the largest system for open science based on Knights Landing processors. The Knights Landing nodes will comprise phase two of the Cori system.

To ensure that a significant number of its 6,000 users could make the most effective use of this new manycore architecture, NERSC staff selected 20 leading applications for the NERSC Exascale Scientific Applications Program (NESAP), a collaborative effort that partners NERSC, Intel and Cray experts with code teams across the U.S. Lessons learned from working with the 20 NESAP codes are being used to develop an optimization strategy that the rest of the user base can quickly adopt.

NERSC staff will present five papers at the Intel Xeon Phi User Group (IXPUG) workshop at the International Supercomputing Conference in Frankfurt on Thursday, June 23. The papers cover the general optimization strategy and the applications EMGEO, MFDn, Chroma/QPhiX, WARP and BerkeleyGW. Details on these applications can be found in the case studies area of the NERSC website.

“Application readiness efforts are critical for enabling ground-breaking science on our HPC systems as we move toward exascale. For the past year we’ve been working with these 20 teams to optimize their codes for Cori so that when the machine arrives they are ready to take advantage of the many capabilities the new hardware offers,” said Jack Deslippe, head of NERSC’s Application Performance Group. “As the primary computing center for DOE’s Office of Science, we have an understanding of a broad user base utilizing over 600 apps at NERSC, as well as strong working relationships with Cray and Intel. This puts us in a unique position to provide a venue for computational scientists to engage industry experts around application optimization and to come up with optimization strategies that scale to the wider HPC community.”

Under NESAP, a member of NERSC’s Application Readiness team assists the application teams with code profiling and optimization. Team members have also held a series of “dungeon sessions” with Intel and Cray engineers to optimize the codes. The resulting optimizations have been tested using nine Xeon Phi processor nodes installed at NERSC.

“Optimization is not always a straightforward process, so we’ve set up a system to help keep users from getting lost in the weeds,” Deslippe said. “A number of the applications are ready now and we’re making progress on the others. This process we’ve set up can be used by nearly all of the 600 projects running at NERSC.”

NERSC frames the optimization process around the roofline performance model developed at Berkeley Lab. The model sets expectations for the performance a developer can achieve with a given algorithm and indicates which features of the Xeon Phi processor to target:

  • Thread/process scaling across the 68 cores and 272 hardware threads available on each KNL processor
  • Vector parallelism, which enables 32 flops per cycle across the two VPUs (vector processing units) on each core
  • 16GB of on-package MCDRAM (volatile memory) that can be used to accelerate memory-bandwidth-sensitive applications
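The roofline bound behind these targets can be sketched in a few lines of Python. The core count and the flops-per-cycle figure come from the list above; the 1.4 GHz clock and both bandwidth figures are illustrative assumptions rather than numbers from NERSC, and the function name is hypothetical.

```python
# Minimal roofline sketch. CORES and FLOPS_PER_CYCLE_PER_CORE are taken from
# the article; CLOCK_GHZ, MCDRAM_GBS and DDR_GBS are assumed, illustrative values.
CORES = 68
FLOPS_PER_CYCLE_PER_CORE = 32   # across the two VPUs on each core
CLOCK_GHZ = 1.4                 # assumption: not stated in the article
MCDRAM_GBS = 400.0              # assumption: MCDRAM bandwidth in GB/s
DDR_GBS = 90.0                  # assumption: DDR bandwidth in GB/s

# Peak compute rate of one node in GFLOP/s under the assumed clock.
PEAK_GFLOPS = CORES * FLOPS_PER_CYCLE_PER_CORE * CLOCK_GHZ


def attainable_gflops(arithmetic_intensity, bandwidth_gbs, peak_gflops=PEAK_GFLOPS):
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth).

    arithmetic_intensity is in flops per byte moved to or from memory.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


if __name__ == "__main__":
    print(f"assumed peak: {PEAK_GFLOPS:.1f} GFLOP/s")
    for ai in (0.25, 1.0, 4.0, 16.0, 64.0):
        fast = attainable_gflops(ai, MCDRAM_GBS)
        slow = attainable_gflops(ai, DDR_GBS)
        print(f"AI={ai:6.2f} flops/byte  MCDRAM bound={fast:8.1f}  DDR bound={slow:8.1f}")
```

Under these assumptions, a kernel with low arithmetic intensity is limited by memory bandwidth and gains most from MCDRAM, while a kernel with high arithmetic intensity is limited by how well it uses the cores and vector units.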

Over the last year, NERSC has been gaining experience exploiting the Xeon Phi hardware. NERSC is a central point for collating this experience and sharing it with the user community. In addition to NERSC staff expertise, other resources available to the community include:

  • Training sessions that include multi-day hands-on sessions as well as shorter online presentations
  • A series of case studies for the codes being optimized and ported
  • A library of tools to support optimization and porting

Different applications tend to require different optimization approaches. Though NERSC staff only recently gained access to Knights Landing processors, a number of optimized applications are already projected to perform well on the Cori system; the QCD code Chroma, for example, is expected to perform twice as well on Cori Phase 2 as on the Haswell-based Phase 1 system.

Here is an overview of two of the applications that have been optimized for the Knights Landing architecture:

WARP: One example where NESAP efforts have paid off is the accelerator modeling application WARP, which uses a Particle-In-Cell (PIC) approach built on the PICsar mini-app/library. WARP developers at Berkeley Lab worked closely with NERSC staff as well as engineers at Cray and Intel for over a year to optimize the application for the Cori Phase 2 system. Improvements included algorithmic changes to increase data reuse in cache and MCDRAM on Knights Landing. Activity peaked in a “dungeon session” aimed at improving the effective use of the large KNL vector units within PICsar, and progress has been accelerated by the efforts of NESAP post-doc Mathieu Lobet. Without optimization, PICsar performance on Cori Phase 2 nodes was expected to lag behind the Intel Haswell-based Phase 1 nodes by a factor of 3. With optimization, Cori Phase 2 nodes are projected to outperform Phase 1 nodes, with more optimization in the works.

MFDn: The MFDn (Many Fermion Dynamics) application, led by James Vary of Iowa State University, is used for computing the nuclear structure of a number of isotopes of interest and predicting nuclear reaction rates and cross sections. The targeted use case for MFDn requires using all available memory on the utilized nodes (expected to be a significant fraction of the Cori system). This requires either the use of the KNL MCDRAM as a cache or the explicit management of data on the MCDRAM via “FASTMEM” directives. With the help of NERSC staff member Brandon Cook and Cray and Intel staff, MFDn developers were able to beat the performance of the KNL cache mode and achieve a 60 percent performance advantage on Cori Phase 2 compared to the Haswell-based Cori Phase 1 for critical sparse matrix math steps in their algorithm.
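The trade-off behind explicit MCDRAM management can be illustrated with a small planning sketch. This is not the MFDn team's method: it is a hypothetical greedy heuristic that ranks arrays by estimated memory traffic per byte and places them in the 16GB of MCDRAM until it is full. The array names, sizes and traffic estimates are invented for illustration.

```python
from dataclasses import dataclass

GIB = 2**30
MCDRAM_BYTES = 16 * GIB  # 16GB of MCDRAM per node, per the article


@dataclass(frozen=True)
class Array:
    name: str
    size_bytes: int            # memory footprint of the array
    bytes_moved_per_step: int  # estimated memory traffic per iteration


def choose_fast_arrays(arrays, capacity=MCDRAM_BYTES):
    """Greedy placement: highest traffic per byte first, skipping arrays that do not fit."""
    for a in arrays:
        if a.size_bytes <= 0:
            raise ValueError(f"array {a.name!r} must have a positive size")
    ranked = sorted(
        arrays,
        key=lambda a: a.bytes_moved_per_step / a.size_bytes,
        reverse=True,
    )
    chosen = []
    remaining = capacity
    for a in ranked:
        if a.size_bytes <= remaining:
            chosen.append(a.name)
            remaining -= a.size_bytes
    return chosen


# Hypothetical working set for a sparse matrix-vector style kernel.
EXAMPLE_ARRAYS = [
    Array("matrix_values", 60 * GIB, 60 * GIB),
    Array("matrix_indices", 6 * GIB, 30 * GIB),
    Array("vectors", 8 * GIB, 80 * GIB),
]

if __name__ == "__main__":
    print(choose_fast_arrays(EXAMPLE_ARRAYS))
```

In this invented example the small, heavily reused arrays land in fast memory and the large matrix stays in ordinary memory. Cache mode, by contrast, leaves placement to the hardware, which is simpler but gives the application no control over which data stays in MCDRAM.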

For complete details and a number of other Cori Phase 2 optimization examples, read more optimization case studies.


Source: NERSC
