A Sneak Peek at the Next-Gen Exascale Operating System
There are several scattered pieces in the exascale software stack being developed and clicked together worldwide. Central to that jigsaw effort is the eventual operating system to power such machines.
This week the Department of Energy snapped in a $9.75 million investment to help round out the picture of what such an OS will look like. The grant went to Argonne National Lab for a multi-institutional project (including Pacific Northwest and Lawrence Livermore labs, as well as other universities) aimed at developing a prototype exascale operating system and associated runtime software.
To better understand “Argo,” the exascale OS effort, we spoke with Pete Beckman, Director of the Exascale Technology and Computing Institute and chief architect of the Argo project. Beckman says that the defining challenges of these ultra-scale machines (power management, massive concurrency and heterogeneity, and overall resiliency) can all be addressed at the OS level.
These are not unfamiliar concerns, but attacking them at the operating system level offers certain benefits, argues Beckman. For instance, fine-tuning power control and management at the core operational and workload level becomes possible with a pared-down, purpose-built and HPC-optimized OS.
Outside of power, the team describes the “allowance for massive concurrency, [met by] a hierarchical framework for power and fault management, as well as a ‘beacon’ mechanism that allows resource managers and optimizers to communicate and control the platform.”
Beckman and team describe this hierarchy in terms of “enclaves.” In this model the OS is more hierarchical in nature than we traditionally think of an operating system as being. In other words, where it is easy to think of an OS as something that lives on a single node, with Argo there is a global OS that runs across the machine. This, combined with the platform-neutral design of Argo, will make it flexible enough to change with architectures and manageable at both a system and workload level, all packaged in familiar Linux wrappings.
As shown above, these “enclaves” are defined as a set of resources dedicated to a particular service, and capable of introspection and autonomic response. As Argonne describes, “They can shape-shift the system configuration of nodes and the allocation of power to different nodes or to migrate data or computations from one node to another.” On the reliability front, the enclaves that tackle failure can do so “by means of global restart and other enclaves supporting finer-level recovery.”
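To make the enclave idea concrete, here is a minimal sketch in Python of a hierarchy that can report its own power demand (introspection) and redistribute a power budget across its nodes (autonomic response). Everything in it is an illustrative assumption: the `Enclave` class, the `demand` and `allocate` methods, the node names and the proportional scaling policy are hypothetical and are not drawn from Argo itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Enclave:
    """Hypothetical enclave: a named group of nodes and/or child enclaves."""

    name: str
    # Assumed bookkeeping: requested power per node, in watts.
    node_demand_watts: dict[str, float] = field(default_factory=dict)
    children: list[Enclave] = field(default_factory=list)

    def demand(self) -> float:
        """Introspection: total power requested by this enclave's subtree."""
        own = sum(self.node_demand_watts.values())
        return own + sum(child.demand() for child in self.children)

    def allocate(self, budget_watts: float) -> dict[str, float]:
        """Autonomic response: split a power budget across the subtree.

        Nodes get their full demand when the budget allows; otherwise every
        node is scaled down by the same factor (an assumed policy).
        """
        total = self.demand()
        if total <= budget_watts or total == 0:
            scale = 1.0
        else:
            scale = budget_watts / total
        caps = {n: w * scale for n, w in self.node_demand_watts.items()}
        for child in self.children:
            # Each child enclave receives its proportional share of the budget.
            caps.update(child.allocate(child.demand() * scale))
        return caps


# Two service enclaves nested under one machine-wide enclave.
compute = Enclave("compute", {"n0": 300.0, "n1": 300.0})
io = Enclave("io", {"n2": 200.0})
machine = Enclave("machine", children=[compute, io])

# The machine asks for 800 W in total but is capped at 400 W.
caps = machine.allocate(400.0)
```

In this toy version the top-level enclave plays the role of the global OS described above, handing each child enclave a share of the budget and leaving the child to divide it among its own nodes. A real system would presumably apply far richer policies than uniform scaling.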
The recognizable Linux core of Argo will have been enhanced and modified to meet the needs of more dynamic, next-gen applications. While development on those prototype applications is ongoing, Beckman and the distributed team plan to test Argo’s ability to dive into a host of common HPC applications. Again, all of this will be Linux-flavored, but with an HPC shell that narrows the focus on the problems at hand.
As a side note, leveraging the positive elements of Linux and building in robustness, with an eye toward critical power management, concurrency and resiliency features, seems like a good idea. If the trend holds, Linux itself will continue to enjoy the lion’s share (by far: 96% according to reporting from yesterday) of the OS market on the Top500.
It’s more about refining the role of Linux than rebuilding it, Beckman explains. While Linux is currently tasked with a multi-user, multi-program balancing act, doling out its resources fairly, the Argo approach would home in on the parts of code that need to blaze, wicking away some of the resource-balancing functions. “We can rewrite some of those pieces and design runtime systems that are specifically adapted to run those bits of code fast and not try to deal with the balancing of many users and many programs.”
The idea is to have part of the chip be capable of running the Linux kernel for the basics: things like control systems, booting, command and interface functions, debugging and the like. But as Beckman says, “for the HPC part, we can specialize and have a special component that lives in the chip.”
In that case, there are some middleware pieces that are hidden inside Argo. Beckman said that that software will move closer to the OS. And just as that happens, more software will be tied to the chips–whatever those happen to look like in the far-flung future (let’s say 2020 to be fair). This is all Argonne’s domain–they’ve been one of the leading labs that have worked on the marriage between processor and file systems, runtimes, message passing and other software workings. Beckman expects a merging between many lines–middleware and OS, and of course, both of those with the processor.
“Bringing together these multiple views and the corresponding software components through a whole-system approach distinguishes our strategy from existing designs,” said Beckman. “We believe it is essential for addressing the key exascale challenges of power, parallelism, memory hierarchy, and resilience.”
As of now, numerous groups are working on various pieces of the power puzzle in particular. This is an especially important issue going forward. Although power consumption has always been a concern, Beckman says that the approach now is to optimize systems in advance, “turn the power on, and accept that they will draw what they’re going to draw.” In addition to the other work being done inside the stack to create efficient supers, there is a role for the operating system to play in orchestrating “smart” use of power for certain parts of the computation or parts of the machine.