A Sneak Peek at the Next-Gen Exascale Operating System
Many scattered pieces of the exascale software stack are being developed and fitted together worldwide. Central to that jigsaw effort is the operating system that will eventually power such machines.
This week the Department of Energy snapped in a $9.75 million investment to help round out the picture of what such an OS will look like. The grant went to Argonne National Laboratory for a multi-institutional project (including Pacific Northwest and Lawrence Livermore national laboratories, as well as several universities) aimed at developing a prototype exascale operating system and associated runtime software.
To better understand “Argo,” the exascale OS effort, we spoke with Pete Beckman, Director of the Exascale Technology and Computing Institute and chief architect of the Argo project. Beckman says that several defining challenges of these ultra-scale machines (power management, massive concurrency and heterogeneity, and overall resiliency) can all be addressed at the OS level.
These are not unfamiliar concerns, but attacking them at the operating system level brings certain benefits, argues Beckman. For instance, a pared-down, purpose-built and HPC-optimized OS makes it possible to fine-tune power control and management at the level of core operations and individual workloads.
Beyond power, the team describes support for massive concurrency through a hierarchical framework for power and fault management, as well as a “beacon” mechanism that allows resource managers and optimizers to communicate with and control the platform.
Beckman and team describe the building blocks of this hierarchy as “enclaves.” In this model the OS is more hierarchical than we traditionally think of it. It is easy to picture an OS running on each node; with Argo, there is also a global OS that runs across the whole machine. This, combined with Argo’s platform-neutral design, should make it flexible enough to evolve along with architectures and manageable at both the system and the workload level, all packaged in familiar Linux wrappings.
These “enclaves” are defined as sets of resources dedicated to a particular service and capable of introspection and autonomic response. As Argonne describes, “They can shape-shift the system configuration of nodes and the allocation of power to different nodes or to migrate data or computations from one node to another.” On the reliability front, the enclaves that tackle failure can do so “by means of global restart and other enclaves supporting finer-level recovery.”
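To make the enclave idea more concrete, here is a minimal sketch of one thing a hierarchy of enclaves could do: pass a machine-wide power budget down a tree, with each enclave splitting its share among its children. Everything here is a hypothetical illustration, not Argo’s actual design or API. The `Enclave` class, its fields, and the policy of throttling every leaf in proportion to its requested power are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Enclave:
    """Hypothetical enclave: a named set of resources in a tree.

    Leaf enclaves (e.g. single nodes) report a power demand in watts;
    inner enclaves (e.g. a job or the whole machine) aggregate their children.
    Names and allocation policy are illustrative assumptions only.
    """
    name: str
    demand_watts: float = 0.0
    children: list[Enclave] = field(default_factory=list)

    def total_demand(self) -> float:
        # A leaf reports its own demand; an inner enclave sums its subtree.
        if not self.children:
            return self.demand_watts
        return sum(child.total_demand() for child in self.children)

    def allocate(self, budget_watts: float) -> dict[str, float]:
        """Split a power budget down the tree in proportion to demand.

        Returns a mapping from leaf-enclave name to its power cap in watts.
        Assumed policy: if the budget is short, every leaf is throttled by the
        same fraction; no leaf ever receives more than it asked for.
        """
        if not self.children:
            return {self.name: min(budget_watts, self.demand_watts)}
        total = self.total_demand()
        # Scale factor <= 1 so an ample budget never over-provisions a leaf.
        scale = min(1.0, budget_watts / total) if total > 0 else 0.0
        caps: dict[str, float] = {}
        for child in self.children:
            caps.update(child.allocate(child.total_demand() * scale))
        return caps


# Example: a machine-level enclave containing two job enclaves.
machine = Enclave("machine", children=[
    Enclave("job-a", children=[Enclave("node0", 300.0), Enclave("node1", 300.0)]),
    Enclave("job-b", children=[Enclave("node2", 400.0)]),
])
caps = machine.allocate(800.0)
print(caps)
```

With 1,000 W requested and an 800 W budget, each node ends up capped at 80 percent of its demand. A real system would presumably use richer policies (priorities, measured rather than requested power, reallocation at runtime); the tree structure is the point of the sketch, since each level only needs to know its own budget and its children’s demands.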
The familiar Linux core of Argo will be enhanced and modified to meet the needs of more dynamic, next-generation applications. While development of those applications is ongoing, Beckman and the distributed team plan to test Argo on a host of common HPC applications. Again, all of this will be Linux-flavored, but with an HPC shell that narrows the focus to the problems at hand.
As a side note, building on the strengths of Linux while adding robustness and critical power management, concurrency and resiliency features seems like a sound idea. If the trend holds, Linux will continue to enjoy the lion’s share of the OS market on the Top500 (by far: 96 percent, according to reporting from yesterday).
It’s more about refining the role of Linux than rebuilding it, Beckman explains. Linux is currently tasked with a multi-user, multi-program balancing act, doling out its resources fairly; the Argo approach would instead home in on the parts of the code that need to blaze, stripping away some of the resource-balancing functions. “We can rewrite some of those pieces and design runtime systems that are specifically adapted to run those bits of code fast and not try to deal with the balancing of many users and many programs.”
The idea is to have part of the chip run the Linux kernel for the basics, such as control systems, booting, command and interface functions, and debugging. But as Beckman says, “for the HPC part, we can specialize and have a special component that lives in the chip.”
In that case, some middleware pieces are hidden inside Argo, and Beckman said that such software will move closer to the OS. As that happens, more software will also be tied to the chips, whatever those happen to look like in the far-flung future (let’s say 2020, to be fair). This is Argonne’s home turf: it has been one of the leading labs working on the marriage between processors and file systems, runtimes, message passing and other software. Beckman expects many of these lines to merge: middleware with the OS and, of course, both of those with the processor.
“Bringing together these multiple views and the corresponding software components through a whole-system approach distinguishes our strategy from existing designs,” said Beckman. “We believe it is essential for addressing the key exascale challenges of power, parallelism, memory hierarchy, and resilience.”
For now, numerous groups are working on various pieces of the power puzzle in particular, an issue that is especially important going forward. Although power consumption has always been a concern, Beckman says the current approach is to optimize systems in advance, “turn the power on, and accept that they will draw what they’re going to draw.” In addition to the other work being done inside the stack to create efficient supercomputers, there is a role for the operating system to play in orchestrating “smart” use of power for certain parts of the computation or parts of the machine.