Seagate-led SAGE Project Delivers Update on Exascale Goals

By John Russell

November 29, 2016

Roughly a year and a half after its launch, the SAGE exascale storage project led by Seagate has delivered a substantive interim report – Data Storage for Extreme Scale. It outlines technical details of progress to date and architectural plans moving forward. Of particular note is progress on co-design for use cases and applications expected to benefit most from exascale. There’s also been a fair amount of work to be able to accommodate big data and traditional HPC workflows in the same environment.

“We’ve tried to give ourselves lofty goals,” said Malcolm Muggeridge, senior engineering director at Seagate based in the U.K. who is leading the initiative. “We would like to become the platform of choice in exascale for storage solutions and will have the technology addressing that space in the 2022 timeframe. The main piece of work that has been completed [so far] is co-design activities.”

You may recall that the SAGE (StorAGe for Exascale Data Centric Computing) system aims to implement a Big Data/Extreme Computing (BDEC) and High Performance Data Analytics (HPDA) capable infrastructure suitable for extreme scales – including exascale and beyond. SAGE is one of 15 projects recently funded under Horizon 2020. Direct funding is actually through the European Technology Platforms (ETP) organization – “industry-led stakeholder groups recognized by the European Commission as key actors in driving innovation, knowledge transfer and European competitiveness. ETPs develop research and innovation agendas and roadmaps for action at EU and national level to be supported by both private and public funding.”

The new white paper is a fairly extensive document that follows a nine-month formal project review last June and includes work completed since. Among the topics covered are: platform requirements; systems architecture; platform components; and ecosystem elements. Launched in September of 2015, SAGE tackles eight research areas: “the study of the 1) application use cases co-designing solutions to address 2) Percipient Storage Methods, 3) Advanced Object Storage, and 4) tools for I/O optimization, supporting 5) next generation storage media and developing a supporting ecosystem of 6) Extreme Data Management, 7) Programming techniques and 8) Extreme Data Analysis tools.”

According to the report, the SAGE storage system will be capable of efficiently storing and retrieving immense volumes of data at extreme scales, with the added functionality of “percipience” or the ability to accept and perform user defined computations integral to the storage system. SAGE will be built around the Mero object storage software platform and its supporting ecosystem of tools and techniques, that will work together to provide the required functionalities and scaling desired by extreme scale workflows.
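To make the idea of “percipience” concrete, here is a minimal sketch in Python. It is not the Mero or Clovis interface: the class `ToyPercipientStore`, its methods and the object name `run-001/log` are all hypothetical, invented purely for illustration. The sketch contrasts the conventional path, where the whole object is shipped to the caller, with a path where a user-defined function runs next to the data and only its result comes back.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ToyPercipientStore:
    """Hypothetical in-memory stand-in for an object store (not the Mero API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        # Conventional path: the full object travels to the caller.
        return self._objects[object_id]

    def compute(self, object_id: str, fn: Callable[[bytes], T]) -> T:
        # "Percipient" path: the user-defined function runs where the data
        # lives, and only its (usually much smaller) result is returned.
        return fn(self._objects[object_id])


def count_newlines(data: bytes) -> int:
    """Example user-defined computation: count lines in a log object."""
    return data.count(b"\n")


store = ToyPercipientStore()
store.put("run-001/log", b"step 1\nstep 2\nstep 3\n")
line_count = store.compute("run-001/log", count_newlines)
print(line_count)
```

In a real system the function would execute on the storage enclosure's own processors; here both paths run in one process, so the sketch shows only the shape of the interaction, not any performance benefit.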

One important goal is accommodating new storage technologies, such as non-volatile RAM (NVRAM). Leveraging object storage to assist “in-memory, closer-to-memory” computing is another. In an earlier interview Sai Narasimhamurthy, Seagate research staff engineer responsible for coordinating the technical work, told HPCwire that the stack would “have memory at the top, various NVRAM technologies in the middle, of course you have your flash technology as well as part of the stack, and then you have scratch disks and then archival disks.”

“You could have an object, or a piece of it, lying in high speed memory, a piece of it in NVRAM, and a piece of the object lying in scratch based upon the usage profile of the object,” explained Narasimhamurthy. “The view of the object is transparent to the application, it’s just I/O to an object, but on the back end you could have various types of layout which could be very interesting because you could optimize your layout for performance or for resiliency. You could do all sorts of things.”
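The layout Narasimhamurthy describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the names `Extent`, `TieredObject` and `rebalance`, the read-count threshold and the one-tier-at-a-time promotion rule are hypothetical and are not taken from SAGE or Mero. Only the ordering of the tiers follows the stack quoted above. The point of the sketch is that pieces of one object can sit on different tiers while the application still issues plain reads against a single object.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Tier order, fastest first, following the stack described in the article.
TIERS = ["memory", "nvram", "flash", "scratch_disk", "archive_disk"]


@dataclass
class Extent:
    offset: int      # byte offset of this piece within the object
    data: bytes
    tier: str
    reads: int = 0   # crude usage profile: reads since last rebalance


@dataclass
class TieredObject:
    extents: List[Extent] = field(default_factory=list)

    def append(self, data: bytes, tier: str = "scratch_disk") -> None:
        offset = sum(len(e.data) for e in self.extents)
        self.extents.append(Extent(offset, data, tier))

    def read(self, offset: int, length: int) -> bytes:
        # The caller sees one byte range; tiers are never exposed.
        end = offset + length
        out = bytearray()
        for e in self.extents:
            e_end = e.offset + len(e.data)
            if e.offset < end and offset < e_end:
                e.reads += 1
                lo = max(offset, e.offset) - e.offset
                hi = min(end, e_end) - e.offset
                out += e.data[lo:hi]
        return bytes(out)

    def rebalance(self, hot_threshold: int = 3) -> None:
        # Hypothetical policy: move frequently read pieces one tier up.
        for e in self.extents:
            if e.reads >= hot_threshold:
                e.tier = TIERS[max(TIERS.index(e.tier) - 1, 0)]
            e.reads = 0

    def layout(self) -> Dict[int, str]:
        return {e.offset: e.tier for e in self.extents}


obj = TieredObject()
obj.append(b"AAAA")
obj.append(b"BBBB")
for _ in range(3):
    obj.read(0, 2)          # only the first piece becomes "hot"
obj.rebalance()
whole = obj.read(0, 8)      # application-level view is unchanged
print(obj.layout(), whole)
```

A production system would also weigh resiliency, capacity and migration cost when choosing a layout; this sketch models only the usage-driven part of that decision.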

Clearly there are big goals for the project. Co-design is a critical early element in defining functional requirements, emphasized Muggeridge: “We have carefully selected use cases that reflect these data-centric applications. The use cases provide specific inputs that are designed to fine tune/modify the framework for the SAGE architecture.”

Muggeridge noted there is a range of requirements drivers. The report calls out: inputs from the BDEC community and the US Department of Energy labs; data needs for big science, as exemplified by the Square Kilometer Array and the Human Brain Project; extreme scale I/O requirements drafted by the ETP; and extreme scale data needs highlighted by the HPDA community. The information was gathered mostly through workshops.

Top-level objectives have also been established and are largely familiar. One calls for the ability “to store and retrieve extreme volumes of data approaching orders of ~Exabyte for a given problem”. Another is the ability to manage workflows that include data from simulations and instruments. Not surprisingly, data I/O rates, data integrity and data analytics, among other capabilities, are being targeted. Indeed, the first part of the project has been largely ‘definitional’, with a rollout of demonstrations planned for the next year.

Use of co-design principles to inform these objectives is a distinguishing feature of the project. SAGE has selected several use cases (applications) and spelled out in detail the parameters being measured. Use cases “cover a broad range of domains, including data from some of the world’s largest scientific experiments (including one of the world’s largest nuclear fusion facilities and one of the largest synchrotrons in Europe), aside from extremely data-centric HPC codes.” Below is a table with the use cases selected.

[Table: SAGE use cases]

So far, SAGE has gathered the first formal list of inputs from all of the specified use cases. “This phase included gathering inputs on formal I/O characterization, SAGE architecture analysis, data retention characterization and data scaling analysis, which was an analytical study of how data and I/O requirements of the use cases would scale on a future basis.”

[Figure: SAGE metrics]

The SAGE system is built on multiple tiers of storage device hardware technology (see figure below). SAGE does not require a specific type of storage device technology, but typically it would include at least one NVRAM tier (Intel 3D XPoint technology is a strong contender at the moment), at least one flash tier and at least one disk tier. Together, these tiers are housed in standard form-factor enclosures and provide their own compute capability, enabled by standard x86 embedded processing components. Moving up the system stack, compute capability increases for faster, lower latency devices.

Mero, the object storage software first developed by Xyratex and now being extended by Seagate, is layered on top of this hardware stack, providing fundamental management of object I/O and storage across tiers. Essentially, Mero forms the core of the SAGE system. Mero is presented to users through the Clovis API. Everything above Clovis forms the SAGE ecosystem components.

[Figure: SAGE system stack]

Much remains to be done, but SAGE appears to be making steady progress. Demonstrations, some at the Jülich Supercomputing Centre, are expected over the next year or so. This latest paper is best read in full for current technical details of SAGE plans.

Link to new SAGE paper (Data Storage for Extreme Scale): http://sagestorage.eu/sites/default/files/Sage%20White%20Paper%20v1.0.pdf

 
