Networking, Data Experts Design a Better Portal for Scientific Discovery

By Jon Bashor

January 29, 2018

Jan. 29, 2018 — These days, it’s easy to overlook the fact that the World Wide Web was created nearly 30 years ago primarily to help researchers access and share scientific data. Over the years, the web has evolved into a tool that helps us eat, shop, travel, watch movies and even monitor our homes.

The Science DMZ includes multiple DTNs that provide for high-speed transfer between network and storage. Portal functions run on a portal server, located on the institution’s enterprise network. The DTNs need only speak the API of the data management service (Globus in this case).

Meanwhile, scientific instruments have become much more powerful, generating massive datasets, and international collaborations have proliferated. In this new era, the web has become an essential part of the scientific process, but the most common method of sharing research data remains firmly attached to the earliest days of the web. This can be a huge impediment to scientific discovery.

That’s why a team of networking experts from the Department of Energy’s Energy Sciences Network (ESnet), with the Globus team from the University of Chicago and Argonne National Laboratory, has designed a new approach that makes data sharing faster, more reliable and more secure. In an article published Jan. 15 in Peer J Comp Sci, the team describes their “The Modern Research Data Portal: a design pattern for networked, data-intensive science.”

“Both the size of datasets and the quantity of data objects has exploded, but the typical design of a data portal hasn’t really changed,” said co-author Eli Dart, a network engineer with the Department of Energy’s Energy Sciences Network, or ESnet. “Our new design preserves that ease of use, but easily scales up to handle the huge amounts of data associated with today’s science.”

Data portals, sometimes called science gateways, are web-based interfaces for access data storage and computing systems, allowing authorized users to access data and perform shared computations. As science becomes increasingly data-driven and collaborative, data portals are advancing research in materials, physics, astrophysics, cosmology, climate science and other fields.

The traditional portal is driven by a web server that is connected to a storage system and a database and processes users’ requests for data. While this simple design was straightforward to develop 25 years ago, it has increasingly become an obstacle to performance, usability and security.

“The problem with using old technology is that these portals don’t provide fast access to the data and they aren’t very flexible,” said lead author Ian Foster, who is the Arthur Holly Compton Professor at the University of Chicago and Director of the Data Science and Learning Division at Argonne National Laboratory. “Since each portal is developed as its own silo, the organization therefore must implement, and then manage and support, multiple complete software stacks to support each portal.”

The new portal design is built on two approaches developed to simplify and speed up transfers of large datasets.

  • The Science DMZ, which Dart developed, is a high-performance network design that connects large-scale data servers directly to high-speed networks and is increasingly used by research institutions to better manage data transfers.
  • Globus is a cloud-based service to which developers of data portals and other science services can outsource responsibility for complex tasks like authentication, authorization, data movement, and data sharing. Globus can be used, in particular, to drive data transfers into and out of Science DMZs.

Kyle Chard, Foster, David Shiffett, Steven Tuecke and Jason Williams are co-authors of the paper and helped develop Globus at Argonne National Laboratory and the University of Chicago. In their paper, the authors note that the concept became feasible in 2015 as Globus and the Science DMZ became mature technologies.

“Together, Globus and the Science DMZ give researchers a powerful toolbox for conducting their research,” Dart said.

One portal incorporating the new design is the Research Data Archive managed by the National Center for Atmospheric Research, which contains a large and diverse collection of meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets to support atmospheric and geosciences research.

For example, a scientist working at a university could download data from the National Center for Atmospheric Research (NCAR) in Colorado and then use it to run simulations at DOE and NSF supercomputing centers in California and Illinois, and finally move the data to her home institution for analysis. To illustrate how the design works, Dart selected a 460-gigabyte dataset at NCAR, initiated a Globus transfer to DOE’s National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory, logged in to his storage account and started the transfer. Four minutes later, the 5,141 files had been seamlessly transferred.

How the design works

The Modern Research Data Portal takes the single-server model of the traditional portal design and divides it among three distinct components.

  • A portal web server handles the search for and access to the specified data, and similar tasks.
  • The data servers, often called Data Transfer Nodes, are connected to high-speed networks through a specialized enclave, in this case the Science DMZ. The Science DMZ provides a dedicated, secure link to the data servers, but avoids common performance bottlenecks caused by typical designs not optimized for high-speed transfers.
  • Globus manages the authentication, data access and data transfers. Globus makes it possible for users to manage data irrespective of the location or storage system on which data reside and supports data transfer, sharing, and publication directly from those storage systems.

“The design pattern thus defines distinct roles for the web server, which manages who is allowed to do what; data servers, where authorized operations are performed on data; and external services, which orchestrate data access,” the authors wrote.

Globus is already used by tens of thousands of researchers worldwide with endpoints at more than 360 sites, so many researchers are familiar with its capabilities and rely on it on a regular basis. In fact, about 80 percent of major research universities and national labs in the U.S. use Globus.

At the same time, more than 100 research universities across the country have deployed Science DMZs, thanks to funding support through the National Science Foundation’s Campus Cyberinfrastructure Program.

A critical component of the system is “a little agent called Globus Connect, which is much like the Google Drive or Dropbox agents one would install on their own PCs,” Chard said. Globus Connect allows the Globus service to move data to and from the computer using high performance protocols and also HTTPS for direct access. It also allows users to share data dynamically with their peers.

According to Chard, the design provides research organizations with easy-to-use technology tools similar to those used by business startups to streamline development.

“If we look to industry, startup businesses can now build upon a suite of services to simplify what they need to build and manage themselves,” Chard said. “In a research setting, Globus has developed a stack of such capabilities that are needed by any research portal. Recently, we (Globus) have developed interfaces to make it trivial for developers to build upon these capabilities as a platform.”

“As a result of this design, users have a platform that allows them to easily place and transfer data without having to scale up the human effort as the amount of data scales up,” Dart said.

ESnet is a DOE Office of Science User Facility. Argonne and Lawrence Berkeley national laboratories are supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.  For more information, please visit science.energy.gov.

About Computing Sciences at Berkeley Lab

The Lawrence Berkeley National Laboratory (Berkeley LabComputing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy’s research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab’s Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.

Lawrence Berkeley National Laboratory addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Pfizer HPC Engineer Aims to Automate Software Stack Testing

January 17, 2019

Seeking to reign in the tediousness of manual software testing, Pfizer HPC Engineer Shahzeb Siddiqui is developing an open source software tool called buildtest, aimed at automating software stack testing by providing the community with a central repository of tests for common HPC apps and the ability to automate execution of testing. Read more…

By Tiffany Trader

Senegal Prepares to Take Delivery of Atos Supercomputer

January 16, 2019

In just a few months time, Senegal will be operating the second largest HPC system in sub-Saharan Africa. The Minister of Higher Education, Research and Innovation Mary Teuw Niane made the announcement on Monday (Jan. 14 Read more…

By Tiffany Trader

Google Cloud Platform Extends GPU Instance Options

January 16, 2019

If it's Nvidia GPUs you're after to power your AI/HPC/visualization workload, Google Cloud has them, now claiming "broadest GPU availability." Each of the three big public cloud vendors has by turn touted the latest and Read more…

By Tiffany Trader

HPE Extreme Performance Solutions

HPE Systems With Intel Omni-Path: Architected for Value and Accessible High-Performance Computing

Today’s high-performance computing (HPC) and artificial intelligence (AI) users value high performing clusters. And the higher the performance that their system can deliver, the better. Read more…

IBM Accelerated Insights

Resource Management in the Age of Artificial Intelligence

New challenges demand fresh approaches

Fueled by GPUs, big data, and rapid advances in software, the AI revolution is upon us. Read more…

STAC Floats ML Benchmark for Financial Services Workloads

January 16, 2019

STAC (Securities Technology Analysis Center) recently released an ‘exploratory’ benchmark for machine learning which it hopes will evolve into a firm benchmark or suite of benchmarking tools to compare the performanc Read more…

By John Russell

Google Cloud Platform Extends GPU Instance Options

January 16, 2019

If it's Nvidia GPUs you're after to power your AI/HPC/visualization workload, Google Cloud has them, now claiming "broadest GPU availability." Each of the three Read more…

By Tiffany Trader

STAC Floats ML Benchmark for Financial Services Workloads

January 16, 2019

STAC (Securities Technology Analysis Center) recently released an ‘exploratory’ benchmark for machine learning which it hopes will evolve into a firm benchm Read more…

By John Russell

A Big Data Journey While Seeking to Catalog our Universe

January 16, 2019

It turns out, astronomers have lots of photos of the sky but seek knowledge about what the photos mean. Sound familiar? Big data problems are often characterize Read more…

By James Reinders

Intel Bets Big on 2-Track Quantum Strategy

January 15, 2019

Quantum computing has lived so long in the future it’s taken on a futuristic life of its own, with a Gartner-style hype cycle that includes triggers of innovation, inflated expectations and – though a useful quantum system is still years away – anticipatory troughs of disillusionment. Read more…

By Doug Black

IBM Quantum Update: Q System One Launch, New Collaborators, and QC Center Plans

January 10, 2019

IBM made three significant quantum computing announcements at CES this week. One was introduction of IBM Q System One; it’s really the integration of IBM’s Read more…

By John Russell

IBM’s New Global Weather Forecasting System Runs on GPUs

January 9, 2019

Anyone who has checked a forecast to decide whether or not to pack an umbrella knows that weather prediction can be a mercurial endeavor. It is a Herculean task: the constant modeling of incredibly complex systems to a high degree of accuracy at a local level within very short spans of time. Read more…

By Oliver Peckham

The Case Against ‘The Case Against Quantum Computing’

January 9, 2019

It’s not easy to be a physicist. Richard Feynman (basically the Jimi Hendrix of physicists) once said: “The first principle is that you must not fool yourse Read more…

By Ben Criger

The Deep500 – Researchers Tackle an HPC Benchmark for Deep Learning

January 7, 2019

How do you know if an HPC system, particularly a larger-scale system, is well-suited for deep learning workloads? Today, that’s not an easy question to answer Read more…

By John Russell

Quantum Computing Will Never Work

November 27, 2018

Amid the gush of money and enthusiastic predictions being thrown at quantum computing comes a proposed cold shower in the form of an essay by physicist Mikhail Read more…

By John Russell

Cray Unveils Shasta, Lands NERSC-9 Contract

October 30, 2018

Cray revealed today the details of its next-gen supercomputing architecture, Shasta, selected to be the next flagship system at NERSC. We've known of the code-name "Shasta" since the Argonne slice of the CORAL project was announced in 2015 and although the details of that plan have changed considerably, Cray didn't slow down its timeline for Shasta. Read more…

By Tiffany Trader

Summit Supercomputer is Already Making its Mark on Science

September 20, 2018

Summit, now the fastest supercomputer in the world, is quickly making its mark in science – five of the six finalists just announced for the prestigious 2018 Read more…

By John Russell

AMD Sets Up for Epyc Epoch

November 16, 2018

It’s been a good two weeks, AMD’s Gary Silcott and Andy Parma told me on the last day of SC18 in Dallas at the restaurant where we met to discuss their show news and recent successes. Heck, it’s been a good year. Read more…

By Tiffany Trader

The Case Against ‘The Case Against Quantum Computing’

January 9, 2019

It’s not easy to be a physicist. Richard Feynman (basically the Jimi Hendrix of physicists) once said: “The first principle is that you must not fool yourse Read more…

By Ben Criger

US Leads Supercomputing with #1, #2 Systems & Petascale Arm

November 12, 2018

The 31st Supercomputing Conference (SC) - commemorating 30 years since the first Supercomputing in 1988 - kicked off in Dallas yesterday, taking over the Kay Ba Read more…

By Tiffany Trader

Contract Signed for New Finnish Supercomputer

December 13, 2018

After the official contract signing yesterday, configuration details were made public for the new BullSequana system that the Finnish IT Center for Science (CSC Read more…

By Tiffany Trader

Nvidia’s Jensen Huang Delivers Vision for the New HPC

November 14, 2018

For nearly two hours on Monday at SC18, Jensen Huang, CEO of Nvidia, presented his expansive view of the future of HPC (and computing in general) as only he can do. Animated. Backstopped by a stream of data charts, product photos, and even a beautiful image of supernovae... Read more…

By John Russell

Leading Solution Providers

SC 18 Virtual Booth Video Tour

Advania @ SC18 AMD @ SC18
ASRock Rack @ SC18
DDN Storage @ SC18
HPE @ SC18
IBM @ SC18
Lenovo @ SC18 Mellanox Technologies @ SC18
NVIDIA @ SC18
One Stop Systems @ SC18
Oracle @ SC18 Panasas @ SC18
Supermicro @ SC18 SUSE @ SC18 TYAN @ SC18
Verne Global @ SC18

HPE No. 1, IBM Surges, in ‘Bucking Bronco’ High Performance Server Market

September 27, 2018

Riding healthy U.S. and global economies, strong demand for AI-capable hardware and other tailwind trends, the high performance computing server market jumped 28 percent in the second quarter 2018 to $3.7 billion, up from $2.9 billion for the same period last year, according to industry analyst firm Hyperion Research. Read more…

By Doug Black

House Passes $1.275B National Quantum Initiative

September 17, 2018

Last Thursday the U.S. House of Representatives passed the National Quantum Initiative Act (NQIA) intended to accelerate quantum computing research and developm Read more…

By John Russell

HPC Reflections and (Mostly Hopeful) Predictions

December 19, 2018

So much ‘spaghetti’ gets tossed on walls by the technology community (vendors and researchers) to see what sticks that it is often difficult to peer through Read more…

By John Russell

Intel Confirms 48-Core Cascade Lake-AP for 2019

November 4, 2018

As part of the run-up to SC18, taking place in Dallas next week (Nov. 11-16), Intel is doling out info on its next-gen Cascade Lake family of Xeon processors, specifically the “Advanced Processor” version (Cascade Lake-AP), architected for high-performance computing, artificial intelligence and infrastructure-as-a-service workloads. Read more…

By Tiffany Trader

Germany Celebrates Launch of Two Fastest Supercomputers

September 26, 2018

The new high-performance computer SuperMUC-NG at the Leibniz Supercomputing Center (LRZ) in Garching is the fastest computer in Germany and one of the fastest i Read more…

By Tiffany Trader

Houston to Field Massive, ‘Geophysically Configured’ Cloud Supercomputer

October 11, 2018

Based on some news stories out today, one might get the impression that the next system to crack number one on the Top500 would be an industrial oil and gas mon Read more…

By Tiffany Trader

Microsoft to Buy Mellanox?

December 20, 2018

Networking equipment powerhouse Mellanox could be an acquisition target by Microsoft, according to a published report in an Israeli financial publication. Microsoft has reportedly gone so far as to engage Goldman Sachs to handle negotiations with Mellanox. Read more…

By Doug Black

The Deep500 – Researchers Tackle an HPC Benchmark for Deep Learning

January 7, 2019

How do you know if an HPC system, particularly a larger-scale system, is well-suited for deep learning workloads? Today, that’s not an easy question to answer Read more…

By John Russell

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This