Networking, Data Experts Design a Better Portal for Scientific Discovery

By Jon Bashor

January 29, 2018

These days, it’s easy to overlook the fact that the World Wide Web was created nearly 30 years ago primarily to help researchers access and share scientific data. Over the years, the web has evolved into a tool that helps us eat, shop, travel, watch movies and even monitor our homes.

Figure: The Science DMZ includes multiple Data Transfer Nodes (DTNs) that provide high-speed transfer between network and storage. Portal functions run on a portal server, located on the institution’s enterprise network. The DTNs need only speak the API of the data management service (Globus in this case).

Meanwhile, scientific instruments have become much more powerful, generating massive datasets, and international collaborations have proliferated. In this new era, the web has become an essential part of the scientific process, but the most common method of sharing research data remains firmly attached to the earliest days of the web. This can be a huge impediment to scientific discovery.

That’s why a team of networking experts from the Department of Energy’s Energy Sciences Network (ESnet), with the Globus team from the University of Chicago and Argonne National Laboratory, has designed a new approach that makes data sharing faster, more reliable and more secure. In an article published Jan. 15 in PeerJ Computer Science, the team describes the approach under the title “The Modern Research Data Portal: a design pattern for networked, data-intensive science.”

“Both the size of datasets and the quantity of data objects has exploded, but the typical design of a data portal hasn’t really changed,” said co-author Eli Dart, a network engineer with ESnet. “Our new design preserves that ease of use, but easily scales up to handle the huge amounts of data associated with today’s science.”

Data portals, sometimes called science gateways, are web-based interfaces for accessing data storage and computing systems, allowing authorized users to access data and perform shared computations. As science becomes increasingly data-driven and collaborative, data portals are advancing research in materials, physics, astrophysics, cosmology, climate science and other fields.

The traditional portal is driven by a web server that is connected to a storage system and a database and processes users’ requests for data. While this simple design was straightforward to develop 25 years ago, it has increasingly become an obstacle to performance, usability and security.

“The problem with using old technology is that these portals don’t provide fast access to the data and they aren’t very flexible,” said lead author Ian Foster, who is the Arthur Holly Compton Professor at the University of Chicago and Director of the Data Science and Learning Division at Argonne National Laboratory. “Since each portal is developed as its own silo, the organization therefore must implement, and then manage and support, multiple complete software stacks to support each portal.”

The new portal design is built on two approaches developed to simplify and speed up transfers of large datasets.

  • The Science DMZ, which Dart developed, is a high-performance network design that connects large-scale data servers directly to high-speed networks and is increasingly used by research institutions to better manage data transfers.
  • Globus is a cloud-based service to which developers of data portals and other science services can outsource responsibility for complex tasks like authentication, authorization, data movement, and data sharing. Globus can be used, in particular, to drive data transfers into and out of Science DMZs.

Kyle Chard, Foster, David Shifflett, Steven Tuecke and Jason Williams are co-authors of the paper and helped develop Globus at Argonne National Laboratory and the University of Chicago. In their paper, the authors note that the concept became feasible in 2015 as Globus and the Science DMZ became mature technologies.

“Together, Globus and the Science DMZ give researchers a powerful toolbox for conducting their research,” Dart said.

One portal incorporating the new design is the Research Data Archive managed by the National Center for Atmospheric Research (NCAR), which contains a large and diverse collection of meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets to support atmospheric and geosciences research.

For example, a scientist working at a university could download data from NCAR in Colorado and then use it to run simulations at DOE and NSF supercomputing centers in California and Illinois, and finally move the data to her home institution for analysis. To illustrate how the design works, Dart selected a 460-gigabyte dataset at NCAR, initiated a Globus transfer to DOE’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, logged in to his storage account and started the transfer. Four minutes later, the 5,141 files had been seamlessly transferred.
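To put those figures in perspective, the short Python calculation below estimates the average rate implied by the demonstration. It is a back-of-the-envelope sketch, not a measurement: it assumes decimal gigabytes (1 GB = 10^9 bytes) and a duration of exactly four minutes, neither of which the article specifies, and the function names are illustrative.

```python
# Rough throughput implied by the NCAR-to-NERSC demonstration.
# Assumptions (not stated in the article): decimal gigabytes
# (1 GB = 10**9 bytes) and a duration of exactly four minutes.

DATASET_BYTES = 460 * 10**9   # 460-gigabyte dataset
FILE_COUNT = 5_141            # number of files transferred
DURATION_SECONDS = 4 * 60     # "four minutes later"


def average_rate_gbps(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in gigabits per second."""
    return total_bytes * 8 / seconds / 10**9


def mean_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Mean file size in decimal megabytes."""
    return total_bytes / file_count / 10**6


if __name__ == "__main__":
    rate = average_rate_gbps(DATASET_BYTES, DURATION_SECONDS)
    size = mean_file_size_mb(DATASET_BYTES, FILE_COUNT)
    print(f"average rate: {rate:.1f} Gbps")     # prints "average rate: 15.3 Gbps"
    print(f"mean file size: {size:.1f} MB")     # prints "mean file size: 89.5 MB"
```

Under these assumptions the transfer averaged roughly 15 gigabits per second across files of about 90 megabytes each.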

How the design works

The Modern Research Data Portal takes the single-server model of the traditional portal design and divides it among three distinct components.

  • A portal web server handles the search for and access to the specified data, and similar tasks.
  • The data servers, often called Data Transfer Nodes, are connected to high-speed networks through a specialized enclave, in this case the Science DMZ. The Science DMZ provides a dedicated, secure link to the data servers, but avoids common performance bottlenecks caused by typical designs not optimized for high-speed transfers.
  • Globus manages the authentication, data access and data transfers. Globus makes it possible for users to manage data irrespective of the location or storage system on which data reside and supports data transfer, sharing, and publication directly from those storage systems.

“The design pattern thus defines distinct roles for the web server, which manages who is allowed to do what; data servers, where authorized operations are performed on data; and external services, which orchestrate data access,” the authors wrote.
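The division of roles can be illustrated with a toy model in Python. This is a minimal sketch, not the authors' implementation and not the Globus API: the class names (`PortalServer`, `TransferService`, `DataTransferNode`), their methods and the sample data are all hypothetical, and files are modeled as in-memory byte strings. It shows the portal deciding who may do what, the external service orchestrating the transfer, and the data moving directly between data servers without passing through the portal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataTransferNode:
    """Hypothetical data server in the Science DMZ; it only holds and moves data."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)


@dataclass
class TransferService:
    """Hypothetical stand-in for the external service (Globus in the article)."""
    grants: set[tuple[str, str]] = field(default_factory=set)

    def grant(self, user: str, dataset: str) -> None:
        # Record that the portal has approved this user for this dataset.
        self.grants.add((user, dataset))

    def transfer(self, user: str, dataset: str,
                 source: DataTransferNode, destination: DataTransferNode) -> int:
        # Refuse any operation the portal has not approved.
        if (user, dataset) not in self.grants:
            raise PermissionError(f"{user} may not transfer {dataset}")
        # Data moves node to node; it never passes through the portal server.
        destination.files[dataset] = source.files[dataset]
        return len(source.files[dataset])


@dataclass
class PortalServer:
    """Hypothetical portal web server: search and access decisions only."""
    catalog: dict[str, str]        # dataset name -> description
    allowed_users: set[str]
    service: TransferService

    def search(self, keyword: str) -> list[str]:
        return sorted(name for name, text in self.catalog.items()
                      if keyword.lower() in text.lower())

    def request_transfer(self, user: str, dataset: str,
                         source: DataTransferNode,
                         destination: DataTransferNode) -> int:
        # The portal manages who is allowed to do what, then hands off.
        if user not in self.allowed_users or dataset not in self.catalog:
            raise PermissionError(f"{user} is not authorized for {dataset}")
        self.service.grant(user, dataset)
        return self.service.transfer(user, dataset, source, destination)


if __name__ == "__main__":
    archive = DataTransferNode("archive-dtn", {"reanalysis": b"x" * 1024})
    hpc = DataTransferNode("hpc-dtn")
    portal = PortalServer(catalog={"reanalysis": "Reanalysis model outputs"},
                          allowed_users={"alice"},
                          service=TransferService())
    print(portal.search("model"))
    print(portal.request_transfer("alice", "reanalysis", archive, hpc))
```

Because the portal server in this sketch never handles the bytes themselves, it can stay on an ordinary enterprise network while the data servers sit on the high-speed path.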

Globus is already used by tens of thousands of researchers worldwide with endpoints at more than 360 sites, so many researchers are familiar with its capabilities and rely on it on a regular basis. In fact, about 80 percent of major research universities and national labs in the U.S. use Globus.

At the same time, more than 100 research universities across the country have deployed Science DMZs, thanks to funding support through the National Science Foundation’s Campus Cyberinfrastructure Program.

A critical component of the system is “a little agent called Globus Connect, which is much like the Google Drive or Dropbox agents one would install on their own PCs,” Chard said. Globus Connect allows the Globus service to move data to and from the computer using high-performance protocols, as well as HTTPS for direct access. It also allows users to share data dynamically with their peers.

According to Chard, the design provides research organizations with easy-to-use technology tools similar to those used by business startups to streamline development.

“If we look to industry, startup businesses can now build upon a suite of services to simplify what they need to build and manage themselves,” Chard said. “In a research setting, Globus has developed a stack of such capabilities that are needed by any research portal. Recently, we (Globus) have developed interfaces to make it trivial for developers to build upon these capabilities as a platform.”

“As a result of this design, users have a platform that allows them to easily place and transfer data without having to scale up the human effort as the amount of data scales up,” Dart said.

ESnet is a DOE Office of Science User Facility. Argonne and Lawrence Berkeley national laboratories are supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

About Computing Sciences at Berkeley Lab

The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy’s research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab’s Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.

Lawrence Berkeley National Laboratory addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab has had its scientific expertise recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
