Profile of a Data Science Pioneer

By Karen Green, RENCI

June 28, 2016

As he approaches retirement, Reagan Moore reflects on SRB, iRODS, and the ongoing challenge of helping scientists manage their data.

In 1994, Reagan Moore managed the production computing systems at the San Diego Supercomputer Center (SDSC), a job that entailed running and maintaining huge Cray computing systems as well as networking, archival storage, security, job scheduling, and visualization systems.

At the time, research was evolving from analyses done by individuals on single computers into a collaborative activity using distributed, interconnected and heterogeneous resources. With those changes came challenges. As Moore recalls, the software needed to manage data and interactions in a widely distributed environment didn’t exist.

Reagan Moore addresses attendees at the iRODS User Group Meeting, held June 8 and 9 in Chapel Hill, NC.

“The systems at that time were things like AFS (Andrew File System), but it had major restrictions,” said Moore. AFS was implemented as modifications to the operating system kernel. Implementing AFS for the National Science Foundation’s National Partnership for Advanced Computational Infrastructure (NPACI) program, which SDSC managed in the 1990s, required partitioning user IDs so that a range of IDs was reserved for each NPACI site.

“Every time you updated a site’s kernel you had to reinstall the AFS mods and preserve the user IDs,” Moore recalled. “With sites that used different operating systems, this became difficult.”

Moore saw the technical challenges as an opportunity for research in distributed data management. He secured funding from the Defense Advanced Research Projects Agency (DARPA), and with a team of talented visionaries and software developers created the Storage Resource Broker (SRB).

From SRB to iRODS

Over time, SRB evolved into iRODS, the integrated Rule-Oriented Data System. Moore, now a professor in the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill and a data scientist at UNC’s Renaissance Computing Institute (RENCI), stands on the brink of retirement. iRODS, the middleware platform that started as the SRB, now boasts more than 20,000 end users spanning six continents and manages more than 100 petabytes of data. The iRODS Consortium, established in 2014 to sustain the continued development of iRODS, now includes 17 members as well as four partner organizations that help with iRODS deployments and support services.

It’s a success story in software and enabling science that developed over two decades and involved much hard work as well as an aggressive goal.

Reagan Moore, left, with RENCI Director Stan Ahalt after receiving recognition for his long and successful career at the recent iRODS User Group Meeting in Chapel Hill, NC.

“Reagan is a visionary,” said Arcot Rajasekar, who started working with Moore in the mid-1990s and made the move from SDSC to UNC-Chapel Hill with him in 2008. “He was talking about massive data analysis and data intensive computing a full 15 years before the phrase ‘big data’ was coined. These days the word ‘policy’ in data management, curation, sharing and analysis is becoming mainstream. But Reagan was talking about it a long while back.”

Rajasekar, also a professor in UNC’s SILS and a RENCI data scientist, was a key member of the original Data Intensive Computing Environments (DICE) research group, the team established to develop the SRB. Other members were system architect Mike Wan, principal developer Wayne Schroeder, and technical manager Chaitan Baru. Over 20 years, the DICE group landed 34 research grants.

“The way we approached the problem was through a very large number of collaborations instead of one large project,” Moore remembers. “The research communities provided the requirements; we took their requirements and translated them into generic data management infrastructure.”

Toward rule-oriented data management

Moore gives credit to Rajasekar for inventing the idea of rule-oriented data management. iRODS developed because SRB users wanted to enforce different constraints for different data collections while using a common infrastructure. Moore remembers working with the data group of the UK’s e-Science Program and learning they needed to guarantee files could not be deleted from one data collection. For another collection, they wanted the system administrator to be able to delete and replace bad data, and for a third, they required the collection owner to be able to delete and add data at will.

“What Rajasekar did was to extract the policy that controls the deletion operation from the software and put the rule in a rule base,” said Moore. “Then we could make rules appropriate to each collection.”
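The idea can be illustrated with a minimal sketch. This is not iRODS code or the iRODS rule language; the collection names, roles, and the `may_delete` helper are hypothetical, chosen only to mirror the three deletion policies described above. The point is that the deletion check consults a rule base keyed by collection, so changing a policy means editing a rule rather than the software.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Request:
    """A hypothetical delete request against a named collection."""
    user: str
    role: str        # assumed roles: "admin", "owner", or "user"
    collection: str


# A rule takes a request and returns True if deletion is permitted.
Rule = Callable[[Request], bool]

# Hypothetical rule base: one deletion policy per collection.
RULE_BASE: Dict[str, Rule] = {
    "archive": lambda req: False,                 # files can never be deleted
    "curated": lambda req: req.role == "admin",   # only the administrator may delete
    "working": lambda req: req.role == "owner",   # the collection owner may delete
}


def may_delete(req: Request) -> bool:
    """Look up the collection's rule; deny when no rule is registered."""
    rule = RULE_BASE.get(req.collection)
    return rule is not None and rule(req)
```

In this sketch, supporting a new collection with a different constraint means adding one entry to `RULE_BASE`; the `may_delete` function itself does not change.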

That was the birth of policy-based data management, which allows users to define their own policies and procedures for enforcing management decisions, automating administrative tasks, and validating assessment criteria. As Moore says, “There are three reasons people go to policy-based data management. One is that there are management decisions they need to enforce properly. Another is they are dealing with distributed data at multiple administrative domains on multiple types of software systems. A third is that the collection has grown so large it can no longer be managed at a single site.”

Tenacity and dedication to his craft are traits that Moore’s longtime colleagues know well. According to Baru, now senior advisor for data science in the National Science Foundation’s Computer and Information Science and Engineering (CISE) directorate, Moore sees his job as a mission.

“We used to say that he loved his work and travel so much that he used his airline mileage credits for even more business travel,” said Baru. “He was also the master of stretching the travel dollar. He introduced me to that specific parking lot down Pacific Coast Highway in San Diego that had the cheapest daily rate. To this day, I think of that as ‘Reagan’s lot.’”

The Future: Virtualized Data Flows and SDN

With retirement just around the corner, Moore, always humble and soft spoken, acknowledges his role in changing research from a cottage industry into an endeavor focused on distributed, often large-scale collaborative projects.

“We started out trying to virtualize properties of collections. Most of the world wanted to virtualize storage; we wanted to virtualize the data you were putting into the storage so you could manage collection properties independently of the choice of storage technology,” he said.

Reagan Moore, left, is congratulated for his years of service by Jason Coposky, interim executive director of the iRODS Consortium, at the annual iRODS User Group Meeting in June. In the background are Helen Tibbo, a professor in the UNC School of Information and Library Science, and Chaitan Baru, senior advisor for data science in the NSF’s CISE directorate.

Next came virtualizing workflows that are executed on compute systems, a process that allows iRODS users to name their workflows, apply access controls, re-execute analyses, track provenance, and generally make it easier for someone else to reapply the same analysis on their own data—all essential capabilities for reproducible research. The next step forward in comprehensive data management, said Moore, is virtualizing data flows.

“I want to be able to describe how data moves across the network, what the sources are, what the destinations are, and apply operations on data in flight,” he said. “That’s what is happening now with the advent of software defined networking. They are putting policies into the network.”

In July 2014, it didn’t seem likely Moore would have the chance to see the future of policy-based data management or even enjoy his retirement. While on a business trip, he suffered massive heart failure. He was resuscitated three times and spent the next six months facing a major challenge: How to stay away from the work he loves and concentrate on rest and recuperation.

“If I were a cat, I’d be on my fourth life, so now seems to be a good time to retire,” he said. Not surprisingly, he has a longstanding hobby to keep him busy. Moore started doing his family genealogy 26 years ago and decided he needed to derive the properties of a complete genealogy in order to know when the project was complete.

“I built a 252,000 person research genealogy, wrote a graph database so I could analyze it, and derived the properties that define when a genealogy is complete. Now I have to start marketing it so other people can take advantage of the results.”

Meanwhile the praises for his contributions to science keep coming in.

“Professor Moore is a visionary pioneer in defining and creating distributed digital library infrastructure,” said Gary Marchionini, Dean and Cary C. Boshamer Professor at UNC’s SILS. “He is internationally recognized for his work that makes it possible for data scientists and archivists to instantiate data management policies in code that automates preservation activities. The information science community has been strongly influenced by his work over the past quarter century.”

Added Robert Chadduck of the NSF’s Division of Advanced Cyberinfrastructure, “While I continue to value and be enriched by Reagan’s too-many-to-count contributions to technologies and to scientific advances…I also value his shared contributions to understanding the history and perpetuity of all of us as people as documented in his life contributions to the genealogical record embodying his family.”

And finally, from Wayne Schroeder, the software engineer who worked with Moore in the original DICE group:

“I enjoyed working for Reagan. I liked his fairness, his no-nonsense approach, his can-do attitude, and of course his brilliant mind. He set up an environment where we were free to creatively design and implement software that was both research itself and of practical use to scientific and archival communities.”
