Profile of a Data Science Pioneer

By Karen Green, RENCI

June 28, 2016

As he approaches retirement, Reagan Moore reflects on SRB, iRODS, and the ongoing challenge of helping scientists manage their data.

In 1994, Reagan Moore managed the production computing systems at the San Diego Supercomputer Center (SDSC), a job that entailed running and maintaining huge Cray computing systems as well as networking, archival storage, security, job scheduling, and visualization systems.

At the time, research was evolving from analyses done by individuals on single computers into a collaborative activity using distributed, interconnected and heterogeneous resources. With those changes came challenges. As Moore recalls, the software needed to manage data and interactions in a widely distributed environment didn’t exist.

Reagan Moore addresses attendees at the iRODS User Group Meeting, held June 8 and 9 in Chapel Hill, NC.

“The systems at that time were things like AFS (Andrew File System), but it had major restrictions,” said Moore. AFS was implemented as modifications to the operating system kernel. Implementing AFS for the National Science Foundation’s National Partnership for Advanced Computational Infrastructure (NPACI) program, which SDSC managed in the 1990s, required partitioning user IDs to reserve IDs for each NPACI site.

“Every time you updated a site’s kernel you had to reinstall the AFS mods and preserve the user IDs,” Moore recalled. “With sites that used different operating systems, this became difficult.”

Moore saw the technical challenges as an opportunity for research in distributed data management. He secured funding from the Defense Advanced Research Projects Agency (DARPA), and with a team of talented visionaries and software developers created the Storage Resource Broker (SRB).

From SRB to iRODS

Over time, SRB evolved into iRODS, the integrated Rule-Oriented Data System. Moore, now a professor in the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill and a data scientist at UNC’s Renaissance Computing Institute (RENCI), stands on the brink of retirement. iRODS, the middleware platform that started as the SRB, now boasts more than 20,000 end users spanning six continents and manages more than 100 petabytes of data. The iRODS Consortium, established in 2014 to sustain the continued development of iRODS, now includes 17 members as well as four partner organizations that help with iRODS deployments and support services.

It’s a success story in both software and enabling science, one that developed over two decades and involved much hard work as well as an aggressive goal.

Reagan Moore, left, with RENCI Director Stan Ahalt after receiving recognition for his long and successful career at the recent iRODS User Group Meeting in Chapel Hill, NC.

“Reagan is a visionary,” said Arcot Rajasekar, who started working with Moore in the mid-1990s and made the move from SDSC to UNC-Chapel Hill with him in 2008. “He was talking about massive data analysis and data intensive computing a full 15 years before the phrase ‘big data’ was coined. These days the word ‘policy’ in data management, curation, sharing and analysis is becoming mainstream. But Reagan was talking about it a long while back.”

Rajasekar, also a professor in UNC’s SILS and a RENCI data scientist, was a key member of the original Data Intensive Computing Environments (DICE) research group, the team established to develop the SRB. Other members were system architect Mike Wan, principal developer Wayne Schroeder, and technical manager Chaitan Baru. Over 20 years, the DICE group landed 34 research grants.

“The way we approached the problem was through a very large number of collaborations instead of one large project,” Moore remembers. “The research communities provided the requirements; we took their requirements and translated them into generic data management infrastructure.”

Toward rule-oriented data management

Moore gives credit to Rajasekar for inventing the idea of rule-oriented data management. iRODS developed because SRB users wanted to enforce different constraints for different data collections while using a common infrastructure. Moore remembers working with the data group of the UK’s e-Science Program and learning they needed to guarantee files could not be deleted from one data collection. For another collection, they wanted the system administrator to be able to delete and replace bad data, and for a third, they required the collection owner to be able to delete and add data at will.

“What Rajasekar did was to extract the policy that controls the deletion operation from the software and put the rule in a rule base,” said Moore. “Then we could make rules appropriate to each collection.”
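The idea Moore describes can be illustrated with a minimal sketch. This is not iRODS code or the iRODS rule language; the collection names, the `User` type, and the `can_delete` helper are all hypothetical, chosen only to mirror the three deletion policies from the UK e-Science example. The point is that the delete operation stays generic while the decision lives in a rule base keyed by collection.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class User:
    """Hypothetical user record; not an iRODS type."""
    name: str
    is_admin: bool = False


# A rule answers: may `user` delete from a collection owned by `owner`?
DeleteRule = Callable[[User, str], bool]


def never(user: User, owner: str) -> bool:
    """Files in this collection can never be deleted."""
    return False


def admin_only(user: User, owner: str) -> bool:
    """Only a system administrator may delete (e.g. to replace bad data)."""
    return user.is_admin


def owner_only(user: User, owner: str) -> bool:
    """The collection owner may delete and add data at will."""
    return user.name == owner


# The rule base: policy lives here, outside the generic delete logic.
# Collection names and owners are made up for illustration.
RULE_BASE: Dict[str, DeleteRule] = {
    "archive": never,
    "curated": admin_only,
    "scratch": owner_only,
}
OWNERS: Dict[str, str] = {"archive": "alice", "curated": "alice", "scratch": "alice"}


def can_delete(collection: str, user: User) -> bool:
    """Generic check: look up the collection's rule and apply it.

    Unknown collections fall back to `never` (a default-deny assumption).
    """
    rule = RULE_BASE.get(collection, never)
    return rule(user, OWNERS.get(collection, ""))
```

Changing the policy for one collection then means editing a single entry in `RULE_BASE`, with no change to the code that performs the deletion.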

That was the birth of policy-based data management, which allows users to define their own policies and procedures for enforcing management decisions, automating administrative tasks, and validating assessment criteria. As Moore says, “There are three reasons people go to policy-based data management. One is that there are management decisions they need to enforce properly. Another is they are dealing with distributed data at multiple administrative domains on multiple types of software systems. A third is that the collection has grown so large it can no longer be managed at a single site.”

Tenacity and dedication to his craft are traits that Moore’s longtime colleagues know well. According to Baru, now senior advisor for data science in the National Science Foundation’s Computer and Information Science and Engineering (CISE) directorate, Moore sees his job as a mission.

“We used to say that he loved his work and travel so much that he used his airline mileage credits for even more business travel,” said Baru. “He was also the master of stretching the travel dollar. He introduced me to that specific parking lot down Pacific Coast Highway in San Diego that had the cheapest daily rate. To this day, I think of that as ‘Reagan’s lot.’”

The Future: Virtualized Data Flows and SDN

With retirement just around the corner, Moore, always humble and soft-spoken, acknowledges his role in changing research from a cottage industry into an endeavor focused on distributed, often large-scale collaborative projects.

“We started out trying to virtualize properties of collections. Most of the world wanted to virtualize storage; we wanted to virtualize the data you were putting into the storage so you could manage collection properties independently of the choice of storage technology,” he said.

Reagan Moore, left, is congratulated for his years of service by Jason Coposky, interim executive director of the iRODS Consortium, at the annual iRODS User Group Meeting in June. In the background are Helen Tibbo, a professor in the UNC School of Information and Library Science, and Chaitan Baru, senior advisor for data science in the NSF’s CISE directorate.

Next came virtualizing workflows that are executed on compute systems, a process that allows iRODS users to name their workflows, apply access controls, re-execute analyses, track provenance, and generally make it easier for someone else to reapply the same analysis on their own data—all essential capabilities for reproducible research. The next step forward in comprehensive data management, said Moore, is virtualizing data flows.

“I want to be able to describe how data moves across the network, what the sources are, what the destinations are, and apply operations on data in flight,” he said. “That’s what is happening now with the advent of software defined networking. They are putting policies into the network.”

In July 2014, it didn’t seem likely Moore would have the chance to see the future of policy-based data management or even enjoy his retirement. While on a business trip, he suffered massive heart failure. He was resuscitated three times and spent the next six months facing a major challenge: How to stay away from the work he loves and concentrate on rest and recuperation.

“If I were a cat, I’d be on my fourth life, so now seems to be a good time to retire,” he said. Not surprisingly, he has a longstanding hobby to keep him busy. Moore started doing his family genealogy 26 years ago and decided he needed to derive the properties of a complete genealogy in order to know when the project was complete.

“I built a 252,000 person research genealogy, wrote a graph database so I could analyze it, and derived the properties that define when a genealogy is complete. Now I have to start marketing it so other people can take advantage of the results.”

Meanwhile the praises for his contributions to science keep coming in.

“Professor Moore is a visionary pioneer in defining and creating distributed digital library infrastructure,” said Gary Marchionini, Dean and Cary C. Boshamer Professor at UNC’s SILS. “He is internationally recognized for his work that makes it possible for data scientists and archivists to instantiate data management policies in code that automates preservation activities. The information science community has been strongly influenced by his work over the past quarter century.”

Added Robert Chadduck of the NSF’s Division of Advanced Cyberinfrastructure, “While I continue to value and be enriched by Reagan’s too-many-to-count contributions to technologies and to scientific advances…I also value his shared contributions to understanding the history and perpetuity of all of us as people as documented in his life contributions to the genealogical record embodying his family.”

And finally, from Wayne Schroeder, the software engineer who worked with Moore in the original DICE group:

“I enjoyed working for Reagan. I liked his fairness, his no-nonsense approach, his can-do attitude, and of course his brilliant mind. He set up an environment where we were free to creatively design and implement software that was both research itself and of practical use to scientific and archival communities.”
