Profile of a Data Science Pioneer

By Karen Green, RENCI

June 28, 2016

As he approaches retirement, Reagan Moore reflects on SRB, iRODS, and the ongoing challenge of helping scientists manage their data.

In 1994, Reagan Moore managed the production computing systems at the San Diego Supercomputer Center (SDSC), a job that entailed running and maintaining huge Cray computing systems as well as networking, archival storage, security, job scheduling, and visualization systems.

At the time, research was evolving from analyses done by individuals on single computers into a collaborative activity using distributed, interconnected and heterogeneous resources. With those changes came challenges. As Moore recalls, the software needed to manage data and interactions in a widely distributed environment didn’t exist.

Reagan Moore addresses attendees at the iRODS User Group Meeting, held June 8 and 9 in Chapel Hill, NC.

“The systems at that time were things like AFS (Andrew File System), but it had major restrictions,” said Moore. AFS was implemented as modifications to the operating system kernel. Implementing AFS for the National Science Foundation’s National Partnership for Advanced Computational Infrastructure (NPACI) program, which SDSC managed in the 1990s, required partitioning user IDs to reserve IDs for each NPACI site.

“Every time you updated a site’s kernel you had to reinstall the AFS mods and preserve the user IDs,” Moore recalled. “With sites that used different operating systems, this became difficult.”

Moore saw the technical challenges as an opportunity for research in distributed data management. He secured funding from the Defense Advanced Research Projects Agency (DARPA), and with a team of talented visionaries and software developers created the Storage Resource Broker (SRB).

From SRB to iRODS

Over time, SRB evolved into iRODS, the integrated Rule-Oriented Data System, and Moore, now a professor in the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill and a data scientist at UNC’s Renaissance Computing Institute (RENCI), stands on the brink of retirement. iRODS, the middleware platform that started as the SRB, now boasts more than 20,000 end users spanning six continents and manages more than 100 petabytes of data. The iRODS Consortium, established in 2014 to sustain the continued development of iRODS, now includes 17 members as well as four partner organizations that help with iRODS deployments and support services.

It’s a success story in software and in enabling science, one that developed over two decades and involved much hard work as well as an aggressive goal.

Reagan Moore, left, with RENCI Director Stan Ahalt after receiving recognition for his long and successful career at the recent iRODS User Group Meeting in Chapel Hill, NC.

“Reagan is a visionary,” said Arcot Rajasekar, who started working with Moore in the mid-1990s and made the move from SDSC to UNC-Chapel Hill with him in 2008. “He was talking about massive data analysis and data-intensive computing a full 15 years before the phrase ‘big data’ was coined. These days the word ‘policy’ in data management, curation, sharing and analysis is becoming mainstream. But Reagan was talking about it a long while back.”

Rajasekar, also a professor in UNC’s SILS and a RENCI data scientist, was a key member of the original Data Intensive Computing Environments (DICE) research group, the team established to develop the SRB. Other members were system architect Mike Wan, principal developer Wayne Schroeder, and technical manager Chaitan Baru. Over 20 years, the DICE group landed 34 research grants.

“The way we approached the problem was through a very large number of collaborations instead of one large project,” Moore remembers. “The research communities provided the requirements; we took their requirements and translated them into generic data management infrastructure.”

Toward rule-oriented data management

Moore gives credit to Rajasekar for inventing the idea of rule-oriented data management. iRODS developed because SRB users wanted to enforce different constraints for different data collections while using a common infrastructure. Moore remembers working with the data group of the UK’s e-Science Program and learning they needed to guarantee files could not be deleted from one data collection. For another collection, they wanted the system administrator to be able to delete and replace bad data, and for a third, they required the collection owner to be able to delete and add data at will.

“What Rajasekar did was to extract the policy that controls the deletion operation from the software and put the rule in a rule base,” said Moore. “Then we could make rules appropriate to each collection.”
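The idea Moore describes, a generic delete operation that consults a rule base instead of hard-coding its policy, can be illustrated with a minimal Python sketch. This is not iRODS code or its rule language; the collection names, rule functions, and `delete_file` helper are all hypothetical, chosen only to mirror the three e-Science cases above.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    owner: str
    files: set = field(default_factory=set)


# Hypothetical rules: each receives (user, role, collection) and returns
# True when the deletion should be allowed.
def no_deletion(user, role, collection):
    return False


def admin_may_delete(user, role, collection):
    return role == "admin"


def owner_may_delete(user, role, collection):
    return user == collection.owner


# The rule base lives outside the storage code, so each collection can
# carry its own deletion policy. Collection names here are illustrative.
RULE_BASE = {
    "archive": no_deletion,       # files can never be deleted
    "curated": admin_may_delete,  # administrator can remove bad data
    "scratch": owner_may_delete,  # collection owner deletes at will
}


def delete_file(collection, filename, user, role="user"):
    """Generic delete: the policy decision is looked up, not hard-coded."""
    # Assumption: unknown collections fall back to the most restrictive rule.
    rule = RULE_BASE.get(collection.name, no_deletion)
    if not rule(user, role, collection):
        return False
    collection.files.discard(filename)
    return True
```

Changing the policy for a collection then means editing one entry in the rule base, while the delete operation itself stays the same for every collection.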

That was the birth of policy-based data management, which allows users to define their own policies and procedures for enforcing management decisions, automating administrative tasks, and validating assessment criteria. As Moore says, “There are three reasons people go to policy-based data management. One is that there are management decisions they need to enforce properly. Another is they are dealing with distributed data at multiple administrative domains on multiple types of software systems. A third is that the collection has grown so large it can no longer be managed at a single site.”

Tenacity and dedication to his craft are traits that Moore’s longtime colleagues know well. According to Baru, now senior advisor for data science in the National Science Foundation’s Computer and Information Science and Engineering (CISE) directorate, Moore sees his job as a mission.

“We used to say that he loved his work and travel so much that he used his airline mileage credits for even more business travel,” said Baru. “He was also the master of stretching the travel dollar. He introduced me to that specific parking lot down Pacific Coast Highway in San Diego that had the cheapest daily rate. To this day, I think of that as ‘Reagan’s lot.’”

The Future: Virtualized Data Flows and SDN

With retirement just around the corner, Moore, always humble and soft-spoken, acknowledges his role in changing research from a cottage industry into an endeavor focused on distributed, often large-scale collaborative projects.

“We started out trying to virtualize properties of collections. Most of the world wanted to virtualize storage; we wanted to virtualize the data you were putting into the storage so you could manage collection properties independently of the choice of storage technology,” he said.

Reagan Moore, left, is congratulated for his years of service by Jason Coposky, interim executive director of the iRODS Consortium, at the annual iRODS User Group Meeting in June. In the background are Helen Tibbo, a professor in the UNC School of Information and Library Science, and Chaitan Baru, senior advisor for data science in the NSF’s CISE directorate.

Next came virtualizing workflows that are executed on compute systems, a process that allows iRODS users to name their workflows, apply access controls, re-execute analyses, track provenance, and generally make it easier for someone else to reapply the same analysis on their own data—all essential capabilities for reproducible research. The next step forward in comprehensive data management, said Moore, is virtualizing data flows.

“I want to be able to describe how data moves across the network, what the sources are, what the destinations are, and apply operations on data in flight,” he said. “That’s what is happening now with the advent of software defined networking. They are putting policies into the network.”

In July 2014, it didn’t seem likely Moore would have the chance to see the future of policy-based data management or even enjoy his retirement. While on a business trip, he suffered massive heart failure. He was resuscitated three times and spent the next six months facing a major challenge: How to stay away from the work he loves and concentrate on rest and recuperation.

“If I were a cat, I’d be on my fourth life, so now seems to be a good time to retire,” he said. Not surprisingly, he has a longstanding hobby to keep him busy. Moore started doing his family genealogy 26 years ago and decided he needed to derive the properties of a complete genealogy in order to know when the project was complete.

“I built a 252,000 person research genealogy, wrote a graph database so I could analyze it, and derived the properties that define when a genealogy is complete. Now I have to start marketing it so other people can take advantage of the results.”

Meanwhile the praises for his contributions to science keep coming in.

“Professor Moore is a visionary pioneer in defining and creating distributed digital library infrastructure,” said Gary Marchionini, Dean and Cary C. Boshamer Professor at UNC’s SILS. “He is internationally recognized for his work that makes it possible for data scientists and archivists to instantiate data management policies in code that automates preservation activities. The information science community has been strongly influenced by his work over the past quarter century.”

Added Robert Chadduck of the NSF’s Division of Advanced Cyberinfrastructure, “While I continue to value and be enriched by Reagan’s too-many-to-count contributions to technologies and to scientific advances…I also value his shared contributions to understanding the history and perpetuity of all of us as people as documented in his life contributions to the genealogical record embodying his family.”

And finally, from Wayne Schroeder, the software engineer who worked with Moore in the original DICE group:

“I enjoyed working for Reagan. I liked his fairness, his no-nonsense approach, his can-do attitude, and of course his brilliant mind. He set up an environment where we were free to creatively design and implement software that was both research itself and of practical use to scientific and archival communities.”
