December 10, 2008
Federation capability supports nuanced data sharing for collaborative research
Dec. 10 -- The Data-Intensive Cyber Environments (DICE) group has announced the release of version 2.0 of iRODS, the Integrated Rule-Oriented Data System. The new version of the award-winning software adds a number of important features, including federation of independent iRODS installations, which lets them "talk" to each other and supports large-scale collaboration by giving users seamless access to data distributed across different iRODS systems.
Core development of the open source iRODS data system is led by the Advanced Center for Data Intensive Cyber Environments at the Institute for Neural Computation at the University of California, San Diego, and the National Center for Data Intensive Cyber Environments at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Version 2.0 is freely available for download as open source software from the iRODS wiki, along with user information and release notes.
"A major new feature in iRODS 2.0 is the ability to federate two or more independent iRODS data grids," said Reagan Moore, director of the Data-Intensive Cyber Environments (DICE) group and professor in SILS at UNC. "Federation lets communities maintain independent iRODS installations, while choosing to share some or all of their data under explicit management policies." iRODS does this by mapping the policies to computer-actionable rules that control all remote operations, as well as data exchange between separate iRODS systems, or Zones. Additional federation iRODS Rules are applied on top of the local Rules at each iRODS data grid.
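To make the idea of layering federation Rules on top of local Rules concrete, here is a toy model in Python. It is not the iRODS rule language or rule engine: the `Request` type, the `is_allowed` function, the zone names, and both example policies are hypothetical. The sketch assumes that a request arriving from another Zone must pass the local Rules and then the additional federation Rules, while a request from the local Zone is checked against the local Rules only.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical request model (not an iRODS API).
@dataclass(frozen=True)
class Request:
    zone: str        # Zone the requesting user belongs to
    operation: str   # e.g. "read" or "write"
    path: str        # logical path of the data object


# A rule returns True if it permits the request.
Rule = Callable[[Request], bool]


def is_allowed(req: Request, local_zone: str,
               local_rules: List[Rule], federation_rules: List[Rule]) -> bool:
    # Local rules govern every request, wherever it comes from.
    if not all(rule(req) for rule in local_rules):
        return False
    # Requests from another Zone must also pass the federation rules.
    if req.zone != local_zone:
        return all(rule(req) for rule in federation_rules)
    return True


# Hypothetical policies: the local grid forbids writes under /archive/,
# and remote Zones may only read data under /shared/.
local_rules: List[Rule] = [
    lambda r: not (r.operation == "write" and r.path.startswith("/archive/")),
]
federation_rules: List[Rule] = [
    lambda r: r.operation == "read" and r.path.startswith("/shared/"),
]
```

With these policies, a user in the local Zone may write to `/home/...`, while a user from a federated Zone may read `/shared/...` but cannot write there, because the federation rule only admits reads.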
An iRODS workshop will be held Feb. 2-5, 2009, bringing together users new to iRODS and others already using iRODS in a range of applications. Online registration is free and open until the Jan. 10, 2009, deadline; more information is available at http://diceresearch.org.
iRODS moves beyond the single-site repository model, which is based on the traditional hard copy paradigm, to implement a new paradigm that harnesses the full power of cyberinfrastructure and the virtual world to free digital data collections from the constraints of space -- whether physical, administrative, or disciplinary -- and time, through long-term preservation. This approach gives users an adaptable and extensible system with the integrated capabilities required for the full range of digital data management applications: highly customizable sharing in data grids, publication of data in digital libraries, sensor stream aggregation for real-time data systems, and long-term preservation of digital data for use in standard reference collections.
New features in iRODS version 2.0 include:
"The iRODS 2.0 release contains many new features and improvements, large and small, based on user requests and our years of experience with iRODS and the SRB Storage Resource Broker," said Senior Software Developer and Designer Wayne Schroeder. "In the aggregate these make iRODS a highly capable system that equips users to solve a wide variety of data management problems by making use of various subsets of the features."
iRODS supports seamless growth from small installations to the largest scales. At UCSD alone, iRODS and its predecessor, the Storage Resource Broker (SRB), already manage 1.2 petabytes of data and two hundred million files for 5,000 users, and those totals are still growing.
"We also understand that performance is a very important part of iRODS usability, especially at the larger scales, and in addition to the new federation capability this release also contains important performance enhancements," said iRODS Software Architect Mike Wan. "We've added an efficient mechanism for transferring large files, a bundling mechanism for transferring a large number of small files, and a caching enhancement."
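Bundling helps because fixed per-file costs, such as setting up a transfer for each file, are paid once for the whole bundle rather than once per file. The release does not describe how the iRODS bundling mechanism works internally, so the following is only a generic illustration using Python's standard `tarfile` module. It assumes a tar container built in memory, and the `bundle` and `unbundle` helpers are hypothetical names, not part of iRODS.

```python
import io
import tarfile
from typing import Dict


def bundle(files: Dict[str, bytes]) -> bytes:
    """Pack many small files into one uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)  # regular-file header for this entry
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unbundle(blob: bytes) -> Dict[str, bytes]:
    """Recover the individual files from a bundle on the receiving side."""
    out: Dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)  # None for directories and special entries
            if f is not None:
                out[member.name] = f.read()
    return out
```

For example, a thousand small sensor files could be sent as one `bundle(...)` payload and split apart again with `unbundle(...)`, so the transfer layer handles a single object instead of a thousand.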
Other features of interest include the addition of a number of new micro-services; improvements in iRODS's use of Grid Security Infrastructure (GSI), allowing regular iRODS users to authenticate with GSI; performance improvements in the iRODS FUSE user-level file system; support for Rule-oriented Data Access to Oracle databases; a new data transfer mode for larger files, RBUDP (Reliable Blast UDP), in addition to the existing sequential (single TCP stream) and parallel (multiple TCP streams) modes; and improvements to the iCAT iRODS Metadata Catalog, including rollback after errors so that subsequent SQL functions can still execute in PostgreSQL. iRODS 2.0 also includes improved testing and installation scripts.
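In the parallel mode, a file is split into pieces that travel over separate TCP streams. The release does not say how iRODS divides a file among streams, so this is only a sketch that assumes contiguous, nearly equal byte ranges, one per stream. `split_ranges` is a hypothetical helper, and RBUDP is not modeled.

```python
from typing import List, Tuple


def split_ranges(file_size: int, num_streams: int) -> List[Tuple[int, int]]:
    """Divide bytes [0, file_size) into at most num_streams contiguous (offset, length) pieces."""
    if file_size < 0 or num_streams < 1:
        raise ValueError("file_size must be >= 0 and num_streams >= 1")
    if file_size == 0:
        return []
    streams = min(num_streams, file_size)  # never create empty pieces
    base, extra = divmod(file_size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        # Spread the remainder one byte at a time over the first pieces.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

With `num_streams=1` the function returns a single range covering the whole file, which corresponds to the sequential (single TCP stream) mode; for example, `split_ranges(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`.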
iRODS version 2.0 is supported on Linux, Solaris, Macintosh, and AIX platforms. The iRODS commands are also supported on the Windows operating system, and there is a Windows GUI client. The iRODS Metadata Catalog (iCAT) will run on both the open source PostgreSQL database (which can be installed as part of the iRODS install package) and Oracle. And iRODS is quick and easy to install -- just answer a few questions and the install package automatically sets up the system for you.
iRODS was first released in late 2006. Version 1.0 of the software was released under a BSD open source license in January 2008. As a second-generation data grid development effort, iRODS leverages more than 10 years of user-driven experience with the Storage Resource Broker (SRB). With a grant-funded core developer team, the iRODS system is growing rapidly as collaborating projects contribute code to the open source software.
The iRODS team is working with partners in a number of projects to apply the technology, including the Transcontinental Persistent Archives Prototype (TPAP) for the National Archives and Records Administration (NARA), the Ocean Observatories Initiative (OOI), the NSF Temporal Dynamics of Learning Center (TDLC), the NHPRC-supported Distributed Custodial Archival Preservation Environments (DCAPE) project, the French National Library, and many others.
Collaborators in the iRODS project include the French Institut National de Physique Nucléaire et de Physique des Particules (CC-IN2P3), the Sustaining Heritage Access through Multivalent ArchiviNg (SHAMAN) project, the UK e-Science Data Management Group at Rutherford Appleton Laboratory, and the High Energy Accelerator Research Organization, KEK, in Japan.
In addition to Moore and Rajasekar, the DICE group includes software architect Mike Wan and senior developer Wayne Schroeder, along with Sheau-Yen Chen, Lucas Gilbert, Chien-Yi Hou, Antoine de Torcy, Paul Tooby, and Bing Zhu. SILS professor Richard Marciano leads the DICE Sustainable Archives and Library Technologies (SALT) lab at UNC.
iRODS is funded by the National Archives and Records Administration and the National Science Foundation.
Source: Data-Intensive Cyber Environments (DICE) Group