In the first part of the article, we focused mainly on the major results of our study on large community grid initiatives: the lessons learned and the recommendations for those who want to design, build and run similar grid infrastructures. Here we present additional general information about these six grid initiatives: ChinaGrid, D-Grid, EGEE, NAREGI, TeraGrid, and the UK e-Science Initiative. This article is a summary of the report, which can be downloaded via the weblink provided at the end of this article.
In 2002, the Chinese Ministry of Education (MoE) launched the largest grid project in China, called ChinaGrid, aiming to provide a nationwide grid computing platform and services for research and education among 100 key universities in China. The vision for the ChinaGrid project is to deploy the largest, most advanced and most practical grid computing project in China. The first phase of ChinaGrid ran from 2003 to 2005, with 12 key universities involved (20 at the end of 2004). At that time, the systems in the grid had a performance of about 6 Tflops, with 60 TB of storage.
The underlying infrastructure for ChinaGrid is CERNET, the China Education and Research Network, which began operation in 1994 and covers more than 800 universities, colleges and institutes in China. Currently, it is the second largest nationwide network in China. The bandwidth of the CERNET backbone is (currently) 2.5 Gbps, connecting 7 cities, called local network centers. The bandwidth of the CERNET local backbone is 155 Mbps.
The focus of the first stage of ChinaGrid is on the compute grid platform and on applications (e-science). These applications are from a variety of scientific disciplines, from life science to computational physics. The second stage of the ChinaGrid project runs from 2007 to 2010, covering 30 to 40 key universities in China. The focus will extend from computational grid applications to an information service grid (e-information), including applications for a distance learning grid, a digital Olympic grid, etc. The third stage will run from 2011 to 2015, extending the coverage of the ChinaGrid project to all of the 100 key universities. The focus of the third-stage grid applications will be even more diverse, including instrument sharing (e-instrument).
The underlying common grid computing middleware platform for the ChinaGrid project is called the ChinaGrid Supporting Platform (CGSP). It supports all three of the above-mentioned stages: e-science, e-information, and e-instrument. CGSP integrates all kinds of resources in education and research environments, making the heterogeneous and dynamic nature of the resources transparent to the users, and providing high-performance, highly reliable, secure, convenient and transparent grid services to the scientific computing and engineering research communities. CGSP provides both a ChinaGrid service portal and a set of development environments for deploying various grid applications.
The current version, CGSP 2.0, is based on Globus Toolkit 4.0, and is WSRF and OGSA compatible. The previous version, CGSP 1.0, was released in October 2004 with five main building blocks: Grid Portal, Grid Development Toolkits, Information Service, Grid Management (consisting of Service container, Data manager, Job manager, and Domain manager), and Grid Security.
EGEE and EGEE-II, Enabling Grids for E-sciencE
EGEE-II is the second phase of a 4-year program. The aim of first-phase EGEE was to build on recent advances in Grid technology and develop a service Grid infrastructure, providing researchers in academia and industry with access to major computing resources, independent of their geographic location. The EGEE project also focuses on attracting a wide range of new users to the Grid. The project concentrates primarily on three core areas:
- The first area is to build a consistent, robust and secure Grid network that will attract and incorporate additional computing resources on demand.
- The second area is to continuously improve and maintain the middleware in order to deliver reliable services to users.
- The third area is to attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
The EGEE Grid is built on the EU Research Network GÉANT and exploits Grid expertise generated by many EU, national and international Grid projects to date. In its first phase, EGEE comprised over 70 contractors and over 30 non-contracting participants, and was divided into 12 partner federations, covering a wide range of both scientific and industrial applications. With funding of over 30 million Euro from the European Commission (EC), the project was one of the largest of its kind. The initial focus of the project was on two application areas, namely High Energy Physics (HEP) and Biomedicine. The rationale behind this was that these fields were already grid-aware and would serve well as pilot areas for the development of the various EGEE Grid services.
The first phase provided the basis for assessing subsequent objectives and funding needs, and gave way to a second phase which started on 1 April 2006. This project saw its consortium grow to over 90 contractors and a further 48 non-contracting participants in 32 countries, and its funding level rise to over 36 million Euro from the EC. It retains its organizational structure of geographical federations. The EGEE Grid consists of over 20,000 CPUs, in addition to about 10 Petabytes (10 million Gigabytes) of storage, and maintains on average 20,000 concurrent jobs. More than two thousand scientists from all over the world submitted over 17 million jobs during 2006, a 3-fold increase compared to 2005.
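As a rough sanity check on the figures quoted above, the following Python sketch redoes the arithmetic. It assumes decimal (SI) units and a 365-day year; the function names are purely illustrative and not part of any EGEE tooling. The 2005 job count it derives is only an estimate implied by the "3-fold increase" and is not a number stated in the report.

```python
# Back-of-envelope arithmetic for the EGEE figures quoted in the text.
# Assumptions (not from the report): SI units (1 PB = 10**6 GB), 365-day year.

GB_PER_PB = 10**6  # decimal definition of a petabyte


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using SI (decimal) units."""
    return petabytes * GB_PER_PB


def average_jobs_per_day(jobs_per_year, days=365):
    """Average daily job submissions for a given yearly total."""
    return jobs_per_year / days


def implied_previous_year(jobs_this_year, growth_factor):
    """Estimate last year's total from this year's total and a growth factor."""
    return jobs_this_year / growth_factor


if __name__ == "__main__":
    # "about 10 Petabytes (10 million Gigabytes)"
    print("storage in GB:", petabytes_to_gigabytes(10))
    # "over 17 million jobs during 2006", treated here as a lower bound
    print("jobs per day (at least):", round(average_jobs_per_day(17_000_000)))
    # "a 3-fold increase compared to 2005"
    print("implied 2005 jobs:", round(implied_previous_year(17_000_000, 3)))
```

Because the text gives "over 17 million" and "about 10 Petabytes", the results should be read as order-of-magnitude checks rather than precise values.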
At present there are more than 20 applications from 9 domains on the EGEE Grid infrastructure: Astrophysics, Computational Chemistry, Earth Sciences, Financial Simulation, Fusion, Geophysics, High Energy Physics, Life Sciences, and Multimedia. In addition, there are several applications from the industrial sector running on the EGEE Grid, such as applications from geophysics and the plastics industry.
The EGEE project now provides a stable and reliable Grid infrastructure with its own middleware stack, gLite. EGEE began work using the LCG-2 middleware, provided by the LCG project (which is itself based on the middleware from EU DataGrid, EGEE's predecessor). In parallel, it produced the gLite middleware, using reengineered components from a number of sources to produce lightweight middleware that provides a full range of basic Grid services, part of which is based on Globus version 2.4. As of September 2006, gLite is at version 3.0, and comprises some 220 packages arranged in 34 logical deployment modules.
The German D-Grid Initiative
In 2003, the German scientific community published a strategic paper examining the status and consequences of grid technology for scientific research in Germany and recommending a long-term strategic grid research and development initiative. This resulted in the German e-Science Initiative founded by the German Ministry for Research and Education (BMBF). In 2004, BMBF presented the vision of a new quality of digital scientific infrastructure which will enable globally connected scientists to collaborate on an international basis; exchange information, documents and publications about their research work in real time; and guarantee efficiency and stability even with huge amounts of data from measurements, laboratories and computational results.
The e-Science Initiative and the first phase of D-Grid started in September 2005. BMBF is funding over 100 German research organizations with 100 Million Euro over the next 5 years. For the first 3-year phase of D-Grid, financial support is approximately 25 Million Euro. The goal is to design, build and operate a network of distributed, integrated and virtualized high-performance resources and related services to enable the processing of large amounts of scientific data and information. The Ministry for Research and Education is funding the assembling, set-up and operation of D-Grid in several overlapping stages:
- D-Grid, 2005-2008: IT services for scientists. The global services infrastructure will be tested and used by Community Grids in the areas of high-energy physics, astrophysics, medicine and life sciences, earth sciences (e.g. climate), engineering sciences, energy, and scientific libraries.
- D-Grid 2, 2007-2009: IT services for scientists, industry, and business, including new applications in chemistry, biology, drug design, economy, visualization of data, and so on. Grid services providers will offer basic IT services to these users.
- D-Grid 3, 2008-2010: the intention is to extend the grid infrastructure with an SLA and a knowledge management layer, add several virtual competence centres, encourage global service-oriented architectures in industry, and use this grid infrastructure for the benefit of society as a whole, among other goals.
D-Grid consists of the DGI Infrastructure project and (currently) of the following seven Community Grid projects: AstroGrid-D (Astronomy), C3-Grid (Earth Sciences), HEP Grid (High-Energy Physics), InGrid (Engineering), MediGrid (Medical Research), TextGrid (Scientific Libraries, Humanities), and WISENT (Knowledge Network Energy Meteorology).
The short-term goal of D-Grid is to build a core grid infrastructure for the German scientific community by the end of 2006. Then, the first test and benchmark computations will be performed by the Community Grids, to provide technology feedback to DGI. After that, climate researchers of the C3-Grid, for example, will be able to predict climate changes faster and more accurately than before, to inform governments about potential environmental measures. Similarly, astrophysicists will be able to access and use radio-telescopes and supercomputers remotely via the grid, which they would not be able to access otherwise, resulting in a new quality of research and of the resulting data.
The D-Grid Infrastructure project, DGI, provides a set of basic grid middleware services to the Community Grids. So far, a core-grid infrastructure has been built for the community grids for testing, experimentation, and production. High-level services will be developed which guarantee security, reliable data access and transfer, and fair-use policies for computing resources. This core-grid infrastructure will then be further developed into a reliable, generic, long-term production platform which can be enhanced in a scalable and seamless way, for example through the addition of new resources and services, distributed applications and data, and automated “on demand” provisioning of a support infrastructure.
DGI offers several grid middleware packages (gLite, Globus and Unicore) and data management systems (SRB, dCache and OGSA-DAI). A support infrastructure helps new communities and Virtual Organizations (VOs) with the installation and integration of new grid resources via a central Information Portal. In addition, software tools for managing VOs are offered, based on VOMS and Shibboleth. Monitoring and accounting prototypes for distributed grid resources exist, as well as an early concept for billing in D-Grid. DGI offers consulting for new Grid Communities in all technical aspects of networking and security, e.g. firewalls in grid environments, alternative network protocols, and CERT (Computer Emergency Response Team). DGI partners operate “Registration Authorities” to simplify applying for internationally accepted Grid Certificates from DFN (German Research Network organization) and GridKA (Grid Project Karlsruhe). DGI partners also support new members in building their own “Registration Authorities”. The portal framework Gridsphere serves as the user interface. Within the D-Grid environment, SRM/dCache takes care of the administration of large amounts of scientific data.
The Japanese NAREGI Project
The National Research Grid Initiative NAREGI was created in 2003 by the Ministry of Education, Culture, Sports, Science and Technology (MEXT). From 2006, under the “Science Grid NAREGI” Program of the “Development and Application of Advanced High-performance Supercomputer” project being promoted by MEXT, research and development is continuing to build on current results, while expanding in scope to include application environments for next-generation, peta-scale supercomputers.
The main objective of NAREGI is to research and develop grid middleware according to global standards to a level that can support practical operation, and to implement a large-scale computing environment (the Science Grid) for widely distributed, advanced research and education. NAREGI is carrying out R&D from two directions: through grid middleware development at the National Institute of Informatics (NII), and through an applied experimental study using nano-applications at the Institute for Molecular Science (IMS). These two organizations advance the project in cooperation with industry, universities and public research facilities. NII is promoting the construction of the Cyber Science Infrastructure (CSI), which is the base for next-generation academic research. A core technology of CSI is the science grid environment, and it will be made up of academic data networks like SuperSINET.
A large number of research bodies from academia and industry are participating in this program, with research and development of grid middleware centered at the National Institute of Informatics (NII), and empirical research into grid applications being promoted by the Institute for Molecular Science (IMS). Also, in order to promote use of grid technology in industry, the Industrial Committee for Super Computing Promotion gathers research topics from industry and promotes collaborative work between academic and industrial research bodies. The results of this research will support construction of the Cyber Science Infrastructure (CSI), which is the academic research base being promoted by NII, as well as construction of the peta-scale computing environment for scientific research. Through this, NAREGI will accelerate research and development in scientific fields, improve international cooperation, and strengthen competitiveness in an economically effective way.
The middleware being developed by NAREGI will present heterogeneous computation resources, including supercomputers and high-end servers connected by network, to users as a single, large, virtual computing resource. In order to build a global grid, the middleware is being developed according to the Science Grid environment standards specifications from the Open Grid Forum. The infrastructure provides a user-friendly environment to the user, who can then focus on his/her computational science research without concern for the scale of computing resources or environment required. High-throughput processing and meta-computing can be applied to large-scale analysis using the grid, allowing the supercomputers to be used to their maximum capabilities.
This environment allows multi-scale/multi-physics coupled simulations, which are becoming very important in computational sciences, to run in a heterogeneous environment. Resource allocation is suited to each application, so that coupled analysis can be done easily. Virtual Organizations (VOs), separate from the real organizations to which researchers and research bodies belong, can be formed dynamically on the Grid according to the needs of the research community.
In 2003, NAREGI developed a component technology based on UNICORE, and in 2004, released an alpha-version prototype of middleware based on UNICORE to test integrated middleware functions. In 2005, research and development was advanced on beta-version grid middleware, based on newly-established OGSA specifications, to align with global activity. This beta version was released as open-source software in May 2006, and included enhanced functions supporting virtual organizations. In 2007, NAREGI Version 1.0, based on this beta version, will be released. From 2008, the scope of research and development will be expanded to include application environments for next-generation supercomputers, and the results of this will be released as NAREGI Version 2.0 in 2010.
The UK e-Science Program
The UK e-Science program was proposed in November 2000 and launched in the following year. The total funding for the first phase was $240M, with a sum of $30M allocated to a Core e-Science Program. This was an activity across all the UK's Research Councils to develop generic technology solutions and generic middleware to enable e-Science and to form the basis for new commercial e-business software. This $30M funding was enhanced by an allocation of a further $40M from the Department of Trade and Industry, which was required to be matched by equivalent funding from industry. The Core e-Science Program, which is managed by the UK Engineering and Physical Sciences Research Council (EPSRC) on behalf of all the Research Councils, is therefore the generic part of e-Science activities within the UK and thus ensures a viable infrastructure and coordination of the national effort.
The first phase of the Core e-Science Program (2001-2004) was structured around six key elements: a National e-Science Center linked to a network of Regional e-Science Grid Centers; Generic Grid Middleware and Demonstrator Projects; Grid Computer Science based Research Projects; Support for e-Science Application Pilot Projects; Participation in International Grid Projects and Activities; and Establishment of a Grid Network Support Team.
To ensure that researchers developing e-Science applications are properly supported, especially in the initial stages, the Grid Support Center was established. The UK Grid Support Center (see local activities) supports all aspects of the deployment, operation and maintenance of grid middleware and distributed resource management for the UK grid test-beds. The Grid Network Team (GNT) works with application developers to help identify the network requirements and help map these on to existing technology. It also considers the long-term networking research issues required by the grid.
The second phase of the Core e-Science Program (2004-2006) is based around six key activities: a National e-Science Center linked to a network of Regional e-Science Centers; Support activities for the UK e-Science Community; an Open Middleware Infrastructure Institute (OMII); a Digital Curation Center (DCC); New Exemplars for e-Science; and Participation in International Grid Projects and Activities.
Of particular significance in the second phase are the OMII and DCC. The Open Middleware Infrastructure Institute (OMII) is an institute based at the University of Southampton, located in the School of Electronics and Computer Science. The vision for the OMII is to become the source for reliable, interoperable and open-source grid middleware, ensuring the continued success of grid-enabled e-Science in the UK.
The Digital Curation Center (DCC) supports UK institutions with the problems involved in storing, managing and preserving vast amounts of digital data to ensure its enhancement and continuing long-term use. The purpose of the DCC is to provide a national focus for research into curation issues and to promote expertise and good practice, both nationally and internationally, for the management of all research outputs in digital format. The DCC is based at the University of Edinburgh.
In addition to the UK e-Science program, there have been UK initiatives in the social sciences and the arts and humanities: the National Centre for e-Social Science has embarked on an ambitious programme of developing e-Social Science tools and evaluating their social implications. Further, there is now an Arts and Humanities e-Science Support Centre which is creating a community around the uses of e-science in, for example, history and linguistics.
As a result of this initiative the UK e-Science program has enjoyed a number of strengths including:
- An Advanced National Grid Infrastructure, which was built specifically for use with grid computing. The National Grid Service (NGS) is one of the facilities available to UK researchers which provides access to over 2000 processors, and over 36 TB of “data-grid” capacity.
- Availability of Funding: new research and industrially related funding from the UK government and different funding bodies. Over $500M has been invested in the e-Science program over the last five years. This has been followed more recently by smaller-scale funding for e-social science and e-research in arts and humanities.
- Industrial involvement: Over 100 companies are involved in UK e-Science projects, including IBM, Intel, Oracle, and Sun, and a vast number of other national and international industries in different domains ranging from finance to pharmacy.
- The UK has extended its e-science capability to include not only the sciences and engineering, but also social sciences and arts and humanities, which will provide benefits across the academic community.
- New research advances: Large-scale multidisciplinary teams of scientists have worked together and made advances in a wide range of disciplines.
The US TeraGrid
TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource. Using high-performance network connections, the TeraGrid integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country.
TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric Research.
Terascale Initiatives 2000-2004: In response to the 1999 report by the President's Information Technology Advisory Committee (PITAC), NSF embarked on a series of “Terascale” initiatives to acquire computers capable of trillions of operations per second (teraflops); disk-based storage systems with terabyte capacities; and gigabit-per-second networks. In 2000, the $36 million Terascale Computing System award to PSC supported the deployment of a computer (named LeMieux) capable of 6 trillion operations per second. When LeMieux went online in 2001, it was the most powerful U.S. system committed to general academic research.
In 2001, NSF awarded $45 million to NCSA, SDSC, Argonne National Laboratory, and the Center for Advanced Computing Research (CACR) at California Institute of Technology, to establish a Distributed Terascale Facility (DTF). Aptly named the TeraGrid, this multi-year effort aimed to build and deploy the world's largest, fastest, most comprehensive, distributed infrastructure for general scientific research. The initial TeraGrid specifications included computers capable of performing 11.6 teraflops, disk-storage systems with capacities of more than 450 terabytes of data, visualization systems, data collections, integrated via grid middleware and linked through a 40-gigabits-per-second optical network.
In 2002, NSF made a $35 million Extensible Terascale Facility (ETF) award to expand the initial TeraGrid to include PSC and integrate PSC's LeMieux system. Resources in the ETF provide the national research community with more than 20 teraflops of computing power distributed among the five sites and nearly one petabyte of disk storage capacity.
In 2003, NSF made three Terascale Extensions awards totaling $10 million, to further expand the TeraGrid's capabilities. The new awards funded high-speed networking connections to link the TeraGrid with resources at Indiana and Purdue Universities, Oak Ridge National Laboratory, and the Texas Advanced Computing Center. Through these awards, the TeraGrid put neutron-scattering instruments, large data collections and other unique resources, as well as additional computing and visualization resources, within reach of the nation's research and education community.
In 2004, as a culmination of the DTF and ETF programs, the TeraGrid entered full production mode, providing coordinated, comprehensive services for general U.S. academic research.
The TeraGrid 2005-2010: In August 2005, NSF's newly created Office of Cyberinfrastructure extended support for the TeraGrid with a $150 million set of awards for operation, user support and enhancement of the TeraGrid facility. Using high-performance network connections, the TeraGrid now integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country. As of early 2006, these integrated resources include more than 102 teraflops of computing capability and more than 15 petabytes of online and archival data storage with rapid access and retrieval over high-performance networks. Through the TeraGrid, researchers can access over 100 discipline-specific databases. With this combination of resources, the TeraGrid is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.
This report has been funded by the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill. I want to thank all the people who have contributed to this report and who are listed in the report at http://www.renci.org/publications/reports.php.
About the Author:
Wolfgang Gentzsch heads the German D-Grid Initiative. He is an adjunct professor at Duke and a visiting scientist at RENCI at UNC Chapel Hill, North Carolina. He is Co-Chair of the e-Infrastructure Reflection Group and a member of the Steering Group of the Open Grid Forum.