Grid Initiatives Part 1

By Wolfgang Gentzsch, D-Grid, Duke University, and RENCI

January 29, 2007

Over the past 12 months, we have studied major grid projects to better understand how to successfully design, build, manage and operate large community grids, drawing on the experience of early adopters and on case studies and lessons learned from these projects. For this purpose, we selected and analyzed the UK e-Science Programme, the US TeraGrid, NAREGI in Japan, the ChinaGrid, the European EGEE, and the German D-Grid initiative.

More details can be found in the corresponding report; please see the weblink at the end of this article. The report explains what a grid is and how it functions, lists the benefits of grid computing for research and industry, explains the business and services side of grids, discusses the grid projects investigated, offers a look into the future of grids, and finally compiles a list of lessons learned and recommendations for those who intend to build grids in the near future. This first part of the article summarizes some key statistics of the grid initiatives investigated, discusses the lessons learned in detail, and summarizes the recommendations offered by the grid project leaders. The second part of the article, in next week's GRIDtoday, will present additional information about these six grid initiatives.

Major Grid Initiatives

Our research is based on information from project Web sites, project reports, interviews with representatives of all these grid initiatives, and our own hands-on experience in helping to build the German D-Grid. The major focus of our research and of the interviews was on strategic directions, applications, government and industry funding, national and international cooperation, strengths and weaknesses of the grid projects as described by the interviewees, sustainability of the resulting grid infrastructure, commercial services, and the future of e-Science. Note that all information carries a time stamp of Fall 2006 and may already be outdated.

In the following, we briefly summarize six of the major grid projects around the world and present statistics and history. More information can be found in the report mentioned above or collected from the Web (as I did). First, the following table presents the different phases of the projects, their funding (in $M), the approximate number of experts involved, and the type of users (research or industry):

Initiative        Time         Funding ($M)  People  Users

UK e-Science-I    2001 - 2004  180           900     Research
UK e-Science-II   2004 - 2006  220           1,100   Research, Industry
TeraGrid-I        2001 - 2004  90            500     Research
TeraGrid-II       2005 - 2010  150           850     Research
ChinaGrid-I       2003 - 2006  3             400     Research
ChinaGrid-II      2007 - 2010  15            1,000   Research
NAREGI-I          2003 - 2005  25            150     Research
NAREGI-II         2006 - 2010  40            250     Research, Industry
EGEE-I            2004 - 2006  40            900     Research
EGEE-II           2006 - 2008  45            1,100   Research, Industry
D-Grid-I          2005 - 2008  32            220     Research
D-Grid-II         2007 - 2009  35            440     Research, Industry

Lessons Learned

In the following, we summarize the most important results and lessons learned from the grid projects analyzed and from the interviews:

Most of the successful projects in the early days had a strong focus on just one topic (middleware or applications) or on a few selected aspects and requirements. They were pragmatic and mostly application and user driven, with a focus on the development of standard and commodity components, open source, and results that were easy to understand and use. Application-oriented, grid-enabled workflows and the separation of the middleware and application layers helped the projects deliver more sustainable results, and usability and integration became relevant. It appears very important that application scientists collaborate closely with computer scientists. Professional service centers proved successful: in the UK, for example, the National Grid Service (NGS), the Grid Operations Support Centre (GOSC) and the Open Middleware Infrastructure Institute (OMII) are extremely important factors in guaranteeing the sustainability of the project results.

However, there were also problems and challenges, especially with the early initiatives and projects:

There was a lot of hype, especially in 2001 and 2002, and thus expectations for the projects and their results were too high. Projects that focused on both applications and infrastructure faced a high risk. Almost all projects in the early days developed their own infrastructure because the middleware available at the time (e.g., Globus, Condor, SRB, with new releases every 6 to 12 months) turned out to be immature. Middleware developed in these projects was often proprietary, and in the early days an integration of individual projects into a larger community or environment was not yet possible. Later projects either focused on the infrastructure with the applications as a driver, or focused on the applications using existing core grid building blocks. One of the main reasons for failure was the sudden change in 2003 from the classical, more proprietary grid technologies to standard Web services. Also, missing software engineering methods and, especially, low usability resulted in low acceptance of project results. The user's point of view is paramount; a "build it and they will come" approach will not work. It is important to work with the user communities to ensure the resulting system is of a general nature and not limited in scope to a small number of applications.

A lot of the grid middleware currently promoted is really intended for research and demonstrations, and needs significant effort to be made suitable for large-scale production usage. Standards are evolving slowly, and it is likely that initiatives to improve interoperability between existing grids will produce meaningful results of benefit to the user communities on a shorter time scale. The experience gained from this interoperability work will help identify the highest-priority points for standardization, as well as a meaningful way to test whether candidate standards can be implemented and deployed.

It is challenging (but important) to establish an environment of constructive competition in which good ideas and performance are recognized and rewarded. There are still many areas where the "captive user" approach is viewed as a competitive advantage.

Recommendations

In this section, we summarize the major results and conclusions from the lessons learned, and present recommendations, especially for those who intend to start or fund a new grid initiative. Some of the recommendations may seem trivial but are still worth mentioning. They all result from our analysis and findings and from the evaluation of the interviews:

In any grid project, during development as well as during operation, the core grid infrastructure should be modified or improved only in long cycles, and only when necessary, because applications and users depend on this infrastructure. Continuity and sustainability, especially for the infrastructure part of grid projects, are extremely important. Therefore, additional funding should be available after the end of the project to guarantee service, support, and continuous improvement and adjustment to new developments. Close collaboration during the grid development phase between the grid infrastructure developers and the application developers is mandatory if the applications are to utilize the core grid services of the infrastructure and avoid application silos.

For new grid projects, we recommend close collaboration between the grid-experienced computer scientists who build the (generic) grid infrastructure and the driving users who define their set of requirements for the grid infrastructure services. Application communities shouldn't start developing a core infrastructure from scratch on their own; rather, they should, together with grid-experienced computer scientists, decide on using and integrating existing grid building blocks, to avoid building proprietary application silo architectures and to focus more on the real applications.

In their early stages, grid projects need enough funding to get over the early-adopter phase into a mature state with a rock-solid grid infrastructure, such that other communities can join easily. We estimate this funding phase currently to be on the order of three years, with more funding in the beginning for the grid infrastructure and more funding later for the application communities. Included in such grid infrastructure funding are Centers of Excellence for building, managing and operating grid centers, for middleware tools, for application support, and for training. Thus, parallel development and the re-inventing of wheels can be avoided, and funding can be spent efficiently.

After a generic grid infrastructure has been built, projects should focus first on one or only a few applications or specific services, to avoid complexity and re-inventing wheels. The use of software components from open-source and standards initiatives is highly recommended to enable interoperability, especially in the infrastructure and application-oriented middleware layers. For interoperability reasons, a focus on software engineering methods, especially for the implementation of protocols and the development of standard interfaces, is important, as the sketch below illustrates.
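
To make the value of standard interfaces concrete, here is a minimal sketch in Python. All class and method names are hypothetical, not taken from any of the projects discussed; it simply shows an application coded against a common job-submission interface instead of a specific middleware API, so the middleware can be exchanged without touching the application layer:

    # Minimal sketch: separate the application layer from the middleware
    # behind a standard job-submission interface (all names hypothetical).
    from abc import ABC, abstractmethod

    class JobSubmitter(ABC):
        """Standard interface the application codes against."""
        @abstractmethod
        def submit(self, executable, args):
            """Submit a job; return a middleware-independent job ID."""

    class GlobusSubmitter(JobSubmitter):
        def submit(self, executable, args):
            # A real implementation would call the Globus toolkit here.
            return "globus-job-0001"

    class CondorSubmitter(JobSubmitter):
        def submit(self, executable, args):
            # A real implementation would generate a Condor submit file here.
            return "condor-job-0001"

    def run_analysis(submitter):
        # The application never touches middleware specifics, so swapping
        # grid middleware requires no change to the application code.
        return submitter.submit("/bin/simulate", ["--steps", "1000"])

    print(run_analysis(GlobusSubmitter()))  # or CondorSubmitter()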

New application grids (community grids) should utilize the existing components of a generic grid infrastructure to avoid re-inventing wheels and building silos. The infrastructure building blocks should be user-friendly to enable easy adoption by new (application) communities. In addition, the infrastructure group should offer installation, operation, support and training services. Centers of Excellence should specialize in specific services, e.g., middleware development and maintenance, integration of new communities, grid operation, training, utility services, etc. In the case of more complex projects, e.g., those consisting of an integration project and several application or community projects, a strong management board should steer coordination and collaboration among the projects and the working groups. The management board (steering committee) should consist of the leaders of the different sub-projects. Success, especially in early-stage technology projects, is strongly proportional to the personality and leadership capabilities of the leaders.

We recommend implementing a utility computing paradigm only in small steps, starting by moderately enhancing existing service models and testing utility models and accounting and billing concepts first as pilots. Experience in this field, and with its mental, legal and regulatory barriers, is still missing. Very often, today's government funding models are counter-productive when establishing new and efficient forms of utility services. Funding models in research and education are often project based and thus not ready for a utility approach where resource usage is paid for on a pay-as-you-go basis. Old funding models first have to be adjusted accordingly before a utility model can be introduced successfully.
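
As an illustration of how small such a pilot can start, the following minimal sketch (with hypothetical users and rates, not drawn from any of the projects discussed) meters per-user consumption and computes pay-as-you-go charges, which a pilot might simply report to users and funders before any real money changes hands:

    # Minimal pay-as-you-go accounting sketch (hypothetical rates/users).
    from dataclasses import dataclass

    @dataclass
    class UsageRecord:
        user: str             # who consumed the resource
        cpu_hours: float      # metered consumption
        rate_per_hour: float  # price agreed with the resource provider

        def cost(self):
            return self.cpu_hours * self.rate_per_hour

    records = [
        UsageRecord("alice", 120.0, 0.05),
        UsageRecord("bob", 40.0, 0.05),
    ]

    # Aggregate per-user charges for a billing report.
    bill = {}
    for r in records:
        bill[r.user] = bill.get(r.user, 0.0) + r.cost()
    print(bill)  # {'alice': 6.0, 'bob': 2.0}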

Finally, participation of industry should be industry-driven. A push from the outside, even with government funding, doesn't seem promising. Success will come only from genuine needs, e.g., through already existing collaborations between research and industry, as a first step. For several good reasons, industry in general is still in a wait state regarding building and applying global grids, as demonstrated by the moderate success so far of existing industrial global grid initiatives around the world. We recommend working closely with industry to develop appropriate funding and collaboration models that take into account the different technological, mental and legal requirements when adjusting the existing research-community-oriented approaches, ideally starting with already existing and successful research-industry collaborations. If there are good reasons to create your own grid (on a university campus or in an enterprise) rather than join an existing one, it is better to start with cluster-based cycle scavenging and, once the users and their management are convinced of the value of sharing resources, extend the system to multiple sites.

Try to study, copy and/or use an existing grid if possible, and connect your own resources once you are convinced of its value. There is much useful experience to learn from partners, so keep up with what your peers have done and are doing. Focus on understanding your user community and its needs, and invest in a strong communication and participation channel to engage the leaders of that community. Instrument your services so that you collect good data about who is using which services and how (see the sketch below); analyze this data and learn from watching what's really going on, in addition to what users report as happening. Plan for an incremental approach and lots of time talking out issues and plans. Social effects dominate in non-trivial grids.
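
One simple way to instrument services is sketched below (service and function names are hypothetical): each service call is wrapped so that who used which service, and for how long, is logged for later analysis and comparison with what users self-report:

    # Minimal instrumentation sketch: log who uses which service and for
    # how long, so observed usage can be compared with reported usage.
    import functools, logging, time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("grid.usage")

    def instrumented(service_name):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(user, *args, **kwargs):
                start = time.time()
                try:
                    return func(user, *args, **kwargs)
                finally:
                    log.info("service=%s user=%s duration=%.3fs",
                             service_name, user, time.time() - start)
            return wrapper
        return decorator

    @instrumented("data-transfer")
    def transfer(user, source, target):
        pass  # actual service logic would go here

    transfer("alice", "gridftp://src/file", "gridftp://dst/file")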

Acknowledgement:

This report was funded by the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill. I want to thank all the people who contributed to this report; they are listed in the report at http://www.renci.org/publications/reports.php.

About the Author:

Wolfgang Gentzsch heads the German D-Grid Initiative. He is an adjunct professor at Duke University and a visiting scientist at RENCI at UNC Chapel Hill, North Carolina. He is co-chair of the e-Infrastructure Reflection Group and a member of the steering group of the Open Grid Forum.