Grid Initiatives Part 1

By Wolfgang Gentzsch, D-Grid, Duke University, and RENCI

January 29, 2007

Over the past 12 months, we have studied major grid projects to better understand how to successfully design, build, manage and operate large community grids, drawing on the experience of early adopters and on case studies and lessons learned from these projects. For this purpose, we selected and analyzed the UK e-Science Programme, the US TeraGrid, NAREGI in Japan, the ChinaGrid, the European EGEE, and the German D-Grid initiative.

More details can be found in the corresponding report; see the weblink at the end of this article. The report explains what a grid is and how it functions, lists the benefits of grid computing for research and industry, explains the business and services side of grids, discusses the grid projects investigated, offers a look into the future of grids, and finally compiles a list of lessons learned and recommendations for those who intend to build grids in the near future. This first part of the article summarizes key statistics of the grid initiatives investigated, discusses the lessons learned in detail, and summarizes the recommendations offered by the grid project leaders. The second part, in next week's GRIDtoday, will present additional information about these six grid initiatives.

Major Grid Initiatives

Our research is based on information from project Web sites, project reports, interviews with representatives of all these grid initiatives, and our own hands-on experience in helping to build the German D-Grid. The major focus of our research and of the interviews was on strategic directions, applications, government and industry funding, national and international cooperation, strengths and weaknesses of the grid projects as described by the interviewees, sustainability of the resulting grid infrastructure, commercial services, and the future of e-Science. Note that all information carries a time stamp of Fall 2006 and may since have become outdated.

In the following, we briefly summarize six of the major grid projects around the world and present statistics and history. More information can be found in the report mentioned above or can be collected from the Web (as I did). First, the following table presents the different phases of the projects, their funding (in $M), the approximate number of experts involved, and the type of users (from research or industry):

Initiative         Time          Funding ($M)   People   Users

UK e-Science-I     2001 - 2004   180            900      Research
UK e-Science-II    2004 - 2006   220            1100     Research, Industry

TeraGrid-I         2001 - 2004   90             500      Research
TeraGrid-II        2005 - 2010   150            850      Research

ChinaGrid-I        2003 - 2006   3              400      Research
ChinaGrid-II       2007 - 2010   15             1000     Research

NAREGI-I           2003 - 2005   25             150      Research
NAREGI-II          2006 - 2010   40             250      Research, Industry

EGEE               2004 - 2006   40             900      Research
EGEE-II            2006 - 2008   45             1100     Research, Industry

D-Grid             2005 - 2008   32             220      Research
D-Grid-II          2007 - 2009   35             440      Research, Industry

Lessons Learned

In the following, we summarize the most important results and lessons learned from the grid projects analyzed and from the interviews:

Most of the successful projects in the early days had a strong focus on just one topic (middleware OR application) or on a few selected aspects and requirements. They were pragmatic and mostly application and user driven, with a focus on the development of standard and commodity components, open source, and results that were easy to understand and to use. Application-oriented, grid-enabled workflows and the separation of the middleware and application layers helped the projects deliver more sustainable results, and usability and integration became relevant. Close collaboration between application scientists and computer scientists proved very important. Professional service centers proved successful: in the UK, for example, the National Grid Service (NGS), the Grid Operations Support Centre (GOSC) and the Open Middleware Infrastructure Institute (OMII) are extremely important factors in guaranteeing the sustainability of the project results.

However, there were also problems and challenges, especially with the early initiatives and projects:

There was a lot of hype, especially in 2001 and 2002, and thus expectations for the projects and their results ran too high. Projects that focused on both applications and infrastructure faced a high risk. Almost all projects in the early days developed their own infrastructure because the middleware of the time (e.g., Globus, Condor, SRB, with new releases every 6 to 12 months) turned out to be immature, and the middleware developed in these projects was often proprietary. In the early days, integrating individual projects into a larger community or environment was not yet possible. Later projects either focused on the infrastructure with the applications as a driver, or focused on the application using existing core grid building blocks. One of the main reasons for failure was the sudden change in 2003 from the classical, more proprietary grid technologies to standard Web services. Also, missing software engineering methods and, especially, low usability resulted in low acceptance of project results. The user's point of view is paramount; a "build it and they will come" approach will not work. It is important to work with the user communities to ensure the resulting system is of a general nature and not limited in scope to a small number of applications.

A lot of the grid middleware currently promoted is really intended for research and demonstrations, and needs significant effort to be made suitable for large-scale production usage. Standards are evolving slowly, and it is likely that initiatives to improve interoperability between existing grids will produce meaningful results of benefit to the user communities on a shorter time scale. The experience gained with this interoperability work will help identify the highest-priority points for standardization, as well as a meaningful way to test whether candidate standards can be implemented and deployed.

It is challenging (but important) to establish an environment of constructive competition such that good ideas and performance are recognized and rewarded. There are still many areas where the "captive user" approach is viewed as a competitive advantage.

Recommendations

In this section, we summarize major results and conclusions from the lessons learned, and present recommendations, especially for those who intend to start or fund a new grid initiative. Some of the recommendations may seem trivial but are still worth mentioning. They all result from our analysis and findings and from the evaluation of the interviews:

In any grid project, during development as well as during operation, the core grid infrastructure should be modified or improved only in long cycles, and only when necessary, because applications and users depend on this infrastructure. Continuity and sustainability, especially for the infrastructure part of grid projects, are extremely important. Therefore, additional funding should be available after the end of the project to guarantee service, support, and continuous improvement and adjustment to new developments. Close collaboration during the grid development phase between the grid infrastructure developers and the application developers is mandatory if the applications are to utilize the core grid services of the infrastructure and avoid application silos.

For new grid projects, we recommend close collaboration between the grid-experienced computer scientists who build the (generic) grid infrastructure and the driving users who define their set of requirements for the grid infrastructure services. Application communities should not start developing a core infrastructure from scratch on their own; instead, together with grid-experienced computer scientists, they should decide on using and integrating existing grid building blocks, to avoid building proprietary application silo architectures and to focus more on the real applications.

In their early stages, grid projects need enough funding to get over the early-adopter phase into a mature state with a rock-solid grid infrastructure, such that other communities can join easily. We estimate this funding phase currently to be on the order of three years, with more funding in the beginning for the grid infrastructure and more funding later for the application communities. Included in such grid infrastructure funding are Centers of Excellence for building, managing and operating grid centers, for middleware tools, for application support, and for training. Thus, parallel development and the re-inventing of wheels can be avoided, and funding spent efficiently.

After a generic grid infrastructure has been built, projects should focus first on one or only a few applications or specific services, to avoid complexity and the re-inventing of wheels. Using software components from open-source and standards initiatives is highly recommended to enable interoperability, especially in the infrastructure and application-oriented middleware layers. For interoperability reasons, a focus on software engineering methods, especially for the implementation of protocols and the development of standard interfaces, is important.
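To make the notion of a standard interface concrete, here is a minimal Python sketch assuming a hypothetical job-submission abstraction (none of the class or method names below come from an actual grid toolkit): applications program against one interface, and middleware-specific adapters implement it, so the middleware can be exchanged without touching application code.

    from abc import ABC, abstractmethod

    class JobSubmitter(ABC):
        """Hypothetical standard interface for job submission.

        Applications code against this interface only, so the underlying
        middleware can be exchanged without touching application code.
        """

        @abstractmethod
        def submit(self, executable, args):
            """Submit a job; return an opaque job identifier."""

        @abstractmethod
        def status(self, job_id):
            """Return the job state, e.g. 'queued', 'running', 'done'."""

    class LocalClusterSubmitter(JobSubmitter):
        """Illustrative adapter; a real one would call the local scheduler
        or a middleware client (e.g. a Globus or Condor command)."""

        def __init__(self):
            self._jobs = {}

        def submit(self, executable, args):
            job_id = "job-%d" % (len(self._jobs) + 1)
            self._jobs[job_id] = (executable, args)
            return job_id

        def status(self, job_id):
            return "queued" if job_id in self._jobs else "unknown"

    def run_analysis(submitter):
        """An application that depends only on the interface."""
        job_id = submitter.submit("/bin/analyze", ["--input", "data.dat"])
        print(job_id, submitter.status(job_id))

    run_analysis(LocalClusterSubmitter())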

New application grids (community grids) should utilize the existing components of a generic grid infrastructure to avoid re-inventing wheels and building silos. The infrastructure building blocks should be user-friendly to enable easy adoption by new (application) communities. In addition, the infrastructure group should offer installation, operation, support and training services. Centers of Excellence should specialize in specific services, e.g. middleware development and maintenance, integration of new communities, grid operation, training, utility services, etc. For more complex projects, e.g. those consisting of an integration project and several application or community projects, a strong management board should steer coordination and collaboration among the projects and the working groups. The management board (steering committee) should consist of the leaders of the different sub-projects. Success, especially in early-stage technology projects, depends strongly on the personality and leadership capabilities of these leaders.

We recommend implementing a utility computing paradigm only in small steps, starting by moderately enhancing existing service models, and testing utility models and accounting and billing concepts first as pilots. Experience in this field, and with its mental, legal and regulatory barriers, is still missing. Very often, today's existing government funding models are counter-productive when establishing new and efficient forms of utility services. Today's funding models in research and education are often project based and thus not ready for a utility approach where resource usage is paid for on a pay-as-you-go basis. Old funding models first have to be adjusted accordingly before a utility model can be introduced successfully.
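To make the pay-as-you-go idea concrete, an accounting and billing pilot could start with a very simple usage record and a flat tariff, as in this Python sketch (the rate and all field names are illustrative assumptions, not a proposed tariff):

    from dataclasses import dataclass

    # Assumed flat tariff for the pilot; a real tariff would be negotiated.
    RATE_PER_CPU_HOUR = 0.10  # currency units per CPU-hour

    @dataclass
    class UsageRecord:
        user: str
        project: str
        cpu_hours: float

    def bill(records):
        """Aggregate pay-as-you-go charges per project."""
        charges = {}
        for r in records:
            charges[r.project] = (charges.get(r.project, 0.0)
                                  + r.cpu_hours * RATE_PER_CPU_HOUR)
        return charges

    records = [
        UsageRecord("alice", "bio-sim", 120.0),
        UsageRecord("bob", "bio-sim", 30.0),
        UsageRecord("carol", "climate", 200.0),
    ]
    print(bill(records))  # charges per project: bio-sim 15.0, climate 20.0

Starting from such a trivially auditable model makes it easier to test accounting and billing concepts with pilot users before more realistic tariffs and funding arrangements are introduced.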

Finally, participation of industry should be industry-driven. A push from the outside, even with government funding, does not seem promising. Success will come only from natural needs, e.g. through already existing collaborations between research and industry, as a first step. For several good reasons, industry in general is still in a wait state with regard to building and applying global grids, as demonstrated by the moderate success so far of existing industrial global grid initiatives around the world. We recommend working closely with industry to develop appropriate funding and collaboration models that take into account the different technological, mental and legal requirements when adjusting the existing research-community-oriented approaches, ideally starting with already existing and successful research-industry collaborations. If there are good reasons to create your own grid (on a university campus or in an enterprise) rather than join an existing one, it is better to start with cluster-based cycle scavenging and, once the users and their management are convinced of the value of sharing resources, then extend the system to multiple sites.

Try to study, copy and/or use an existing grid if possible, and connect your own resources once you are convinced of its value. There is much useful experience to learn from partners, so keep up with what your peers have done and are doing. Focus on understanding your user community and its needs, and invest in a strong communication and participation channel to the leaders of that group. Instrument your services so that you collect good data about who is using which services and how; analyze this data and learn from watching what is really going on, in addition to what users report as happening. Plan for an incremental approach and lots of time spent talking out issues and plans. Social effects dominate in non-trivial grids.
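As an illustration of such instrumentation, the following Python sketch (the decorator and event fields are assumptions for illustration, not part of any grid middleware) emits one structured usage event per service call; these events can later be aggregated to see who is using which services and how:

    import json
    import logging
    import time
    from functools import wraps

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("usage")

    def instrumented(service_name):
        """Decorator emitting one structured usage event per service call."""
        def decorator(func):
            @wraps(func)
            def wrapper(user, *args, **kwargs):
                start = time.time()
                result = func(user, *args, **kwargs)
                log.info(json.dumps({
                    "service": service_name,
                    "user": user,
                    "duration_s": round(time.time() - start, 3),
                }))
                return result
            return wrapper
        return decorator

    @instrumented("file-transfer")
    def transfer(user, source, destination):
        # Placeholder for the real work of the service.
        return "copied %s to %s" % (source, destination)

    transfer("alice", "gsiftp://a/data", "gsiftp://b/data")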

Acknowledgement:

This report has been funded by the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill. I want to thank all the people who have contributed to this report; they are listed in the report at http://www.renci.org/publications/reports.php.

About the Author:

Wolfgang Gentzsch is heading the German D-Grid Initiative. He is an adjunct professor at Duke and a visiting scientist at RENCI at UNC Chapel Hill, North Carolina. He is Co-Chair of the e-Infrastructure Reflection Group and a member of the Steering Group of the Open Grid Forum.

 
