A professor, an engineer, and a researcher who have never met before sit down to a conference dinner. One of them has a petabyte database of worldwide historical climate data. The second owns a weather simulation engine running on a cluster of a couple of thousand nodes. The third has access to a real satellite. During the conversation, a question comes up: how precisely can we predict today’s weather front in Beijing, China? Rather than waste time debating it, they decide to find out. They open their laptops and form an ad-hoc virtual organization to share their assets immediately. The historical data are fed into the simulation engine, and the results are compared with the real-time satellite feed. Within minutes, the answer is there for all to see.
Impossible? Possible. Today.
Complex HPC infrastructures, such as grids and collaboratories, need to manage a plethora of distributed assets: data repositories, machines, applications and services. To share them in a coordinated way, the HPC community invented virtual organizations (VOs): groups that share resources because they trust each other. VOs are the base concept of grid security, envisioned as highly dynamic, on-demand structures. People and processes can form and dissolve a VO at any moment to run a project.
This concept isn’t new. It is more than a decade old, developed from the mid-1990s onward by Foster, Kesselman and Tuecke and formalized in “The Anatomy of the Grid” (2001). Yet how many ad-hoc VOs, formed on the fly at a conference table, have you seen since?
The distributed security frameworks commonly used for cross-institutional collaboration have not lived up to this vision. In the various implementations of grid security, most of them descending from the PKI (Public Key Infrastructure) model, virtual organizations have become static, heavyweight, hard-to-use structures managed by multiple administrators. As a result, it is not simple for an average user to join one, let alone create a VO themselves.
What happened to the original, brilliant and forward-thinking vision of a VO? It appears to me that we can’t see the forest for the trees. Yet we may be reaching the moment when this changes.
What the HPC community has not noticed is that the concept of VOs is alive and flourishing in Web 2.0 services. In Flickr, users share pictures with others, and no admin intervention is needed. In peer-to-peer services designed for sharing music, sharing is as simple as a mouse click. In T-Mobile’s Media center, and hundreds of similar services, users can upload their photos and define groups of friends who may access them.
What went wrong with the robust multidisciplinary, multi-institutional research projects? Remember, the grid’s mantra is “coordinated resource sharing”! So why are commodity solutions more mature than grids in this respect? I tend to think that the root cause of the problem is not technology but philosophy. Web 2.0 says: give users the power to decide.
You may think that this is more difficult in research environments, because the assets there are far more complex (and expensive) than those in the Web 2.0 world. True. But at the core of the problem, the argument does not hold. I know my data. I know best whom I should share it with. I take responsibility for my data. Why should I involve an admin? Do they care more about the data than I do? If the technical complexity of my assets is beyond my understanding, fine: let a security expert decide. If, however, it is all about letting my project peer run an SQL query over my data, involving anyone but myself in permitting the action is just another hurdle that I should be free from.
And technologically, what’s missing? Actually, not much; almost nothing. Distributed security frameworks such as PKI, GSI and Shibboleth are feature rich. This is good, because they offer a lot of options. They enable (but don’t support) dynamic sharing. What’s missing is simplicity on top: a layer that ties together the loose ends of complex security software and exposes them through a simple, intuitive end-user interface.
In fact, our company, GridwiseTech, recently announced such a product. AdHoc, version 1.1.0, is specifically designed to enable regular users (not administrators) to create a virtual organization on the fly and share their resources.
So the story of the professor, the engineer and the researcher is not only possible; it has already been tried out, with a different type of data, in a project we’re involved in: sharing medical patient records across multiple hospitals. We are now looking to partner with academic as well as commercial institutions that wish to adopt the concept of dynamic sharing of data, applications and machines.
About the Author
Pawel Plaszczak’s international software engineering experience includes work at CERN, British Telecommunications and Argonne National Laboratory. In 2003, Pawel founded GridwiseTech to lead pioneering work for the early adopters of scalable systems. Under Pawel’s leadership the company has won the trust and respect of customers including Turner Broadcasting, Ricoh, and Philips, and led numerous research efforts for international consortia. Pawel is the author of numerous articles and tutorials, the book “Grid Computing: The Savvy Manager’s Guide,” and a frequent speaker at professional conferences and events. Pawel blogs at BigDataMatters.com.