I've noticed that although there are many different volunteer Grid computing systems, no single commercial product seems to have emerged as the dominant player. I have some thoughts to share on the obstacles preventing the advancement of Grid technology in the marketplace, and a solution for overcoming them.
The Problem: Security
The single biggest issue is, in my opinion, security. In a volunteer grid system, the computing software has to run on a computer that belongs to someone and is used for things other than supercomputing. I used to work as a network administrator at Kistler Aerospace, and know just how paranoid folks are about their computers. People are suspicious of anything that might make them stop working.
I'm not talking about the Young Turks in a company. Junior staff is usually excited to try anything new. Unfortunately (and I'm speaking from experience here), this enthusiasm often results in massive computer crashes and all-nighters in which they get to learn things like “How to rebuild a UNIX server” and “Really, really don't break the Primary Domain Controller if you don't have a Backup Domain Controller.”
The real xenophobes are the Old Guard: the people further up the org chart, with two or three decades of experience under their belts. By and large, they are immune to hype and conservative about what goes onto their computers. And unless they are extremely technical, if you come by and install a Grid client on their machine one day, and for some reason they can't print the next week, odds are good that they will suspect it is the Grid client that is to blame.
As far as I am concerned, the IT manager has a job only slightly less thankless than that of a mine sweeper. Computers go down all the time, and the IT manager is a constant focus for criticism and intolerance by the staff. If you ask the average IT guy how to best keep a network stable, they will tell you it is to lock the staff out of the building and keep them away from their computers. Ideally, the computers should all be turned off, too.
A seasoned IT manager is going to be vehemently against installing Grid computing clients on their network. It is just one more thing that can break. And since it is a network application, it is something that can break the entire network if things go really bad.
If you tell them that the software they'll be running was written by the local staff, they'll probably shut you down even faster. I don't know why, but there seems to be a perception that code published by another company might be safer. Familiarity breeds contempt. I'm not speaking in general terms here. I remember a co-worker of mine trying to talk our IT admin into letting us install a test servlet on our corporate gateway. When faced with the assertion that the code was totally stable and safe, the admin barked bitterly at the engineer, “Yeah, right. I've seen the kind of bugs you write.”
So, what are the options for making sure that the code your Grid client runs is safe?
For starters, you can make sure that the Grid client only runs code signed by trusted organizations. What people think this does is make your computer safe. What it actually does is dramatically reduce the odds of your computer accidentally running malware (trojan horse viruses, worms, etc.).
The downside to blind faith in trusted code is that accidents happen. Look at the security holes in Microsoft Windows. Consider how much business revenue has been lost in the United States due to these security problems. Consider how hard it is to keep a straight face when your browser presents you with a code-signing certificate and offers you the option to “Always trust software from Microsoft.”
If a company you trust (hell, it could be your own company) downloads library binaries to use for development and doesn't do an MD5 check on the libraries before linking them in with its code, it is opening the door for trojan horse viruses. If the company links the libraries with its own code, and then signs the entire package, you now have signed code from a trusted source that contains a virus.
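The checksum step mentioned above might look something like the following. This is only a minimal sketch: the class name `ChecksumCheck`, its method names, and the command-line usage are made up for illustration, and it assumes the library's publisher has given you the expected digest through some channel you trust. MD5 is used only because it is the check named above; passing "SHA-256" as the algorithm works the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares a downloaded library against a published digest.
final class ChecksumCheck {

    /** Returns the lowercase hex digest of the given bytes. */
    static String hexDigest(byte[] data, String algorithm) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest algorithm: " + algorithm, e);
        }
    }

    /** True only if the file's digest equals the expected hex string (case-insensitive). */
    static boolean fileMatches(Path library, String algorithm, String expectedHex) {
        try {
            byte[] contents = Files.readAllBytes(library);
            return hexDigest(contents, algorithm).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ChecksumCheck <library-file> <expected-md5-hex>");
            return;
        }
        boolean ok = fileMatches(Paths.get(args[0]), "MD5", args[1]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH: do not link this library");
    }
}
```

A check like this only tells you the file is the one the publisher described. It says nothing about whether the publisher's own build was clean.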
Likewise, there are simply bugs. Suppose that a program does some work, stores intermediate results in a scratch directory and then deletes the directory when completed. Improper input (like a null string) could cause the program to delete the user's entire directory instead of a subdirectory inside the user's folder. This bug could easily be present in a signed code library.
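To make that bug concrete, here is a small sketch of how it can arise. I'm reading "null string" as an empty string, which is an assumption, and the class name `ScratchPathBug`, the method `naiveScratchDir`, and the `home/alice` path are all invented for the example. Nothing is deleted here; the code only shows which path a later delete would be aimed at.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the scratch-directory bug described above.
final class ScratchPathBug {

    /** Naive version: builds the scratch path from unchecked input. */
    static Path naiveScratchDir(Path userDir, String jobName) {
        // Path.resolve returns userDir unchanged when jobName is empty.
        return userDir.resolve(jobName);
    }

    public static void main(String[] args) {
        Path userDir = Paths.get("home", "alice"); // made-up user directory

        // Normal input: a subdirectory inside the user's folder.
        System.out.println(naiveScratchDir(userDir, "job42"));

        // Improper input: the "scratch directory" is now the user's folder itself,
        // so a later recursive delete would wipe out everything the user owns.
        Path oops = naiveScratchDir(userDir, "");
        System.out.println(oops.equals(userDir)); // prints true
    }
}
```

The code compiles, passes casual testing with sensible job names, and can be signed like anything else.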
All that signing code does is tell you who to blame when it's already too late.
The Solution: Run-time Checking
Java, on the other hand, has the capability to provide run-time checking. If the security manager is enabled, the Java runtime checks each sensitive operation (file access, network connections, and the like) against a security policy at the moment the code attempts it. If the code attempts to delete the user's directory (forbidden), the operation will be refused with a security exception. If the code attempts to read from anywhere outside of its own scratch directory, it will be refused. If the code attempts to open an unauthorized network connection, it will be refused.
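The idea behind that kind of check can be sketched in a few lines. To be clear, this is not the Java security manager API. It is a hand-rolled, hypothetical `ScratchGuard` class that shows the principle: the host decides, at the moment of the call, whether a delete is allowed, no matter what the guest code intended. The `grid/scratch` sandbox location is also an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical run-time guard: permits deletes only strictly inside a sandbox root.
final class ScratchGuard {
    private final Path sandboxRoot;

    ScratchGuard(Path sandboxRoot) {
        this.sandboxRoot = sandboxRoot.toAbsolutePath().normalize();
    }

    /** Throws SecurityException unless target lies strictly inside the sandbox root. */
    void checkDelete(Path target) {
        // normalize() collapses ".." segments lexically; it does not follow symlinks,
        // so a production check would need to handle those separately.
        Path resolved = target.toAbsolutePath().normalize();
        boolean inside = resolved.startsWith(sandboxRoot) && !resolved.equals(sandboxRoot);
        if (!inside) {
            throw new SecurityException("delete denied: " + resolved);
        }
    }

    /** Convenience wrapper that reports the decision instead of throwing. */
    boolean mayDelete(Path target) {
        try {
            checkDelete(target);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ScratchGuard guard = new ScratchGuard(Paths.get("grid", "scratch"));
        System.out.println(guard.mayDelete(Paths.get("grid", "scratch", "job42")));          // true
        System.out.println(guard.mayDelete(Paths.get("home", "alice")));                     // false
        System.out.println(guard.mayDelete(Paths.get("grid", "scratch", "..", "..", "home"))); // false
    }
}
```

The guest code never gets a vote. A buggy path that points at a user's directory is refused by the host before anything is touched.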
Run-time checking provides eleventh-hour preventive measures against the things the developers didn't know about their code.
I am convinced that run-time checks are the very best way to safely host foreign code on a network. I am also convinced that promising run-time checks on client code is the best way to coax people into allowing Grid software to run on their computers. Most IT admins will probably still balk at the idea, but if the benefits to the organization are high enough, they might at least not threaten to quit their jobs over it.
About Daniel Pasco
Daniel Pasco is chief engineer and executive vice president at Brain Murmurs Inc., in Seattle. He is in charge of internal product research and development and lead architect of the JIVA Grid software suite.