Dear Mr Feldman:
We think you're exactly right regarding MPI: “It's hard to find a real fan of MPI today. Most people either tolerate it or hate it” [see Programming Clusters Just Got Easier, http://www.hpcwire.com/hpc/663546.html]. But you also write that, “Unfortunately, writing applications for clusters means the programmer has to deal with the hard realities of distributed memory.” In fact, today's programmers can avoid these “hard realities” by using commercially supported tuple space-based software. Tuple space systems were originally introduced in 1986, when Carriero and Gelernter (undersigned) described Linda, the first efficient tuple space implementation on a distributed memory machine, at the ACM Operating System conference.
Tuple space is a virtual shared memory system that's proved to be highly efficient and (equally important) easier to use than any competing system we know. Tuple space systems make distributed programming substantially easier than message passing does; easier than any other type of system we've met. Which isn't surprising. Tuple space systems are fundamentally higher-level than the others. And the best tuple space systems are just as efficient as the low-level solutions.
Today, commercial tuple space systems are widely used in production software development. (The systems referred to here are built and supported by a New Haven software company, Scientific Computing Associates Inc. (SCAI), www.lindaspaces.com.) Tuple space systems are used in many verticals, in academia and by ISVs. For example: the manufacturer of Gaussian, an important quantum chemistry code, offers a parallel version (Parallel Gaussian) that uses TCP-Linda to run on distributed memory Linux, UNIX, Windows and OS X platforms. Parallel Gaussian is used by chemistry research groups all over the world.
Tuple space systems are also the basis of proprietary parallel and distributed apps at such important companies as Lehman Brothers, UBS Warburg and CIBC in financial services; Pfizer in the life sciences; and Hess and El Paso Natural Gas in energy. They have been used in the manufacturing and defense sectors too. And many high performance hardware vendors are Linda license-holders.
Modern tuple space implementations, especially SCAI's “NetWorkSpaces,” are significantly higher-level than Cluster OpenMP. They operate not at the level (and in the environment) of conventional programming languages; instead, NetWorkSpaces is a virtual shared-memory enhancement of rapid application development environments such as R, MATLAB, and Python. Users have already voted with their feet: these higher-level environments are displacing programming languages for most application development by working researchers, scientists and engineers (as opposed to professional programmers). Virtual shared memory is the only rational approach to distributed memory machines. And, unlike Cluster OpenMP, virtual shared memory enhancements rise to the user's level instead of dragging the user down.
Which leaves a fascinating question. Why is the parallel programming ecosystem still dominated by C or Fortran plus MPI? Tuple space systems have been judged far easier to use than MPI again and again; otherwise the tuple space apps we've mentioned wouldn't exist. No company will reject a widely-used default and choose an obscure competitor out of sentimentality! And high-level systems such as R, MATLAB and Python have user communities that keep growing too, because they make careful, sparing use of the world's most valuable resource, highly-trained human beings. And yet low-level systems designed for efficiency instead of ease of use, rather than for both, still dominate parallel programming. Why? There are several reasons, but here is the most important one.
Parallel programming is in the middle of a major transition, from being an insider's game to a sport anyone can play — and everyone who uses computers to compute things, not just for communication and entertainment, will have to play. We saw exactly this kind of transition years ago, centered on operating systems. Once upon a time, operating systems were in the hands of trained professionals. But the rise of the PC meant that henceforth, operating systems had to be for everyone. Soon, sophisticated graphical interfaces had displaced complex, low-level command-line interfaces. Only computer professionals are willing to meet the computer on its level. Non-professionals expect the computer to rise to their levels. Non-professionals (in short) have higher, more exacting, more sophisticated software standards than professionals. (Why? Because they don't like playing with software and weren't trained for it. They have other things to do and want to get on with them.)
Once, parallel and distributed programming was in the hands of professional software developers with a high tolerance for (and the ability to understand) complex, low-level systems. (Like MPI.) But things have changed. Clusters are everywhere. Grids are everywhere (that is, multiple LAN-connected computers that are sometimes used as platforms for parallel apps). A multi-core machine is coming to a desk- or laptop near you soon. Computational demands keep growing — but the electronic speed of processor chips is no longer keeping pace. Perhaps most important, parallel programming is no longer frightening. We see hard problems solved by parallel tasking every day: you order the pizza, someone else get the soda, someone else get the beer, someone else get the napkins and plates, someone else check that the room is free… That's parallel programming; and it's not rocket science.
Operating systems made the transition from MVS/360 and its operator's console to the Mac OS. Parallel programming is on the verge of the same sort of transition: from MPI (with its fussy, low-level communication model) to the high-level simplicity of tuple spaces. Only computer professionals are willing to meet the computer on its level. Non-professionals expect the computer to rise to their levels. Tuple spaces will come out on top.
Yours,
Nicholas Carriero and David Gelernter
-----
Editor's note:
David Gelernter is a professor of computer science at Yale University and national fellow at the American Enterprise Institute. His interests include information management, parallel programming, software ensembles and artificial intelligence. He co-developed the “Linda” programming language with Nicholas Carriero. Gelernter is the author of many books (including “Mirror Worlds,” 1991), has published in many newspapers and magazines (serving as weekly columnist for the New York Post in 1987 and the Los Angeles Times in 2005), and is a board member at the National Endowment for the Arts. His forthcoming book (“The Biblical Republic: America and Americanism”) will be published by Doubleday this year.
Nicholas Carriero is a computer scientist, also at Yale University, where he researches system issues in the development and deployment of software tools for parallelism. He has worked with David Gelernter and the Linda group at Yale, where Carriero has developed variants of C and Fortran that provide Linda's coordination model. This work has included the C-Linda precompiler and analyzer, and support kernels for shared-memory multiprocessors. Carriero's current work includes refinement of existing implementations of the Linda coordination model, development of new implementations, extension of the model, and exploration of parallel programming methodologies. Adaptive parallelism, distributed computing, and “non-traditional” coordination applications are topics of particular emphasis.
Both Gelernter and Carriero are consultants for Scientific Computing Associates Inc., the company that has commercialized the Linda technology.