A lot has changed since 1997, when SGI put on the first public Grid demonstration at the Supercomputing show. Walter Stewart, SGI's business development manager for Grid, spoke with GRIDtoday editor Derrick Harris about just what has changed since then, as well as about the company's tactics in the battle against increasingly large data spikes.
GRIDtoday: How do the new products mentioned in SGI's current release [the Altix 1350, Altix Hybrid Cluster, InfiniteStorage Total Performance 9700 and Silicon Graphics Prism] advance SGI's Grid strategy?
WALTER STEWART: We have, for some considerable time, ensured that any new product that we bring out is Grid-enabled. We have been looking to bring out products that bring a unique functionality to Grids, and we believe that these four products advance the kinds of functionalities that SGI is able to make available to people who are operating Grids. Particularly in the case of the 1350, we are bringing in functionality at a much more attractive price point than has been available before.
I think that speaks to SGI's overall Grid position. We're out there to be a toolmaker for the Grid and to make sure that the kind of power SGI brings to stand-alone compute facilities is available to hugely distributed users.
Gt: Do you know of any projects off-hand that are using or planning on using any of these new solutions?
STEWART: Because we've only just released them, I'm not aware of any that are looking at them for Grid at the moment. Certainly, some products from the Altix family have already been installed in major Grid installations around the world. We've had circumstances where we've had customers who are not interested in large, shared-memory machines, but who are interested in what might be described as “robust node clusters” for their Grids, and that's precisely what the Altix 1350 addresses. We are certainly very much involved with the Altix family in a number of major Grid installations around the world.
Gt: Which leads to my next question. There are certainly some well-established projects currently powered by SGI solutions, including the TeraGyroid, COSMOS and SARA projects. Could you talk a little about how SGI products are being used in these, and other, Grid projects?
STEWART: One interesting one, in your own country, is that we installed at the beginning of the year a 1,000-plus processor machine at NCSA, which will be one of the resources on the TeraGrid. This is, as far as I'm aware, the first shared-memory resource that has been available to TeraGrid users. So that's one very recent, and North American, example.
I think I'm right in saying that it's next month that we're installing another 1,600-plus processor machine at the Australian Partnership for Advanced Computing, which will become a major resource on the Australian Grid.
COSMOS Grid has been around for some time, and we've been through a couple of generations with COSMOS Grid. We first installed Origin there, and have subsequently installed Altix. This is all because the COSMOS Grid people are in the business of setting up the data environment, including processing and visualization, in order to be ready for the data that will come flooding in from the Planck satellite in 2007. This is an example of bringing real power to Grids with a very strong emphasis on a very close connection among compute, visualization and data management.
Gt: SGI is focused on addressing four primary challenges of Grid, and I want to talk about two in particular. First: Why is security such a big issue in Grid computing, what are some of the major security issues and what is SGI doing to improve Grid security?
STEWART: We've been doing a lot of work with the open source community in transferring a lot of IP from our experience with our own operating system, IRIX. There are some security issues that we're hopeful will be picked up by the open source community. Because a lot of our security work with IRIX was right in the OS, we feel obliged to work with the open source community, and move at their speed, on the introduction of some of those attributes.
One area that isn't talked about in security, or security-related issues, that SGI is very preoccupied with is the whole issue of versioning. It's one thing to talk about the security of data; it's another thing to talk about the integrity of data. With our CXFS storage-area network over a wide-area network on the Grid, we are solving the problem of making multiple copies of data. So if you're in San Diego and I'm in Toronto, we could be working on the same data set without having to make a copy for each of us. Therefore, we can be confident that the data I'm working with is the same as the data you're working with, because we're using the same copy.
We also keep a watchful eye on the work that goes on in the standards bodies like GGF.
Gt: I haven't heard a lot of companies state their dedication to the cause of visualization capabilities on Grid networks. Can you give me a little more detail on why this is so important to SGI?
STEWART: It's important to SGI because we think it's important to the world of Grid and the world of next-generation computing. Let me give you an example. We work with an engineering firm that was working in a fairly conventional IT environment where they had a number of workstations around the company, in a few locations, and they were copying files from workstation to workstation as different people had to work on them. Those files were in the neighborhood of 200GB, and it was taking about three hours on their network to move the files. But then they came to us and said that they were going to have a problem because their next data set was going to be a terabyte in size, and that was probably going to take something in the order of 22 hours to move — that was not going to be acceptable. We said, “Well, we wouldn't worry about it anyway if we were you.”
And they said, “Why is that?”
“Because you can't load it on the workstation anyway. It'll crash it.”
Before we presented the solution to them, they came back and said they had made a mistake. It was, in fact, not going to be 1TB; it was going to be 4TB. I might say this as an aside: this is an increasingly common happening among companies and among research organizations. The spike in data is so profound.
Quite clearly in that circumstance, there was no way those workstations could cope with 4TB, nor could the company's network. So we designed a system for them that allows them to have those remote legacy workstations, have the users there, send instructions into a SAN (Storage Area Network) to cause the compute server to compute the data, then the visualization piece of the compute server visualizes that data, then we strip the pixels from the data and stream the pixels in real-time back to the remote user on the legacy workstation. They have full interactivity not only with the data set, but the computation of that data, from a legacy workstation — stressing neither the network nor the workstation's capacity.
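As a rough, back-of-the-envelope illustration of the figures in this anecdote, the sketch below scales the quoted 200GB-in-three-hours transfer linearly to the larger data sets. It assumes decimal units (1TB = 1,000GB) and a constant network rate, and the function names are hypothetical. Simple linear scaling gives about 15 hours for 1TB, which is lower than the roughly 22 hours Stewart quotes; the sketch does not try to account for that difference. Either way, a 4TB copy would run to days rather than hours.

```python
# Back-of-the-envelope sketch only: assumes decimal units and a constant
# network rate. Names here are illustrative, not taken from the interview.
GB_PER_TB = 1000


def implied_rate_gb_per_hour(size_gb: float, hours: float) -> float:
    """Network rate implied by moving size_gb in the given number of hours."""
    return size_gb / hours


def transfer_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Time to move size_gb at a constant rate, ignoring any overhead."""
    return size_gb / rate_gb_per_hour


# Figures quoted in the interview: roughly 200GB moved in about three hours.
rate = implied_rate_gb_per_hour(200, 3)

hours_1tb = transfer_hours(1 * GB_PER_TB, rate)  # about 15 hours
hours_4tb = transfer_hours(4 * GB_PER_TB, rate)  # about 60 hours

if __name__ == "__main__":
    print(f"Implied rate: {rate:.1f} GB/hour")
    print(f"1TB at that rate: {hours_1tb:.0f} hours")
    print(f"4TB at that rate: {hours_4tb:.0f} hours")
```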
In that circumstance, visualization is a critical tool for working with big data. More and more, as people look at these data spikes, even if you are able to get the data moved (which is becoming increasingly impossible), if the data or the data results are being expressed alpha-numerically, it's going to take you too long to read it. Ask a big data question … you frequently get a bigger data answer. And if it takes you six months to read the answer …
If you can look at it visually, you often can understand it in a fraction of the time it would take you to internalize the information if it's expressed to you alpha-numerically. We believe that kind of infrastructure is critical for all sorts of users, in all sorts of places, working on all sorts of devices, with all sorts of OS's. It's SGI's role to bring that core power to Grid installations so that people at the various points along the Grid can have access to it.
Gt: Finally, I want to go back, for a few questions, to SGI's groundbreaking demonstration at Supercomputing '97. What kind of effect did it have on the Grid movement? Did it add an element of legitimacy?
STEWART: I think it certainly got Grid going and began a process of people seeing that there is a possibility to design this very different kind of infrastructure. I think that, in truth, if the community could have kept the momentum of that activity in 1997, we'd be further ahead today. I think we got sidetracked for a number of years, particularly in North America, with the cycle scavenging model as a single approach to Grid computing. I'm happy to say that single approach has very much ended.
While going around and doing cycle scavenging is still a very legitimate part of Grid, it's no longer seen as the Grid. People are recognizing that Grid users should have access to a variety of different devices and a variety of different kinds of tools to work.
So, I think that 1997 was critical. I just wish we could have maintained the momentum that [the demo in] 1997 started, and we might be further ahead today. Things have changed dramatically in the last year to two years and focused much more on the building of the kind of infrastructure that's required to deal with big data.
Gt: That was almost seven-and-a-half years ago, an eternity in information technology, and a whole lot has changed with Grid since then. What do you see as some of the biggest differences?
STEWART: Grids are now deployed in working environments — there are lots of Grids. I would characterize the Grid as having three phases so far. From 1997 until about 2001, you were looking at Grids deployed for research on Grids. Starting around 2001, you increasingly saw Grids deployed in a research environment to serve research goals of multiple disciplines. We moved away from the Grid being the object of the research to the Grid being a tool to enable research. Starting roughly around late 2003-2004, we really began to see a major ramp-up of Grids being installed in enterprise situations and in corporate situations.
Certainly, I might comment with one other hat that I wear as co-chair of the Plenary Program Committee at GGF. Our Plenary Program at GGF12 in Brussels last September was quite extraordinary [in regard to] the number of companies that turned up at the event that either already had Grids or were seriously looking at installing Grids and came to find out about it. There was nothing like that attendance previously. If you go back to GGF in 2002, there would be no one there but vendors and researchers. By now, we believe we are seeing a strong corporate engagement in the whole issue of Grid.
Gt: My final question is: If you had a crystal ball, what do you think you would see in another seven-and-a-half years, in 2012? Where will Grid be, and what role will SGI have in helping it get there?
STEWART: I very much see Grid in this context: Starting in the middle of the 18th century, right up well through the 20th century, we built ever more elaborate distribution mechanisms, or infrastructures for distribution, in order to move raw materials, processed materials and finished products. That was the absolute foundation of the industrial economy. Sometime late in the 20th century, we began creating the infrastructure for the knowledge economy. Data is the raw material for the knowledge economy. Grid is the nascent form, or the beginning, of that infrastructure that is going to allow us to move from data to information to knowledge and, therefore, to value.
I would say that we are going to increasingly see infrastructure built around the principles that are related to Grid computing that enable users in every conceivable location to have access to the tools that they need for data. If I chose to, I could drive about a mile from my house and be able to buy a lemon picked off a tree in California. We have the infrastructure in place to make it possible for me to have that in the middle of winter in Toronto. We're going to see the kind of infrastructure that will make it possible for me, regardless of where I am, to be able to access the power to deal with the data that I need in order to be a knowledge worker.
Gt: Is there anything else you would like to add in regard to this announcement or SGI's Grid strategy in general?
STEWART: I believe that SGI will always be looking forward to your 2012 date. SGI will always be there, designing the tools that are ready to deal with that next spike in the volumes of data people have to work with. We're not going to be down at the commodity level, and we're not going to be there for the problems that are already solved. We are going to be there for the people who are tackling the next data spike.
To read the release cited in this article, please see “New Solutions Extend SGI'S Drive to Advance Grid Computing” in the issue of GRIDtoday.