October 12, 2007
Sun Microsystems' recent acquisition of the Lustre file system and the associated Cluster File Systems (CFS) resources has caused less gnashing of teeth than one might have expected. After all, Lustre has become one of the more widely deployed parallel file systems in high performance computing, especially in high-end supercomputing systems at the national labs. Putting ownership of this technology in the hands of a system vendor is bound to cause a few worries. And it has -- but just a few. For the time being, Sun has managed to convince the Lustre community that its intentions are honorable.
But the Lustre deal may also be an indication that parallel file systems are not all about supercomputing. While Sun has worked hard to establish its HPC street cred with its Sun Fire x64 servers and Constellation blades, which became the basis for the Tokyo Tech TSUBAME and TACC Ranger superclusters, the company is still primarily focused on the broader enterprise market. Sun has stated it will support and enhance Lustre as a distributed file system for its own HPC offerings and for the overall high performance computing community, but the company may have larger plans in mind.
Even before the acquisition, Sun declared its intentions to marry Lustre to its own ZFS file system to produce a general-purpose, high-capacity parallel file system solution. ZFS is Sun's Solaris-based file system for applications that require very large storage capacity. For true scalability, the only element missing was a clustering capability, which they now have in Lustre.
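The division of labor in that proposed marriage is easy to see from the command line. Very roughly -- and assuming a Solaris storage host with spare disks plus a separately configured Lustre management server, with all hostnames and device paths below purely illustrative rather than drawn from any Sun documentation -- ZFS pools local disks into one large, checksummed store, while Lustre is what lets many such servers be stitched into a single namespace that clients mount over the network:

```shell
# ZFS side: aggregate local disks into one large pool with built-in
# redundancy and checksumming (device names are hypothetical)
zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0
zfs create tank/scratch

# Lustre side: a Linux client mounts the clustered file system served
# by many such storage nodes, located via the management server "mgs"
mount -t lustre mgs@tcp0:/lustre /mnt/lustre
```

The first half scales capacity within a box; only the second half scales it across boxes -- which is the piece Sun lacked until now.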
From the hardware side, Sun has its Sun Fire X4500 "Thumper" 24 TB data server, which is aimed at a range of big data applications in HPC, media collection and streaming, business intelligence, and nearline storage. Since the company would like to sell this hardware with Solaris and ZFS, the Lustre addition will give the platform a lot more range, but maybe not in the HPC realm.
That brings us to the other side of the story. For the HPC community, Lustre is all about Linux. When HPC system vendors learned of Sun's intention to buy one of the industry's more popular parallel file systems, I'm guessing there were a few skipped heartbeats out there. After the acquisition announcement in September, Sun quickly stepped in to assure the community that it didn't intend to leave HPC vendors and users in the lurch. According to Sun, it's going to be business as usual. In a follow-up letter to the community on October 1, the company explicitly stated it will continue to support Lustre on Linux and on non-Sun hardware, as well as maintain the technology's open source model.
Most of the HPC companies I've talked with over the past few weeks were cautiously optimistic about Lustre's new ownership. None of the system vendors gave me any indication they were looking for alternative technologies or were thinking of creating a private version of Lustre for themselves, but there was also a wait-and-see attitude expressed regarding how Sun would handle the new arrangement.
Certainly non-Sun customers with deployed Lustre technology had to be concerned. At HP, the customer questions started arriving soon after the deal was announced. Ed Turkel, manager of product and technology marketing for HP's High Performance Computing Division, said that customers and potential customers asked HP about the impact of the Sun deal on the support of HP's Lustre-based StorageWorks Scalable File Share (SFS) file server, a product intended for large-scale Linux clusters, i.e., beyond a few hundred nodes.
Turkel said they do not foresee a problem with their ability to support SFS, since the open source model of Lustre will be maintained under the new ownership, and HP has been reassuring customers to that effect. "Based on the statements that both CFS and Sun have made, we really don't see a change in our product strategy and product support," said Turkel. "So our intention is to continue to provide HP SFS as the high capacity, high bandwidth parallel file system that we've been delivering for awhile."
When I suggested it could be risky to depend on your rival for technology like this, Turkel reminded me that HP also ships Java with virtually every system they sell. They even have systems in the field running Sun's Grid Engine, which, like virtually everything at Sun these days, is open source.
While less of a direct competitor to Sun than HP, Cray has even greater exposure to the fortunes of Lustre, since all of its storage products are based on it. "We're obviously sold on Lustre," Cray CEO Peter Ungaro told me. "We think it's a great file system and getting better all the time."
Since both Cray and its customers have made a strategic investment in Lustre technology, the company needs to make sure they can support it for the foreseeable future. Like Turkel at HP, Ungaro feels the new arrangement at Sun will allow this to happen. Not only does he expect current contracts with CFS to be honored, but Ungaro also anticipates that future Lustre support will be forthcoming.
But he's not necessarily counting on it. For extra insurance, a couple of years ago Cray assembled its own Lustre technology team within the company. If for some reason Sun fails to deliver on support, the Cray team is prepared to step in and take up the slack themselves. "We feel like we're very well protected and are continuing our investment in Lustre," said Ungaro.
Cray partner DataDirect Networks is also dependent on Lustre technology. They provide HPC storage to a number of system vendors, but are agnostic with regard to file systems. Most of their supercomputing deployments are based on either GPFS or Lustre.
For some time DataDirect had been concerned that, unlike GPFS, Lustre was in a financially precarious position. CFS, after all, was a small company that depended on a handful of large government installations for the majority of its revenue. DataDirect CEO Alex Bouzari said the Sun deal definitely puts Lustre on a much sounder financial footing than when it was a standalone company, a view shared by everyone else I've spoken with. Bouzari said, "One of the concerns, as Lustre gained prominence in the HPC community, was ensuring that the file system would live on and be under the umbrella of an organization that would bring a long term perspective to it -- meaning be adequately funded and have a professional management structure around it." For him, the one caveat was being able to keep Lustre open source, which has now been promised by Sun.
The gang at Panasas is perhaps most suspicious about Sun's intentions. Panasas sells its own parallel storage clusters (ActiveStore) and parallel file system (PanFS), and therefore competes with Lustre-based solutions. Larry Jones, the VP of Marketing at Panasas, thinks Sun's natural inclination will be to use Lustre and the CFS resources to augment Sun products, rather than continue the Linux focus. "Sun is all about Solaris and Lustre is now going to be all about Sun," said Jones.
He is specifically talking about incorporating Lustre with ZFS and Solaris to provide a clustered storage solution using the Sun Fire X4500 platform. This brings us back to Sun's Thumper strategy that I talked about at the beginning of this article. Sun is surely interested in selling ZFS-based storage under a Solaris OS environment to HPC customers. But at some level, the company must realize this is a bit of a fantasy. The HPC community is not ready for a third OS behind Linux and Windows (Compute Cluster Server). Even though Sun would love to drive Solaris adoption in HPC, a Solaris-based Thumper is unlikely to excite the supercomputing masses today. If Sun wants to sell more X4500 hardware to the HPC crowd, it would probably have greater effect by porting ZFS to Linux. But if you believe Larry Jones, Sun's end game with Lustre lies elsewhere.
"I don't think it's primarily for HPC," Jones speculated. "I think it's because they're mainly an enterprise company and they want to take Thumper into the enterprise as a NAS solution, and that NAS solution now needs to be clustered and parallel, because that's the way the world's going."
As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at firstname.lastname@example.org.
Posted by Michael Feldman - October 11, 2007 @ 9:00 PM, Pacific Daylight Time
Michael Feldman is the editor of HPCwire.