The advent of more powerful compute systems has increased the capacity to generate data at a fantastic rate. To solve the associated data management issues, a combination of grid technology and other storage components is currently being deployed. Many solutions have been designed to address these petabyte-scale data management problems, including new software, NAS/NFS products and parallel storage solutions from IBM, Panasas and others. The task involves handling and storing very large data sets accessed simultaneously by thousands of compute clients.
Extreme single projects such as the Large Hadron Collider (LHC) at CERN are producing 15 petabytes of data each year. Raw data are distributed to Tier-0 data centres and then, after an order of magnitude reduction, are passed on to Tier-1 data centres and so on. This activity involves 130 computer centres of which 12 are very large.
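To put the 15-petabyte figure in perspective, it helps to convert it into an average sustained rate. The short Python sketch below does that arithmetic. It assumes decimal petabytes (10^15 bytes) and a 365-day year of evenly spread production, neither of which is stated above, so the result is only an order-of-magnitude guide.

```python
# Rough conversion of a yearly data volume into an average sustained rate.
# Assumptions (not from the article): 1 PB = 10**15 bytes, a 365-day year,
# and data produced evenly around the clock.

SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_PB = 10**15


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_year = petabytes_per_year * BYTES_PER_PB
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


if __name__ == "__main__":
    rate = average_rate_mb_per_s(15)
    print(f"15 PB/year is roughly {rate:.0f} MB/s on average")
```

Under these assumptions the average comes out a little under 500 MB/s. Real production is unlikely to be this even, so peak rates would be higher.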
In HPC there is a strong demand for parallel storage from users in the fields of computational physics, CFD, crash analysis, climate modelling, oceanography, seismic processing and interpretation, bioinformatics, cosmology, computational chemistry and materials sciences. The parallel storage requirement is being driven by the growing size of data sets, more complex analysis, the requirement to run more jobs, simulations with more iterations and the fact that the HPC solutions (Linux clusters) are using multicore processors and more nodes. Inherently the systems and applications are becoming more parallel, hence the requirement for parallel I/O increases.
Before we concentrate on HPC storage needs, let’s briefly review the disk storage market trends.
The disk storage market is expanding rapidly. By 2008 the HPC storage market will be well over $4 billion, according to IDC. The spread of broadband creates huge volumes of data, increasing data exchange in commercial transactions, email, images, video and music. Since interactions are global, this is happening 24 hours a day, 7 days a week. This data growth and non-stop operations put storage and data protection at the heart of this business, requiring high-speed processing for large data sets and high-speed backup for protection. Tape is insufficient for this purpose.
High-end NAS storage systems are likely to be using 10 Gbps TCP/IP (soon to be using 20 Gbps) and could have over 150 TB in a single rack. They are often connected to a SAN by fibre cable and can be expanded into a cluster.
To deliver a “best-in-class” solution, the compute server and data handling are decoupled. They are highly complementary, but need to be scaled together for balance to handle several petabytes of active data. Although data patterns vary, the system needs to be designed from the ground up for multiple petabyte capability and several millions, or even billions, of files. It is therefore imperative that the data handling systems scale and the network bandwidth does not become a bottleneck.
In the rich digital content environment of today, the limitations of traditional NAS/SAN storage — scalability, performance bottlenecks and cost — are driving the industry to find new solutions. The response from the industry was the clustered storage evolution. Vendors claim that clustered NFS storage provides customers with enormous benefits in this digital content environment. The benefits include massive scalability, 100X larger file system, unmatched performance, 20X higher total throughput and industry-leading reliability. They also claim it is as easy to manage a 10-petabyte file system as a 1-terabyte file system. Clustered NFS solutions are fine for most large Web sites, but they simply don’t handle the kind of large files typical of most HPC applications very well.
A typical cluster computing architecture consists of a software stack of applications and middleware, tens or thousands of processors/clients, a high speed interconnect using, say, 10GigE, InfiniBand, Myrinet or Quadrics, thousands of direct network connections and hundreds of connections to physical storage.
Storage clusters, like compute clusters, transparently aggregate a large number of independent storage nodes so that they appear as a single entity. They typically use the same network technology as the compute cluster (InfiniBand or 10GigE), processing power (CPU, multicore, SMP), large amounts of globally coherent cache, and disk drives (up to 1 TB each).
A cluster file system is likely to be using industry standard protocols, NFS, CIFS, HTTP, FTP, NDMP, SNMP, ADS, LDAP and NIS for security, or some other product of similar standing. A cluster file system creates one giant drive or NFS mounted fully symmetric cluster. Such a system is massively scalable to multiple petabytes, easy to manage and has plenty of growth potential. The management of LUNs, volumes or RAID is taken care of by the storage cluster management system and is normally hidden from the user.
The future of HPC is tied to larger data sets, more CPUs applied to each problem, and a requirement for parallel storage. Today’s high density 1U servers (typically with 8 cores each) have increased the number of processing cores per node, but I/O bandwidth has not evolved at the same rate. The reality is that the number of cores per node is still increasing; however, scientific and technical analysis requires a system that balances compute cores and I/O bandwidth.
With this increase in compute nodes, traditional single-server NFS solutions have quickly become a bottleneck. A first approach to solving this problem came in the form of clustered NFS. This, however, is falling short of HPC requirements. Major HPC sites are therefore not deploying clustered NFS to any significant extent, but are rather moving directly from NFS to parallel storage (like Panasas, IBM GPFS and Lustre).
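The bottleneck argument can be illustrated with a toy bandwidth model. The Python sketch below assumes that each storage server delivers a fixed bandwidth, that load is spread evenly across servers, and that each client is capped by its own network link. The function name and all figures are hypothetical illustrations, not measurements of any product.

```python
# Toy model of per-client I/O bandwidth. All figures are hypothetical
# illustrations, not measurements of any product.


def per_client_bandwidth(clients: int, servers: int,
                         server_mb_s: float, client_link_mb_s: float) -> float:
    """Bandwidth each client sees when load is spread evenly over servers.

    The aggregate is servers * server_mb_s, shared equally by all clients
    and capped by the client's own network link.
    """
    if clients <= 0 or servers <= 0:
        raise ValueError("clients and servers must be positive")
    fair_share = servers * server_mb_s / clients
    return min(fair_share, client_link_mb_s)


if __name__ == "__main__":
    clients = 1000
    for servers in (1, 10, 100):
        bw = per_client_bandwidth(clients, servers,
                                  server_mb_s=400.0, client_link_mb_s=100.0)
        print(f"{servers:>3} storage server(s): {bw:.1f} MB/s per client")
```

With a single server, the per-client share shrinks in proportion to the number of clients. Adding servers that clients can reach in parallel raises the share until the client link becomes the limit.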
Government and academia users are already heavily deploying parallel storage and this is likely to become a requirement for all simulation and modelling applications deployed on clusters. Simply put, parallel compute clusters require parallel storage!
In the last few years, new storage companies have succeeded in taking a significant share of the file storage component of the HPC market from traditional storage providers such as Network Appliance, IBM, Sun, NEC and so on. For example, Panasas made news in 2007, when it was chosen to provide the data storage subsystem to support the RoadRunner petaflop Supercomputer, built by IBM to be installed at Los Alamos. It’s interesting to note that LANL chose Panasas parallel storage even over IBM’s parallel storage system, GPFS.
Another feather in Panasas’ cap is that the company scooped the annual HPCwire readers’ choice and editors’ choice awards for Panasas ActiveStor parallel storage and for the new Panasas Tiered Parity architecture respectively, at Supercomputing 2007 (SC07) in Reno, Nev.
To overcome the potential I/O bottleneck inherent in such a large-scale system as RoadRunner, Panasas offered PanFS as part of its ActiveStor Storage cluster architecture. This architecture is object-based and uses the DirectFLOW protocol to provide high scalability, reliability and manageability. It supports Red Hat, SUSE and Fedora, and its DirectorBlades manage and enable metadata scalability by dividing namespace into virtual volumes.
PanFS is promoted by Panasas as the “best-in-class” file system for HPC environments. The company claims the system eliminates bottlenecks, solves manageability problems and improves overall reliability.
When Len Rosenthal, Panasas chief marketing officer, was asked what differentiates Panasas from other cluster storage vendors he said: “The ‘parallel’ element of our offering differentiates us from the clustered storage vendors as we can provide massive speed-up for HPC applications and higher utilization of clusters through parallelism.”
“What is driving the need for ‘parallel storage’ in HPC is the combination of multiple factors: 1) Explosion of data sets due to the need to run large and more accurate models. 2) The massive use of x86 clusters and multicore CPUs, where users are applying 100s and 1000s of CPUs to simulation and modelling problems. 3) Currently deployed I/O and file systems based on NFS, and even clustered NFS, cannot handle the I/O requirements,” continued Rosenthal.
According to Panasas, the evolution to Parallel NFS (pNFS) is the ultimate proof that the computer storage world is going parallel. Even though pNFS is inspired by Panasas technology, IBM, Sun, EMC and NetApp are all committed to implementing pNFS. One presumes that despite being competitors, these companies also recognise the performance and scalability advantages of parallel storage, especially for future HPC, which is why they are also working towards the standardisation of pNFS.
The merits of standards are well known. Standards drive product adoption, unlock markets, drive down costs, make interoperability possible and reduce risk to the client. The key storage vendors have existing parallel file system products that are incompatible with one another. IBM has GPFS, EMC has MPFSi (High Road), Panasas has ActiveScale, HP has Polyserve and so on. Similar interoperability concerns are also present in the open source Red Hat GFS and Lustre.
pNFS is an extension to the Network File System v4 protocol standard. It allows for parallel and direct access from parallel Network File System clients to storage devices over multiple storage protocols. It essentially moves the Network File System server out of the data path.
The pNFS standard defines the NFSv4.1 protocol extensions between the server and the client. The I/O protocol between the client and storage is specified elsewhere, for example: SCSI Block Commands (SBC) over Fibre Channel (FC), SCSI Object-based Storage Device (OSD) over iSCSI and Network File System (NFS). The control protocol between the metadata server and storage devices is also specified elsewhere, for example: SCSI Object-based Storage Device (OSD) over iSCSI.
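The idea of taking the server out of the data path can be sketched in a few lines of Python. The in-memory model below is only an illustration of the split between control path and data path: the client obtains a layout from a metadata server, then reads and writes stripes directly on the data servers. It is not the NFSv4.1 wire protocol. All class and function names are hypothetical, and the round-robin striping is an assumption made for the example.

```python
# In-memory sketch of pNFS-style access. Names are hypothetical and no real
# NFS traffic is involved; round-robin striping is an assumption.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Layout:
    stripe_size: int   # bytes per stripe unit
    file_size: int     # total file length in bytes
    servers: list      # data servers, used round-robin


class DataServer:
    def __init__(self):
        self.stripes = {}  # (file name, stripe index) -> bytes

    def write(self, name, index, chunk):
        self.stripes[(name, index)] = chunk

    def read(self, name, index):
        return self.stripes[(name, index)]


class MetadataServer:
    """Hands out layouts only; file data never passes through it."""

    def __init__(self, servers, stripe_size):
        self.servers, self.stripe_size, self.layouts = servers, stripe_size, {}

    def create_layout(self, name, file_size):
        self.layouts[name] = Layout(self.stripe_size, file_size, self.servers)
        return self.layouts[name]

    def get_layout(self, name):
        return self.layouts[name]


def stripe_count(layout):
    return -(-layout.file_size // layout.stripe_size)  # ceiling division


def stripe_server(layout, index):
    return layout.servers[index % len(layout.servers)]


def striped_write(mds, name, data):
    layout = mds.create_layout(name, len(data))         # control path
    for idx in range(stripe_count(layout)):             # data path
        start = idx * layout.stripe_size
        chunk = data[start:start + layout.stripe_size]
        stripe_server(layout, idx).write(name, idx, chunk)


def parallel_read(mds, name):
    layout = mds.get_layout(name)                       # control path
    with ThreadPoolExecutor(max_workers=len(layout.servers)) as pool:
        parts = pool.map(                               # data path, in parallel
            lambda i: stripe_server(layout, i).read(name, i),
            range(stripe_count(layout)))
        return b"".join(parts)


if __name__ == "__main__":
    servers = [DataServer() for _ in range(3)]
    mds = MetadataServer(servers, stripe_size=4)
    striped_write(mds, "results.dat", b"parallel storage for HPC")
    print(parallel_read(mds, "results.dat"))
```

The metadata server in this sketch only ever handles layouts, so adding data servers increases the number of stripes that can be fetched concurrently without adding load to it.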
In my view, this standards effort is admirable and should be supported across the storage industry. Potential benefits for users include improved sustained performance, accelerated time to results (solution) and parallel storage capability with standard, highly reliable performance. It offers more choice of parallel I/O capabilities from multiple storage vendors, freedom to access parallel storage from any client, and the ability to mix and match best-of-breed vendor offerings. It also carries lower risk for the user community, since client constructs are tested and optimised by the operating system vendors whilst the customer is free from vendor lock-in concerns. In short, it extends the benefits of the investment in storage systems.
In summary, vendors and users are recognising that the future of high-end file storage is parallel. The early adopters like government and academia have adopted it, but anyone in the HPC space who is building clusters with 100s of CPU-cores and generating terabytes of data will require parallel storage.
IBM GPFS, Lustre and Panasas are the primary parallel storage systems deployed in government and academia, but Panasas is also a strong, viable alternative for providing parallel storage systems to large commercial companies, like those in the energy, manufacturing and financial markets. Panasas customers include: Boeing, BP, Petroleum GeoServices, Fairfield Industries, Hyundai Automotive Technical Center, Statoil, BMW/Sauber F1 Motor Sports, Paradigm, Northrop Grumman, PetroChina, Novartis and dozens of others. Thus, companies that are using HPC and trying to accelerate product development and make profits from their HPC infrastructure are increasingly turning to parallel storage as their preferred solution. Remember the old saying: “The proof of the pudding is in the eating.”
—–
Copyright (c) Christopher Lazou. January 2008. Brands and names are the property of their respective owners.