The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
January 18, 2008
The advent of more powerful compute systems has increased the capacity to generate data at a fantastic rate. To solve the associated data management issues, a combination of grid technology and other storage components is currently being deployed. Many solutions have been designed to address these petabyte-scale data management problems, including new software, NAS/NFS products and parallel storage solutions from IBM, Panasas and others. The task involves handling and storing very large data sets that are accessed simultaneously by thousands of compute clients.
Extreme single projects such as the Large Hadron Collider (LHC) at CERN are producing 15 petabytes of data each year. Raw data are distributed to Tier-0 data centres and then, after an order of magnitude reduction, are passed on to Tier-1 data centres and so on. This activity involves 130 computer centres, of which 12 are very large.
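To put the 15 petabytes per year in perspective, the figure can be converted into an average sustained data rate. The short Python sketch below does that arithmetic. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, and the function name is purely illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year
PB = 10**15                         # assumption: decimal petabyte
MB = 10**6                          # assumption: decimal megabyte


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s needed to move a given yearly data volume."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / MB


if __name__ == "__main__":
    # 15 PB/year is the LHC figure quoted in the article.
    print(f"{sustained_rate_mb_per_s(15):.0f} MB/s")  # prints "476 MB/s"
```

This is an average over a whole year. Real traffic is bursty, so peak rates into the storage tiers would need to be considerably higher.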
In HPC there is a strong demand for parallel storage from users in the fields of computational physics, CFD, crash analysis, climate modelling, oceanography, seismic processing and interpretation, bioinformatics, cosmology, computational chemistry and materials sciences. The parallel storage requirement is being driven by the growing size of data sets, more complex analysis, the requirement to run more jobs, simulations with more iterations and the fact that the HPC solutions (Linux clusters) are using multicore processors and more nodes. Inherently the systems and applications are becoming more parallel, hence the requirement for parallel I/O increases.
Before we concentrate on HPC storage needs, let's briefly review the disk storage market trends.
The disk storage market is expanding rapidly. By 2008 the HPC storage market will be well over $4 billion, according to IDC. The spread of broadband creates huge volumes of data, increasing data exchange in commercial transactions, email, images, video and music. Since interactions are global, this is happening 24 hours a day, 7 days a week. This data growth and non-stop operations put storage and data protection at the heart of this business, requiring high-speed processing for large data sets and high-speed backup for protection. Tape is insufficient for this purpose.
High-end NAS storage systems are likely to use 10 Gbps TCP/IP (soon 20 Gbps) and could hold over 150 TB in a single rack. Such a system is often connected to a SAN over fibre and can be expanded into a cluster.
To deliver a "best-in-class" solution, the compute server and data handling are decoupled. They are highly complementary, but need to be scaled together for balance to handle several petabytes of active data. Although data patterns vary, the system needs to be designed from the ground up for multiple petabyte capability and several millions, or even billions, of files. It is therefore imperative that the data handling systems scale and the network bandwidth does not become a bottleneck.
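A rough calculation shows why bandwidth has to scale with capacity. Using the figures quoted earlier (a 150 TB rack behind a 10 Gbps link), the Python sketch below estimates how long it would take to read or write the full rack through that single link. It assumes decimal units and, by default, a perfectly efficient link, which is optimistic. The function and its `efficiency` parameter are illustrative and not taken from any product.

```python
def transfer_hours(capacity_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move capacity_tb terabytes over a link of link_gbps.

    efficiency is the usable fraction of the nominal line rate
    (assumption: 1.0 means no protocol overhead at all).
    """
    bits = capacity_tb * 10**12 * 8              # decimal terabytes to bits
    usable_bits_per_s = link_gbps * 10**9 * efficiency
    return bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    for gbps in (10, 20):
        print(f"{gbps} Gbps: {transfer_hours(150, gbps):.1f} hours")
    # prints "10 Gbps: 33.3 hours" then "20 Gbps: 16.7 hours"
```

Even at the nominal line rate, a single link needs more than a day to move one rack's worth of data, so multi-petabyte systems depend on many links and storage nodes working in parallel.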
In the rich digital content environment of today, the limitations of traditional NAS/SAN storage -- scalability, performance bottlenecks and cost -- are driving the industry to find new solutions. The response from the industry was the clustered storage evolution. Vendors claim that clustered NFS storage provides customers with enormous benefits in this digital content environment. The benefits include massive scalability, 100X larger file system, unmatched performance, 20X higher total throughput and industry-leading reliability. They also claim it is as easy to manage a 10-petabyte file system as a 1-terabyte file system. Clustered NFS solutions are fine for most large Web sites, but they simply don't handle the kind of large files typical of most HPC applications very well.
A typical cluster computing architecture consists of a software stack of applications and middleware, tens to thousands of processors/clients, a high-speed interconnect such as 10GigE, InfiniBand, Myrinet or Quadrics, thousands of direct network connections and hundreds of connections to physical storage.
Storage clusters, like compute clusters, transparently aggregate a large number of independent storage nodes so that they appear as a single entity. They typically use the same network technology as the compute cluster (InfiniBand or 10GigE), processing power (CPU, multicore, SMP), large amounts of globally coherent cache, and disk drives (up to 1 TB each).
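One common way to make many storage nodes behave like a single device is to stripe each file across them in fixed-size units, so that a large read touches several nodes at once. The article does not say which layout any particular product uses, so the Python sketch below is only an illustration of round-robin striping. The class, its fields and the 1 MiB stripe size are all assumptions.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical round-robin striping of one logical file across nodes."""
    node_count: int
    stripe_size: int  # bytes per stripe unit

    def locate(self, offset: int) -> Tuple[int, int]:
        """Map a logical byte offset to (node index, byte offset on that node)."""
        stripe_index, within = divmod(offset, self.stripe_size)
        node = stripe_index % self.node_count          # round-robin placement
        local_stripe = stripe_index // self.node_count  # stripes already on that node
        return node, local_stripe * self.stripe_size + within

    def nodes_for_read(self, offset: int, length: int) -> Set[int]:
        """Return the set of nodes a read of `length` bytes at `offset` touches."""
        if length <= 0:
            return set()
        first = offset // self.stripe_size
        last = (offset + length - 1) // self.stripe_size
        return {s % self.node_count for s in range(first, last + 1)}


if __name__ == "__main__":
    MIB = 1024 * 1024
    layout = StripeLayout(node_count=4, stripe_size=MIB)
    print(layout.locate(4 * MIB + 5))                    # (0, 1048581)
    print(sorted(layout.nodes_for_read(0, 4 * MIB)))     # [0, 1, 2, 3]
```

With this layout a 4 MiB read is served by four nodes at once, which is the basic reason parallel storage can deliver more aggregate bandwidth than a single file server.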
A cluster file system is likely to use industry-standard protocols such as NFS, CIFS, HTTP, FTP, NDMP and SNMP, together with ADS, LDAP and NIS for security, or other products of similar standing. A cluster file system presents one giant drive, or a fully symmetric NFS-mounted cluster. Such a system is massively scalable to multiple petabytes, easy to manage and has plenty of growth potential. The management of LUNs, volumes or RAID is taken care of by the storage cluster management system and is normally hidden from the user.