November 19, 2008
Researchers at Tohoku University in Sendai, north-eastern Japan, announced on Wednesday that they had broken a batch of performance records on their NEC SX-9 supercomputer, as measured on the HPC Challenge Benchmark test. Hiroaki Kobayashi, director of the university's Cyberscience Center, said the SX-9 had achieved the highest marks ever in 19 of 28 areas the test evaluates in computer processing, memory bandwidth and networking bandwidth. The scores were matched against those previously achieved on the same independent benchmark test by other leading supercomputers, including IBM's Blue Gene/L, Cray's XT3/4 and SGI's Altix ICE, with the SX-9 coming out on top 64 percent of the time.
The news comes at a good time for NEC. The Tokyo-based manufacturer of vector-based supercomputers is battling in a market that has been moving away from its expensive high-performance vector processing models to systems that use more modestly priced commodity-type superscalar CPUs. These cheaper chips can be coupled tightly together or used in clusters of computers to achieve similar or better results than vector competitors -- at least in some areas of supercomputing.
At Tohoku University, however, a stronghold of vector computing since it installed its first SX-1 in 1985, Director Kobayashi argues that vector computing is essential for certain types of applications and will only increase in importance as advances are made in parallel processing.
"In the future, data parallel processing will become more important in high performance computing," says Kobayashi. "And vector processing provides a very efficient model for it." This is why, he adds, Intel, which has long provided short vector SIMD code extensions for its x86 architecture, is employing wider vector operations in its upcoming Larrabee graphics processing chip. "Regarding parallel processing, at the instruction-set level, vector instruction sets are the key to future processors, no matter what kind of micro-architecture is used," says Kobayashi.
In addition, he emphasizes that for the kind of programs that the 1,500 paying supercomputer users of the University's Cyberscience Center want to run, vector is still king. Most of these users are involved in government and academic research programs in areas like aerospace, environmental simulations, structural analysis and nanotechnology. "They want to conduct very large simulations, so are looking for an efficient handling mechanism to process extremely large amounts of data in a single operation," says Kobayashi. "Vector processing is best suited to this kind of application."
The SX-9 employs a single-chip vector processor capable of reaching 102 GFLOPS. Up to 16 CPUs sharing 1 TB of memory can be incorporated on a single node, combining to produce 1.6 TFLOPS of peak performance. The Tohoku University SX-9 set-up, which began operations this April, consists of 16 nodes, each of 16 CPUs, producing an overall peak performance of 26 TFLOPS. On a sustained performance basis, the Cyberscience Center's test results show that a single SX-9 CPU outperforms the previous SX-8R by a factor of four to eight, depending on the application.
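The peak figures quoted above follow from simple multiplication. The short Python sketch below reproduces them from the per-CPU number given in the article; the function and variable names are illustrative only, and the result is theoretical peak, not sustained performance.

```python
# Peak-performance arithmetic using the figures quoted in the article.
# Names here are illustrative; they are not part of any NEC tool.
CPU_PEAK_GFLOPS = 102   # per-CPU peak, as quoted
CPUS_PER_NODE = 16      # CPUs per node at Tohoku University
NODES = 16              # nodes in the Tohoku University system


def peak_tflops(nodes, cpus_per_node, cpu_gflops=CPU_PEAK_GFLOPS):
    """Return theoretical peak in TFLOPS for a given configuration."""
    return nodes * cpus_per_node * cpu_gflops / 1000


node_peak = peak_tflops(1, CPUS_PER_NODE)
system_peak = peak_tflops(NODES, CPUS_PER_NODE)

print(f"Single node: {node_peak:.1f} TFLOPS")    # rounds to 1.6
print(f"Full system: {system_peak:.0f} TFLOPS")  # rounds to 26
```

The unrounded values are 1.632 TFLOPS per node and 26.112 TFLOPS for the full system, which the article rounds to 1.6 and 26.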
Much of the new CPU's improved performance can be accounted for by adding an arithmetic unit and raising the number of vector pipelines -- all integrated on a single chip that is the first to surpass 100 GFLOPS.
But Kobayashi notes that a new feature of the SX-9, the inclusion of an assignable data buffer or ADB, has also helped boost performance significantly. "ADB is software-controllable cache memory," he explains. "It lets the user assign the data to be cached, which prevents it from being evicted."
In a simulation used to detect the presence of land mines with electromagnetic waves, for instance, performance increased by 20 percent when ADB was used. In another simulation, which tracked the movement of tectonic plates (the cause of earthquakes), the use of ADB improved performance by 75 percent, while a simulation involving the physics of plasma under certain conditions saw performance double when employing ADB.
Despite such gains, Kobayashi has a gripe with the current ADB design: the cache space is limited to just 256 kilobytes. This means users cannot place all the target data in the cache; rather, they must select only the portion that they judge will work most effectively in ADB. To determine the optimum amount of cache memory, the Cyberscience Center, which is developing a software simulator based on the SX-9 architecture to design future supercomputer models, ran simulations using real application code. To achieve the highest performance, the researchers found that a minimum of 8 MB of ADB memory is necessary. NEC has been so advised.
Regarding the HPC Challenge Benchmark results, it was no surprise that the SX-9, whose architecture is designed specifically for efficient processing of large amounts of data, came out on top in memory performance and did well in networking bandwidth. But Kobayashi was also keen to point out that when it came to computing performance, despite the relatively small size of the Center's SX-9 set-up, it still competed well against much larger systems.
"In the case of global-FFT testing, for instance, we still made second place to Cray's XT3, which is a huge system, with maybe 100 times more processors," says Kobayashi. "And while the XT3's peak performance was five times higher (than our system) its global-FFT result was only 20 percent higher. So if we could add even just one more lane (consisting of four nodes) we would expect to do much better."
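Kobayashi's comparison can be read as a statement about global-FFT throughput per unit of peak performance. The Python sketch below works through that reading using only the two ratios he cites (the XT3 at about five times the peak and about 20 percent higher on global-FFT); the variable names are illustrative, and the output is a rough ratio rather than a measured result.

```python
# Ratios quoted by Kobayashi, with the Tohoku SX-9 system normalised to 1.0.
xt3_peak_ratio = 5.0   # XT3 peak performance relative to the SX-9 system
xt3_gfft_ratio = 1.2   # XT3 global-FFT result relative to the SX-9 system

# Global-FFT delivered per unit of peak, relative to the SX-9 system.
xt3_relative_efficiency = xt3_gfft_ratio / xt3_peak_ratio

# How many times more global-FFT throughput the SX-9 delivers per peak FLOP.
sx9_advantage = 1.0 / xt3_relative_efficiency

print(f"XT3 G-FFT per unit of peak (SX-9 = 1.0): {xt3_relative_efficiency:.2f}")
print(f"SX-9 advantage per unit of peak: {sx9_advantage:.1f}x")
```

On these figures the XT3 delivers roughly a quarter of the SX-9's global-FFT throughput per unit of peak, which is the basis for the expectation that a modestly larger SX-9 configuration would close the gap.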
In recent years NEC has had to relinquish its No. 1 position in the TOP500 list of best performing supercomputers to scalar-based systems from Cray, IBM and other competitors when it comes to sheer peak speeds. As a result, it has turned to emphasizing efficient sustained performance and productivity. But now there is a belief within the company that, given a large enough SX-9 installation, NEC could once again challenge for the top performance spot, which it held from 2002 to 2004 with its SX-6 generation.
"Next March JAMSTEC (Japan Agency for Marine-Earth Science and Technology) will begin operations of its Earth Simulator II," notes Rie Toh, manager of NEC's HPC marketing promotion division. The system, used to forecast global climate changes, typhoons and other extreme weather conditions, as well as predict earthquakes, volcano activity and the like, will use NEC supercomputer technology, as did the previous Earth Simulator I. The new system will incorporate 160 SX-9 nodes, each containing eight CPUs, making a total of 1,280 CPUs. NEC says this would produce a peak performance of 131 TFLOPS. "Given that Cray's XT3 holds the HPC Challenge Benchmark's highest score for G-FFT system performance with 124.4 TFLOPS," says Toh, "we are eager to see what the SX-9-based Earth Simulator II will achieve when it's up and running."
But NEC's window of opportunity to win speed-king bragging rights may not be open for long. In the endless game of breaking supercomputer performance records, Cray has just announced it plans to ship its next-generation XT5 model at about the time the Earth Simulator II is to begin operations.