Latest Benchmark Results on NEC Super Highlights SX-9 Performance

By John Boyd

November 19, 2008

Researchers at Tohoku University in Sendai, north-eastern Japan, announced on Wednesday that they had broken a batch of performance records on their NEC SX-9 supercomputer, as measured on the HPC Challenge Benchmark test. Hiroaki Kobayashi, director the university’s Cyberscience Center, said the SX-9 had achieved the highest marks ever in 19 of 28 areas the test evaluates in computer processing, memory bandwidth and networking bandwidth. The scores were matched against those previously achieved on the same independent benchmark test by other leading supercomputers, including IBM’s Blue Gene/L, Cray’s XT3/4 and SGI’s Altix ICE, with the SX-9 coming out on top 64 percent of the time.

The news comes at a good time for NEC. The Tokyo-based manufacturer of vector-based supercomputers is battling in a market that has been moving away from its expensive high-performance vector processing models to systems that use more modestly priced commodity-type superscalar CPUs. These cheaper chips can be coupled tightly together or used in clusters of computers to achieve similar or better results than vector competitors — at least in some areas of supercomputing.

At Tohoku University, however, a stronghold of vector computing since it installed its first SX-1 in 1985, Director Kobayashi argues that vector computing is essential for certain types of applications and will only increase in importance as advances are made in parallel processing.

“In the future, data parallel processing will become more important in high performance computing,” says Kobayashi. “And vector processing provides a very efficient model for it.” This is why, he adds, Intel, which has long provided short vector SIMD code extensions for its x86 architecture, is employing wider vector operations in its upcoming Larrabee graphics processing chip. “Regarding parallel processing, at the instruction-set level, vector instruction sets are the key to future processors, no matter what kind of micro-architecture is used,” says Kobayashi.”

In addition, he emphasizes that for the kind of programs that the 1,500 paying supercomputer users of the University’s Cyberscience Center want to run, vector is still king. Most of these users are involved in government and academic research programs in areas like aerospace, environmental simulations, structural analysis and nanotechnology. “They want to conduct very large simulations, so are looking for an efficient handling mechanism to process extremely large amounts of data in a single operation,” says Kobayashi. “Vector processing is best suited to this kind of application.”

The SX-9 employs a single-chip vector processor capable of reaching 102 GFLOPS. Up to 16 CPUs sharing 1 TB of memory can be incorporated on a single node, combing to produce 1.6 TFLOPS of peak performance. The Tohoku University SX-9 set-up, which began operations this April, consists of 16 nodes, each of 16 CPUs, producing an overall peak performance of 26 TFLOPS. On a sustained performance bases, the Cyberscience Center’s test results show a single SX-9 CPU outperforms that of the previous SX-8R by between four to eight times, depending on the application.

Much of the new CPU’s improved performance can be accounted for by the addition of an arithmetic unit and raising the number of vector pipelines — all integrated on a single chip that is the first to surpass 100 GFLOPS.

But Kobayashi notes that a new feature of the SX-9, the inclusion of an assignable data buffer or ADB, has also helped boost performance significantly. “ADB is software-controllable cache memory,” he explains. “It lets the user assign the data to be cached, which prevents it from being evicted.”

In a simulation used to detect the presence of land mines with electromagnetic waves, for instance, performance increased by 20 percent when ADB was used. In another simulation, which tracked the movement of tectonic plates (the cause of earthquakes), the use of ADB improved performance by 75 percent, while a simulation involving the physics of plasma under certain conditions saw performance jump two times when employing ADB.

Despite such gains, Kobayashi has a gripe with the current ADB design: the cache space is limited to just 256 kilobytes. This means users cannot place all the target data in the cache; rather, they must select only the portion that they judge will work most effectively in ADB. To determine the optimum amount of cache memory, the Cyberscience Center, which is developing a software simulator based on the SX-9 architecture to design future supercomputer models, ran simulations using real application code. To achieve the highest performance, the researchers found that a minimum of 8 MB of ADB memory is necessary. NEC has been so advised.

Regarding the HPC Challenge Benchmark results, it was no surprise that the SX-9, the architecture of which is particularly designed to produce efficient processing of large data amounts, came out on top in memory performance and did well in networking bandwidth. But Kobayashi was also keen to point out that when it came to computing performance, despite the relatively small size of the Center’s SX-9 set-up, it still competed well against much larger configured systems.

“In the case of global-FFT testing, for instance, we still made second place to Cray’s XT3, which is a huge system, with maybe 100 times more processors,” says Kobayashi. “And while the XT3’s peak performance was five times higher (than our system) its global-FFT result was only 20 percent higher. So if we could add even just one more lane (consisting of four nodes) we would expect to do much better.”

In recent years NEC has had to relinquish its No. 1 position in the TOP500 list of best performing supercomputers to scalar-based systems from Cray, IBM and other competitors when it comes to sheer peak speeds. As a result, it has turned to emphasizing efficient sustained performance and productivity. But now there is belief within the company that given a large enough SX-9 installation, NEC could once again challenge for the top performance spot, which it held from 2002 to 2004 with its SX-6 generation.

“Next March JAMSTEC (Japan Agency for Marine-Earth Science Technology) will begin operations of its Earth Simulator II,” notes Rie Toh, manager of NEC’s HPC marketing promotion division. The system, used to forecast global climate changes, typhoons and other extreme weather conditions, as well as predict earthquakes, volcano activity and the like, will use NEC supercomputer technology, as did the previous Earth Simulator I. The new system will incorporate 160 SX-9 nodes, each containing eight CPUs, making a total of 1280 CPUs. NEC says this would produce a peak performance of 131 TFLOPS. “Given that Cray’s XT3 holds the HPC Challenge Benchmark’s highest score for G-FFT system performance with 124.4 TFLOPS,” says Toh, “we are eager to see what the SX-9-based Earth Simulator II will achieve when it’s up and running.”

But NEC’s window of opportunity to win speed-king bragging rights may not be open for long. In the endless game of breaking supercomputer performance records, Cray has just announced it plans to ship its next-generation XT5 model at about the time the Earth Simulator II is to begin operations.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

SC17 Student Cluster Competition Configurations: Fewer Nodes, Way More Accelerators

November 16, 2017

The final configurations for each of the SC17 “Donnybrook in Denver” Student Cluster Competition have been released. Fortunately, each team received their equipment shipments on time and undamaged, so the teams are r Read more…

By Dan Olds

Student Clusterers Demolish HPCG Record! Nanyang Sweeps Benchmarks

November 16, 2017

Nanyang pulled off the always difficult double-play at this year’s SC Student Cluster Competition. The plucky team from Singapore posted a world record LINPACK, thus taking the Highest LINPACK Award, but also managed t Read more…

By Dan Olds

Student Cluster LINPACK Record Shattered! More LINs Packed Than Ever before!

November 16, 2017

Nanyang Technological University, the pride of Singapore, utterly destroyed the Student Cluster Competition LINPACK record by posting a score of 51.77 TFlop/s at SC17 in Denver. The previous record, established by German Read more…

By Dan Olds

HPE Extreme Performance Solutions

Harness Scalable Petabyte Storage with HPE Apollo 4510 and HPE StoreEver

As a growing number of connected devices challenges IT departments to rapidly collect, manage, and store troves of data, organizations must adopt a new generation of IT to help them operate quickly and intelligently. Read more…

Hyperion Market Update: ‘Decent’ Growth Led by HPE; AI Transparency a Risk Issue

November 15, 2017

The HPC market update from Hyperion Research (formerly IDC) at the annual SC conference is a business and social “must,” and this year’s presentation at SC17 played to a SRO crowd at a downtown Denver hotel. This w Read more…

By Doug Black

SC17 Student Cluster Competition Configurations: Fewer Nodes, Way More Accelerators

November 16, 2017

The final configurations for each of the SC17 “Donnybrook in Denver” Student Cluster Competition have been released. Fortunately, each team received their e Read more…

By Dan Olds

Student Clusterers Demolish HPCG Record! Nanyang Sweeps Benchmarks

November 16, 2017

Nanyang pulled off the always difficult double-play at this year’s SC Student Cluster Competition. The plucky team from Singapore posted a world record LINPAC Read more…

By Dan Olds

Student Cluster LINPACK Record Shattered! More LINs Packed Than Ever before!

November 16, 2017

Nanyang Technological University, the pride of Singapore, utterly destroyed the Student Cluster Competition LINPACK record by posting a score of 51.77 TFlop/s a Read more…

By Dan Olds

Hyperion Market Update: ‘Decent’ Growth Led by HPE; AI Transparency a Risk Issue

November 15, 2017

The HPC market update from Hyperion Research (formerly IDC) at the annual SC conference is a business and social “must,” and this year’s presentation at S Read more…

By Doug Black

The Betting Window is Closed: Final Student Cluster Competition Betting Odds are in!

November 15, 2017

The window has closed and the bettors are clutching their tickets, anxiously awaiting the results of the SC17 Student Cluster Competition. We’ve seen big chan Read more…

By Dan Olds

2017 Student Cluster Competition Benchmarks, Workloads, and Pre-Planned Disasters

November 15, 2017

The students competing in the 2017 Student Cluster Competition in Denver are facing a grueling 48 hour marathon of HPC benchmarks and real scientific applicatio Read more…

By Dan Olds

Nvidia Focuses Its Cloud Containers on HPC Applications

November 14, 2017

Having migrated its top-of-the-line datacenter GPU to the largest cloud vendors, Nvidia is touting its Volta architecture for a range of scientific computing ta Read more…

By George Leopold

HPE Launches ARM-based Apollo System for HPC, AI

November 14, 2017

HPE doubled down on its memory-driven computing vision while expanding its processor portfolio with the announcement yesterday of the company’s first ARM-base Read more…

By Doug Black

US Coalesces Plans for First Exascale Supercomputer: Aurora in 2021

September 27, 2017

At the Advanced Scientific Computing Advisory Committee (ASCAC) meeting, in Arlington, Va., yesterday (Sept. 26), it was revealed that the "Aurora" supercompute Read more…

By Tiffany Trader

NERSC Scales Scientific Deep Learning to 15 Petaflops

August 28, 2017

A collaborative effort between Intel, NERSC and Stanford has delivered the first 15-petaflops deep learning software running on HPC platforms and is, according Read more…

By Rob Farber

Oracle Layoffs Reportedly Hit SPARC and Solaris Hard

September 7, 2017

Oracle’s latest layoffs have many wondering if this is the end of the line for the SPARC processor and Solaris OS development. As reported by multiple sources Read more…

By John Russell

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Google Releases Deeplearn.js to Further Democratize Machine Learning

August 17, 2017

Spreading the use of machine learning tools is one of the goals of Google’s PAIR (People + AI Research) initiative, which was introduced in early July. Last w Read more…

By John Russell

GlobalFoundries Puts Wind in AMD’s Sails with 12nm FinFET

September 24, 2017

From its annual tech conference last week (Sept. 20), where GlobalFoundries welcomed more than 600 semiconductor professionals (reaching the Santa Clara venue Read more…

By Tiffany Trader

Graphcore Readies Launch of 16nm Colossus-IPU Chip

July 20, 2017

A second $30 million funding round for U.K. AI chip developer Graphcore sets up the company to go to market with its “intelligent processing unit” (IPU) in Read more…

By Tiffany Trader

Amazon Debuts New AMD-based GPU Instances for Graphics Acceleration

September 12, 2017

Last week Amazon Web Services (AWS) streaming service, AppStream 2.0, introduced a new GPU instance called Graphics Design intended to accelerate graphics. The Read more…

By John Russell

Leading Solution Providers

Reinders: “AVX-512 May Be a Hidden Gem” in Intel Xeon Scalable Processors

June 29, 2017

Imagine if we could use vector processing on something other than just floating point problems.  Today, GPUs and CPUs work tirelessly to accelerate algorithms Read more…

By James Reinders

EU Funds 20 Million Euro ARM+FPGA Exascale Project

September 7, 2017

At the Barcelona Supercomputer Centre on Wednesday (Sept. 6), 16 partners gathered to launch the EuroEXA project, which invests €20 million over three-and-a-half years into exascale-focused research and development. Led by the Horizon 2020 program, EuroEXA picks up the banner of a triad of partner projects — ExaNeSt, EcoScale and ExaNoDe — building on their work... Read more…

By Tiffany Trader

Delays, Smoke, Records & Markets – A Candid Conversation with Cray CEO Peter Ungaro

October 5, 2017

Earlier this month, Tom Tabor, publisher of HPCwire and I had a very personal conversation with Cray CEO Peter Ungaro. Cray has been on something of a Cinderell Read more…

By Tiffany Trader & Tom Tabor

Cray Moves to Acquire the Seagate ClusterStor Line

July 28, 2017

This week Cray announced that it is picking up Seagate's ClusterStor HPC storage array business for an undisclosed sum. "In short we're effectively transitioning the bulk of the ClusterStor product line to Cray," said CEO Peter Ungaro. Read more…

By Tiffany Trader

Intel Launches Software Tools to Ease FPGA Programming

September 5, 2017

Field Programmable Gate Arrays (FPGAs) have a reputation for being difficult to program, requiring expertise in specialty languages, like Verilog or VHDL. Easin Read more…

By Tiffany Trader

HPC Chips – A Veritable Smorgasbord?

October 10, 2017

For the first time since AMD's ill-fated launch of Bulldozer the answer to the question, 'Which CPU will be in my next HPC system?' doesn't have to be 'Whichever variety of Intel Xeon E5 they are selling when we procure'. Read more…

By Dairsie Latimer

IBM Advances Web-based Quantum Programming

September 5, 2017

IBM Research is pairing its Jupyter-based Data Science Experience notebook environment with its cloud-based quantum computer, IBM Q, in hopes of encouraging a new class of entrepreneurial user to solve intractable problems that even exceed the capabilities of the best AI systems. Read more…

By Alex Woodie

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “ Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Share This