Mannheim, Germany - As Ad Emmen reported for Primeur, the first preliminary benchmark results for the two-processor Tera MTA machine at SDSC in San Diego were presented at the Mannheim Supercomputer Seminar and at the CUG in Stuttgart. Allan Snavely stated that he was satisfied with the results, which were even a little better than the predictions based on a slower one-processor system.
Results for real-life applications are comparable to, and in some cases better than, those on the Cray T90. Pierre Hassid, manager for Tera Europe, hopes these results will encourage a European HPC center to partner with Tera as well, so that European applications can benefit from this new technology.
“The main reason to look at Tera is its simple programming model,” explained Allan Snavely. “Our users do not like to program on machines that require complex memory management from them, as MPPs do. In general, when we give a training class on parallel programming, we only see about 10% of the attendees really using it afterwards.”
Tera, with its simple global memory structure, is closer to the way scientists like to see a machine. They can concentrate more on science and less on laying out data onto parallel processors, says Snavely.
At the San Diego Supercomputer Center, Snavely ran several benchmarks on the Tera machines, first on a one-processor 145 MHz computer and recently on a two-processor 255 MHz machine. The one-processor prototype ran applications at a speed comparable to that of a one-processor Cray T90 when projected onto the final 300 MHz clock speed.
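The article does not say how the prototype results were projected onto the 300 MHz clock. As a minimal sketch, assuming performance scales linearly with clock frequency (an assumption, not a method confirmed by SDSC or Tera), the projection could look like the following. The function name `project_rate` and the 100 Mflop/s input are hypothetical and serve only as an illustration.

```python
def project_rate(measured_mflops, measured_mhz, target_mhz):
    """Scale a measured rate to another clock speed.

    Assumes performance grows linearly with clock frequency, which
    ignores memory and network effects. This is an illustrative
    assumption, not the projection method used in the article.
    """
    if measured_mhz <= 0 or target_mhz <= 0:
        raise ValueError("clock speeds must be positive")
    return measured_mflops * target_mhz / measured_mhz


# Hypothetical input: the article quotes no Mflop/s figure for the
# 145 MHz prototype, so 100.0 is a made-up placeholder value.
hypothetical_rate = 100.0
projected = project_rate(hypothetical_rate, measured_mhz=145.0, target_mhz=300.0)
print(f"Projected rate at 300 MHz: {projected:.1f} Mflop/s")
```

Under this assumption, any rate measured at 145 MHz is multiplied by 300/145, roughly a factor of two.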
The two-processor results are new and were revealed this week in Stuttgart and Mannheim. Again, results were compared with the Cray T90. Snavely stresses, however, that the Tera MTA machine is aimed not only at technical and scientific applications, like the T90, but at a broad range of applications, including commercial ones.
The three codes benchmarked were LCPFCT, a transport code that runs well on the T90 at 400-500 Mflop/s; AMBER, a molecular dynamics code performing at 300-400 Mflop/s on a T90; and LS-Dyna, a crash code whose performance on a T90 depends heavily on the type of crash simulated.
Preliminary results show that an example of LCPFCT, the so-called FAST20-800 with vector lengths of 104, performs at 498 Mflop/s on a Cray T90 as compared to 458 Mflop/s on a 255 MHz two-processor MTA-2 machine.
AMBER, with a vector length of 71, runs at 306 Mflop/s on a Cray T90 processor and 270 Mflop/s on the MTA-2.
On a crash code sample, C2500/NJ, the Cray T90 runs at 135 Mflop/s on one processor and 154 Mflop/s on two processors. The Tera MTA performs at 105 Mflop/s on one processor and 171 Mflop/s on two, better than the two-processor T90.
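To put the quoted figures side by side, the short sketch below computes the MTA-to-T90 ratio for LCPFCT and AMBER and the one-to-two-processor speedup on the crash code. All Mflop/s values are taken from the article. The helper names `relative`, `speedup`, and `efficiency` are introduced here for illustration, and the ratios simply compare the figures as reported, without adjusting for processor count or clock speed.

```python
# Mflop/s figures as quoted in the article.
single_codes = {
    "LCPFCT (FAST20-800)": {"t90": 498.0, "mta": 458.0},
    "AMBER": {"t90": 306.0, "mta": 270.0},
}
# Crash code C2500/NJ: (one processor, two processors).
crash = {"t90": (135.0, 154.0), "mta": (105.0, 171.0)}


def relative(mta, t90):
    """MTA rate as a fraction of the T90 rate."""
    return mta / t90


def speedup(one_proc, two_proc):
    """Speedup going from one processor to two."""
    return two_proc / one_proc


def efficiency(one_proc, two_proc, processors=2):
    """Parallel efficiency: speedup divided by processor count."""
    return speedup(one_proc, two_proc) / processors


for name, rates in single_codes.items():
    print(f"{name}: MTA/T90 = {relative(rates['mta'], rates['t90']):.2f}")

for machine, (one, two) in crash.items():
    print(f"{machine}: speedup = {speedup(one, two):.2f}, "
          f"efficiency = {efficiency(one, two):.2f}")
```

On the crash code, the MTA's speedup from one to two processors is about 1.63, against about 1.14 for the T90, which is why the MTA overtakes the T90 at two processors despite the slower single-processor result.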
SDSC plans to upgrade the Tera machine gradually. A four-processor MTA will be installed within two months. If the evaluation is positive, an 8-processor and perhaps even a 16-processor machine could be added.
After the first positive results in San Diego, Tera is now looking for additional partners. Pierre Hassid stresses that Europe would be a good place for such an early machine. A small system at one of Europe’s centers could be employed for user evaluation and for building up experience with this kind of technology. Tera is ready to transfer the MTA technology to Europe. Hassid expects the market for Tera MTA machines to be much broader than that of traditional MPPs or PVPs.