RADIOSS, one of the top structural analysis solvers in the industry, is used by automakers such as Ford and France’s PSA to run crash and other simulations that test the safety and viability of a broad array of products. Like similar CAE codes, it is highly parallelized for top performance on large, powerful clusters, and as one might expect, it is incredibly core-hungry.
The code itself, part of Altair’s HyperWorks suite of modeling and simulation tools since 2006, is nearing its thirtieth birthday. According to Eric Lequiniou, Altair’s Director of High Performance Computing, it has been given the gift of more memory-balanced cores with the new Xeon E7 v2 as well as the Xeon Phi.
He recently compared the code’s performance on the previous top architecture for RADIOSS, a 24-core Xeon E5-2696 v2 node, with its performance on Intel’s new four-socket Xeon E7-4780 v2 node with 60 cores. The results were impressive: the code saw a 2.75x performance boost, higher than the median improvement of around 2.5x that Intel benchmarked for other CAE codes on the new Xeon line.
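As a rough sanity check on those figures, the reported speedup can be set against the increase in core count alone. The sketch below uses only the numbers quoted above and assumes, purely for illustration, that core count is the only difference between the two nodes; clock speed, memory and cache are ignored, and the variable names are made up for this example.

```python
# Back-of-the-envelope check using only the figures quoted in the article.
# Assumption: core count is treated as the only difference between the nodes.
old_cores = 24    # Xeon E5-2696 v2 node
new_cores = 60    # four-socket Xeon E7-4780 v2 node
speedup = 2.75    # node-to-node speedup reported for RADIOSS

core_ratio = new_cores / old_cores      # how many times more cores
per_core_gain = speedup / core_ratio    # speedup relative to the added cores

print(f"core ratio:    {core_ratio:.2f}x")
print(f"per-core gain: {per_core_gain:.2f}x")
```

Under that simplification the new node has 2.5 times the cores, so a 2.75x speedup works out to roughly 10 percent more than core count alone would suggest.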
The key to the overall efficiency of RADIOSS lies in its hybrid nature—it’s parallelized using MPI and OpenMP, giving it the kind of multi-level parallelism that’s a perfect fit for the new Xeons and their many cores and balanced memory profile.
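To make the idea of multi-level parallelism concrete, here is a minimal sketch of a two-level work split: the model is first divided into domains, one per MPI-style rank, and each domain is then divided among OpenMP-style threads. It is plain Python with no actual MPI or OpenMP calls, and it is not RADIOSS’s real decomposition. The element count, the choice of 4 ranks with 15 threads each (one rank per socket on a 60-core node), and the function names `split` and `hybrid_layout` are all assumptions made for illustration.

```python
def split(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous, near-equal ranges."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # spread the remainder
        ranges.append(range(start, start + size))
        start += size
    return ranges


def hybrid_layout(n_elements, n_ranks, threads_per_rank):
    """Two-level split: elements -> ranks (MPI-like) -> threads (OpenMP-like)."""
    layout = {}
    for rank, domain in enumerate(split(n_elements, n_ranks)):
        chunks = split(len(domain), threads_per_rank)
        # Shift each thread's chunk back into global element numbering.
        layout[rank] = [
            range(domain.start + c.start, domain.start + c.stop) for c in chunks
        ]
    return layout


# Hypothetical configuration: 1,000,000 elements, 4 ranks x 15 threads.
layout = hybrid_layout(n_elements=1_000_000, n_ranks=4, threads_per_rank=15)
workers = sum(len(chunks) for chunks in layout.values())
print(f"{len(layout)} ranks x {len(layout[0])} threads = {workers} workers")
```

In a real hybrid code the outer level exchanges boundary data between ranks by message passing, while the inner level shares memory within a rank, which is why fewer, larger domains per node can reduce communication overhead.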
His team has worked with other options for speeding up crash test code, including both GPUs and the Xeon Phi. He says that while they demonstrated great speedups using GPUs a couple of years ago at NVIDIA’s GTC conference, for a highly scalable code like this, adding more cores is just as good for performance as going through the process of porting to GPUs. Further, he says their code is mostly explicit, and moving the explicit solver inside RADIOSS to the GPU would mean porting several million lines of Fortran, a difficult proposition in this particular case even with sizable performance increases.
The Xeon Phi port, which Altair recently completed, also offers solid speedups, but running over PCIe imposes limitations. Lequiniou says these will be addressed by the next-generation Knights Landing, which is tentatively slated for 2015 and, according to Intel, will remove that bottleneck.
An important part of what Intel’s new offerings bring to the table, says Lequiniou, is the ability for smaller CAE, modeling and simulation companies to enter the space with a single, highly powerful node.
For one thing, the approach he benchmarked (as part of Intel’s early-release SDP program, which puts new boxes in ISV hands for testing and development) means that smaller companies could be freed from investing the time and money in InfiniBand needed to link several weaker nodes.
Further, the ability to pack 60 cores into a node means that custom codes will be easier to get off the ground. While he notes the cost is indeed quite high (listed at $5,729), the freedom from networking and other challenges, together with the overall efficiency, makes the boxes a TCO no-brainer.
Lequiniou says the new Xeon E7 nodes are a prime fit for the “missing middle” in CAE and beyond: users who need the high core count and memory bandwidth of a large cluster but without the management and other overhead of maintaining an InfiniBand-connected cluster of E5s or another architecture. According to IDC and others, this “missing middle” is most pronounced in manufacturing and product design, where RADIOSS and similar codes are common, which could spell progress for smaller companies that take the E7 plunge with a single four- or eight-socket node.