Results Redefine AI, Dense Linear Algebra, and More
The new Intel® Xeon® Scalable processors are already posting impressive results on some of the most challenging HPC workloads in linear algebra, weather, electromagnetics, and more. Improvements in the processor’s memory and network interfaces are redefining HPC performance and cost effectiveness.
Overall performance improvements
Intel claims the new Intel Xeon Scalable processors deliver an average performance increase of 1.63xⁱ for a range of HPC workloads over previous-generation Intel Xeon processors, due to a host of microarchitecture improvements that include: support for Intel® AVX-512 wide vector instructions, up to 28 cores and 56 threads per socket, support for systems with up to eight sockets, an additional two memory channels, support for DDR4 2666 MHz memory, and more.
The 1.63x improvement is an average based on an internal Intel assessment using the geometric mean (geomean) of the following workloads: Weather Research and Forecasting (CONUS 12 km), HOMME, LSTC LS-DYNA Explicit, INTES PERMAS V16, MILC, GROMACS water 1.5M_pme, VASP Si256, NAMD stmv, LAMMPS, Amber GB Nucleosome, Binomial option pricing, Black-Scholes, and Monte Carlo European options.
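To make the geomean concrete, here is a minimal Python sketch of how a single average speedup can be derived from per-workload speedup ratios. The function name and the sample values are hypothetical placeholders chosen for illustration; they are not Intel's measured per-workload results, which this article does not list.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of positive per-workload speedup ratios."""
    if not speedups:
        raise ValueError("need at least one speedup value")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedup ratios must be positive")
    # Average the logarithms, then exponentiate. This equals the n-th root
    # of the product but avoids overflow for long lists of values.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-workload speedups (old runtime / new runtime).
# Illustrative only: these are NOT the values behind the 1.63x claim.
example_speedups = [1.2, 1.5, 2.0, 1.8]
print(f"geomean speedup: {geometric_mean(example_speedups):.2f}x")
```

A geomean is the usual choice for summarizing ratios because one outlier workload pulls it up less than it would an arithmetic mean, and the result does not depend on which system is treated as the baseline.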
Third-party reports are also showing improved performance and capability.
Powering a “renaissance” in dense linear algebra
Hatem Ltaief, Senior Research Scientist at KAUST and an Intel® Parallel Computing Center (IPCC) collaborator who received an early Intel Xeon Scalable processor system, puts it simply: “They rock!”
Ltaief and his postdocs have been testing these processors with their HiCMA library, an approximate yet accurate dense linear algebra package for big matrices (think a billion rows by a billion columns). He refers to the ability to perform matrix operations at this scale as a “renaissance” in dense linear algebra. The figure below shows one example of the faster runtime and the ability to run larger problems than current highly optimized dense linear algebra packages.
Improved performance across a wide range of EM applications
Computer Simulation Technology (CST) also reports a performance boost across a wide range of electromagnetic applications compared with previous-generation processors. Their results are shown below:
Integrated Intel® Omni-Path Architecture (Intel® OPA)
CST reports efficient strong scaling of their transient solver when using an external Intel Omni-Path Architecture adaptor.
These results bode well for the future: Intel has launched a series of Intel Xeon Platinum and Intel Xeon Gold processor SKUs with Intel OPA integrated onto the processor package, giving access to a fast, low-latency 100 Gbps fabric without having to purchase an external Intel OPA interface card!
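For readers who want to quantify "efficient strong scaling" on their own runs, the following Python sketch computes speedup and parallel efficiency from wall-clock timings of a fixed-size problem at different node counts. The function name and the timings are hypothetical and purely illustrative; they are not CST's measurements.

```python
def strong_scaling(timings):
    """Map node count -> (speedup, efficiency) relative to the smallest run.

    timings: dict of {node_count: wall_time_seconds} for one fixed problem size.
    """
    base_nodes = min(timings)          # smallest run is the baseline
    base_time = timings[base_nodes]
    results = {}
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        ideal = nodes / base_nodes     # perfect scaling would reach this speedup
        results[nodes] = (speedup, speedup / ideal)
    return results


# Hypothetical timings in seconds, for illustration only.
example_timings = {1: 1000.0, 2: 520.0, 4: 275.0, 8: 150.0}
for nodes, (speedup, efficiency) in strong_scaling(example_timings).items():
    print(f"{nodes:>2} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 100% as nodes are added means the interconnect and the solver's communication pattern are not becoming the bottleneck, which is the behavior a low-latency fabric is meant to preserve.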
Fast results from a fast processor
Even after only a short period of general availability, reports show that these new processors are delivering superlative performance, hardware efficiency, and price/performance.
Learn more about the value Intel Xeon Scalable processors can bring to your HPC environment at http://www.intel.com/ssf.
i Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance/datacenter
ii Using a custom Wide Residual Network architecture
iii SURFsara would like to acknowledge PRACE for awarding access to the MareNostrum 4 resource based in Spain at BSC.