Everyone in HPC knows the general rule, “the faster the memory, the better,” but drilling down into this rule of thumb raises a better question: “How much faster, and how much better?” Fortunately, this notion is easy to test with the right hardware.
Recently, over on the Phoronix site, Linux benchmark and testing hero Michael Larabel evaluated the difference between DDR5-4800 and DDR5-5600 memory using a Xeon Platinum 8592+ processor.
The tested DDR5-4800 (63 GB/s) modules were Samsung M321R8GA0BB0-CQKEG DIMMs with a CAS latency (CL) of 40. The faster DDR5-5600 (69 GB/s) modules were Kingston KSM56R46BD4PMI-64HAI DIMMs with a CAS latency of 46. The memory speed describes how fast data can move on and off the DIMM (higher is better), and the latency describes how long it takes to set up that movement (lower is better).
The server used 16 DDR5 registered ECC memory modules and two Intel Xeon Platinum 8592+ processors (dual socket) running Ubuntu 23.10. Other than swapping the memory modules, everything else remained the same between the tests.
The first example in Figure 1 is the HPCG benchmark, which is known to hammer on memory. The DDR5-5600 DIMMs provided a performance boost of 10% (approximately the difference between the memory speeds). The improvement is noteworthy and about the maximum one would expect.
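Why does HPCG respond so directly to DIMM speed? The benchmark spends most of its time streaming large sparse-matrix data through memory, doing very little arithmetic per byte moved. A minimal STREAM-style triad kernel shows what such a memory-bandwidth-bound loop looks like; this is a sketch for illustration, not the actual HPCG code, and the array size and build line are assumptions:

```c
/* triad.c -- minimal STREAM-style triad sketch (not the actual HPCG code).
 * Each iteration performs one multiply-add but streams three large arrays
 * through memory, so runtime is dominated by memory bandwidth, not the CPU.
 * Build (example): gcc -O3 -fopenmp triad.c -o triad
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 27)          /* ~134M doubles per array, ~1 GB each */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double start = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];      /* 2 flops, 24 bytes of traffic */
    double elapsed = omp_get_wtime() - start;

    /* 3 arrays x N x 8 bytes moved (read b, read c, write a) */
    printf("Triad bandwidth: %.1f GB/s\n",
           3.0 * N * sizeof(double) / elapsed / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```

For a kernel like this, the measured bandwidth, and therefore the runtime, tracks how fast the DIMMs can deliver data, which is why a roughly 10% bump in memory speed can show up as a roughly 10% gain.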
Turning to another application from the NAS Parallel Benchmarks (V3.4), the BT (Block Tri-diagonal solver, Class C) benchmark did not fare as well. Figure 2 indicates a small increase of only 2%.
Another test is the popular GROMACS molecular dynamics application. Interestingly, the results in Figure 3 show about a 1% slowdown for the faster DDR5-5600 memory.
The results for faster memory seem to go from great, to small, to retrograde. Clearly, something other than memory bandwidth is influencing the results. Besides the speed, the CAS latency differs between the memory modules (CL 40 for the DDR5-4800 versus CL 46 for the DDR5-5600).
Digging deeper, we note that CAS is an acronym for Column Address Strobe. The CAS latency is the delay between the memory controller presenting a column address (the CAS signal) to the module and the module making the corresponding data available; the CL value quoted for a DIMM counts this delay in memory clock cycles. Thus, every read of memory includes a wait before the data arrives. Although the difference is not large, the lower CAS latency is likely the reason GROMACS does slightly better with the slower memory.
The non-intuitive GROMACS result is a good example of why testing assumptions with benchmarks is important: Upgrading to, or buying, a server with the fastest memory may not result in better performance. For GROMACS, latency matters, and the increased memory speed is not enough to overcome the higher CAS timing. (Memory and cache access patterns also contribute to performance, but using the same application keeps these constant.)
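To see why an application can care more about latency than bandwidth, consider a pointer-chasing access pattern: each load depends on the result of the previous one, so the processor cannot overlap requests and every access pays something close to the full memory latency. The following sketch is purely illustrative (it is not how GROMACS accesses memory, and the array size and step count are arbitrary choices made to defeat the caches):

```c
/* chase.c -- minimal pointer-chasing sketch to illustrate a latency-bound
 * access pattern (illustrative only; not how GROMACS accesses memory).
 * Each load depends on the previous one, so accesses cannot be overlapped
 * and the average time per step approaches the memory latency.
 * Build (example): gcc -O2 chase.c -o chase
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (1L << 26)      /* 64M entries (~512 MB), far larger than cache */
#define STEPS (1L << 24)      /* number of dependent loads to time */

int main(void)
{
    size_t *next = malloc(N * sizeof(size_t));
    if (!next) return 1;

    /* Sattolo shuffle: turns the identity array into one big random cycle,
     * so the chase visits every entry in an unpredictable order. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(12345);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (long s = 0; s < STEPS; s++)
        p = next[p];                   /* dependent loads: no overlap possible */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("Average latency per access: %.1f ns (p=%zu)\n", ns / STEPS, p);
    free(next);
    return 0;
}
```

A code dominated by dependent, irregular accesses like this gains little from extra bandwidth; what it feels is the latency of each individual access.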
The Phoronix article has many more test results, including other NAS Parallel Benchmarks. The results for many non-HPC applications vary, and often there is little or no improvement. Remember, benchmarks tell the truth.