A “fat node” in an HPC cluster is a node with a large amount of compute and shared memory for complex, data-intensive workloads. Fat nodes are useful whenever an HPC job is too big or complex to break up and run across multiple smaller servers. They are also useful when scientists need to accelerate time to discovery by running more jobs in less time on one large, fast server.
However, adding fat nodes to HPC clusters can be difficult when they must be managed separately from the other servers. This management burden takes time away from science and may require specialized skills.
HPE Superdome Flex is a perfect fit for the role of fat node, with its massive scalability, shared memory and performance. And now it is easier to integrate into HPC clusters using HPE Performance Cluster Manager (HPCM) software, which provides a unified solution for managing all HPC systems from HPE.
Let’s look at fat node use cases, how system management fits into the HPC software stack, and the benefits of HPCM.
When do fat nodes add value?
Fat nodes can add value to HPC clusters when workloads are too big or too complex to run across multiple nodes.
For example, in genomics, genome mapping compares billions of short sequences, amounting to terabytes of data, against a previously assembled reference genome. It is easier to do this on a single large server than to (a) break up the data sets and their complex relationships, (b) redesign applications to orchestrate a cluster of smaller servers, and (c) reassemble everything at the end.
In fraud detection, fat nodes can help identify bad actors faster and more accurately by finding patterns and anomalies in transaction data, which also reduces false positives. Some complex computer-aided engineering (CAE) workloads, such as electromagnetic simulation or computational fluid dynamics, can also benefit from fat nodes.
In these cases, investment in a fat node is likely to save time and significantly accelerate insights.
How complex system management impacts HPC
Years of conversations with customers have shown HPE that managing HPC clusters has always been a top challenge.
An IDC analysis[i] of the HPC software stack identifies HPC system management as a crucial component that provisions, manages, and monitors clusters, impacting:
- System setup speed
- Time spent on administration
- Cluster performance optimization
- Compatibility with HPC applications software
A standardized management environment for a heterogeneous cluster, i.e., one in which different types of servers can all be managed as one platform, is needed to achieve optimal HPC performance and cost efficiency.
Keep HPC systems at peak performance with HPE Performance Cluster Manager (HPCM)
HPCM is a fully integrated system management solution for all HPE HPC systems that:
- Improves day-to-day HPC management – with fast system setup from bare metal, comprehensive hardware monitoring and management, image management, software updates, and power management.
- Manages every aspect of the cluster – including GPUs, CPUs, interconnect, software, jobs, power and cooling.
- Manages HPC anywhere – supporting on-premises and hybrid deployments.
HPCM now supports HPE Superdome Flex
With this support, customers can easily fit these powerful fat nodes into HPC clusters without having to manage them differently.
As a result, customers can build HPC clusters that deliver accelerated time to insight with Superdome Flex-based fat nodes, and achieve optimal cost-performance with a standardized management environment.
To learn more, visit hpe.com/superdome and download the HPCM infographic.
[i] High Performance Computing (HPC) Software Stack, Marketplace Research, IDC, August 2022