The default method for accelerating Deep Learning projects is to increase the size of a GPU cluster, but the cost of doing so is becoming prohibitive. According to Andreessen Horowitz, many companies investing in AI ‘spend more than 80% of their total capital raised on compute resources,’ and rightly so. GPUs are the cornerstone of AI infrastructure and as much budget as possible should be allocated to them. However, there are other ways to raise performance, and these high costs make them increasingly necessary to consider.
Expanding a GPU cluster is far from straightforward, especially as generative AI has accelerated shortages. NVIDIA A100 GPUs were some of the first to be impacted (with reported price increases of up to 40% above MSRP, according to WCCFtech), and they are now so scarce that the lead time for some versions is up to a year. These supply chain challenges have forced many to consider the even higher-end H100s as an alternative, but a server full of them comes with a markedly higher price tag.
The hyperscalers are understandably picking up every piece of silicon they can get, as the price point is less of a concern for them. But for those investing in their own infrastructure to create the next great generative AI solution for their industry, this development shines a light on the importance of squeezing every drop of efficiency from existing GPUs.
Let’s take a look at how a business can extract more from its compute investment by modifying the design of its AI infrastructure, specifically its networking and storage.
The Data Problem
If a project can’t wait until the shortage cools down, or its budget doesn’t provide carte blanche, a helpful approach is to look for the inefficiencies in the existing compute infrastructure and remove them, so that those resources reach the best possible utilization. Maximizing GPU utilization is a challenge simply because data is often delivered too slowly to keep GPUs busy. Some users see GPU utilization as low as 20%, which is clearly not acceptable. This is a good place for AI teams to start looking for ways to maximize their AI investments.
GPUs are the engine of an AI environment. Just as a car engine requires gasoline to run, GPUs run on data. Restricting the flow of data limits GPU performance. If the GPUs are working at only 50% efficiency, the AI team is less productive, a project will take twice as long to complete, and ROI is halved. It is imperative that infrastructure design ensures that the GPUs will run at full efficiency and deliver the compute performance expected.
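The arithmetic behind that claim can be sketched in a few lines of Python. This is a deliberately simplified model, assuming a job's wall-clock time scales inversely with GPU utilization and that idle GPU time has no value; the function names and the example figures (100 hours, a spend of 1,000,000) are illustrative and do not come from any real project.

```python
def effective_hours(ideal_hours: float, utilization: float) -> float:
    """Wall-clock hours for a job that would take `ideal_hours` at 100% utilization.

    Simplifying assumption: run time scales inversely with GPU utilization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return ideal_hours / utilization


def idle_spend(gpu_spend: float, utilization: float) -> float:
    """Portion of GPU spend that pays for idle time under the same assumption."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return gpu_spend * (1.0 - utilization)


if __name__ == "__main__":
    # Illustrative figures only: a 100-hour job and a GPU spend of 1,000,000.
    for u in (1.0, 0.5, 0.2):
        hours = effective_hours(100.0, u)
        idle = idle_spend(1_000_000.0, u)
        print(f"utilization {u:.0%}: {hours:.0f} hours, idle spend {idle:,.0f}")
```

Under this model, 50% utilization doubles the run time, and the 20% figure mentioned earlier stretches the same job to five times its ideal duration.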
How are You Delivering Data to Your GPUs?
It’s worth noting that both DGX A100 and H100 servers come with internal storage capacity of up to 30 terabytes. However, this capacity is not sufficient for the vast majority of Deep Learning models considering that the average model size is roughly 150 terabytes. Hence the need for additional external data storage to keep GPUs fed with data.
While additional storage can sometimes simply mean attaching a ‘JBOD’ (just a bunch of drives) in certain environments, this is not the case in AI. So, what kind of storage is needed?
Storage Performance
AI Storage is made up of a server, NVMe SSDs and storage software, usually packaged up in a simple appliance. Just as GPUs are optimized for processing massive amounts of data in parallel with hundreds of thousands of cores, the storage that feeds them also needs to be high performance. The fundamental requirement of storage in AI, beyond storing the whole dataset, is the capability to deliver the data to the GPUs at wire speed (as fast as the network will allow) in order to saturate the GPUs and keep them running efficiently. Anything less underutilizes this very costly and valuable GPU resource.
Delivering data fast enough to keep up with a cluster of 10 or 15 GPU servers operating at full speed helps to optimize GPU resources and raises performance across the entire environment. It also makes the best possible use of the budget by getting the most from the infrastructure as a whole.
The challenge is that storage not optimized for AI requires many client compute nodes to extract its full performance. Conversely, when starting with one GPU server, it takes many such storage nodes to reach the performance needed to power that single server.
Do not believe all benchmark results. It is easy to post large bandwidth figures when using several GPU servers at the same time, but AI benefits from storage that can deliver all of its performance to a single GPU node whenever needed. Stick to storage that delivers the required ultra-high performance from a single storage node and can deliver that performance to a single GPU node. This may narrow the market down, but it is high on the list of priorities when starting out on an AI project journey.
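As a rough sizing aid, the throughput a cluster of this size asks of its storage can be estimated as below. This is a sketch under stated assumptions: the per-GPU ingest rate (2 GB/s) and the per-node storage throughput (40 GB/s) are hypothetical placeholders that vary widely by workload and product, and both function names are made up for illustration. The only fixed fact used is that DGX A100 and H100 servers hold eight GPUs each.

```python
import math

GPUS_PER_DGX_SERVER = 8  # DGX A100 and DGX H100 systems each contain 8 GPUs


def required_throughput_gbs(num_servers: int, per_gpu_gbs: float,
                            gpus_per_server: int = GPUS_PER_DGX_SERVER) -> float:
    """Aggregate read throughput (GB/s) needed to keep every GPU fed.

    `per_gpu_gbs` is an assumed, workload-dependent ingest rate per GPU.
    """
    return num_servers * gpus_per_server * per_gpu_gbs


def storage_nodes_needed(required_gbs: float, per_node_gbs: float) -> int:
    """Whole storage nodes needed, given an assumed per-node throughput."""
    return math.ceil(required_gbs / per_node_gbs)


if __name__ == "__main__":
    # Hypothetical inputs: 10 GPU servers, 2 GB/s per GPU, 40 GB/s per storage node.
    need = required_throughput_gbs(num_servers=10, per_gpu_gbs=2.0)
    nodes = storage_nodes_needed(need, per_node_gbs=40.0)
    print(f"required: {need:.0f} GB/s, storage nodes: {nodes}")
```

Replacing the placeholders with measured ingest rates from the actual training job gives a far better estimate than any datasheet figure.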
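One way to sanity-check a vendor figure is to measure what a single client actually receives. The script below is a minimal sketch of that idea, not a replacement for a proper benchmarking tool: it reads one file sequentially from a mounted storage path and reports the rate seen by that one node. The function name and chunk size are arbitrary choices, and the result will be inflated if the file is already in the operating system's page cache, so the test file should be larger than the client's RAM or read for the first time.

```python
import sys
import time


def measure_read_throughput(path: str,
                            chunk_size: int = 16 * 1024 * 1024) -> tuple[int, float]:
    """Sequentially read `path` and return (bytes_read, gigabytes_per_second).

    Single-threaded, single-file read from one client: a lower bound on what
    that client can pull, and a simple cross-check on multi-client benchmarks.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    gb_per_s = total / elapsed / 1e9 if elapsed > 0 else float("inf")
    return total, gb_per_s


if __name__ == "__main__" and len(sys.argv) > 1:
    size, rate = measure_read_throughput(sys.argv[1])
    print(f"read {size / 1e9:.2f} GB at {rate:.2f} GB/s from one client")
```

A single Python thread will rarely saturate fast NVMe storage, so a low number here shows what this simple reader achieved rather than the ceiling of the storage itself.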
Network Bandwidth
Ever more powerful compute capabilities drive constantly increasing demands on the rest of the AI infrastructure. Bandwidth requirements have reached new heights in order to manage the massive amounts of data sent across the network from storage every second to be processed by GPUs. The network adapters (NICs) in a storage device connect to the switches in a network, which in turn connect to the adapters inside the GPU server. When configured correctly, NICs can also connect storage directly to the NICs in one or two GPU servers with no bottlenecks, but always consult a solution provider for advice on networking.
The key is to ensure the bandwidth is high enough to pass the maximum data load from storage through to the GPUs and keep them saturated over sustained periods. Failure to do this is, in many cases, the reason for lower GPU utilization.
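The path from storage to GPU can only move data as fast as its slowest stage, which the following sketch makes concrete. It is a simplified model with hypothetical inputs: link speeds are converted from gigabits to gigabytes per second by dividing by eight, protocol overhead is ignored, and GPU utilization is assumed to be proportional to the data rate delivered. All names and example values are illustrative.

```python
def link_gbytes_per_s(link_gbits_per_s: float, num_links: int = 1) -> float:
    """Raw capacity of `num_links` network links in GB/s (8 bits per byte).

    Ignores protocol overhead, so real throughput will be somewhat lower.
    """
    return link_gbits_per_s * num_links / 8.0


def delivered_and_bottleneck(storage_gbs: float, network_gbs: float,
                             gpu_demand_gbs: float) -> tuple[float, str]:
    """Return the delivered data rate and the name of the limiting stage."""
    stages = {
        "storage": storage_gbs,
        "network": network_gbs,
        "gpu demand": gpu_demand_gbs,
    }
    limiting = min(stages, key=lambda name: stages[name])
    return stages[limiting], limiting


def estimated_utilization(storage_gbs: float, network_gbs: float,
                          gpu_demand_gbs: float) -> float:
    """Fraction of GPU demand met, assuming utilization tracks data delivery."""
    delivered, _ = delivered_and_bottleneck(storage_gbs, network_gbs, gpu_demand_gbs)
    return delivered / gpu_demand_gbs


if __name__ == "__main__":
    # Hypothetical setup: 40 GB/s storage, one 200 Gb/s link, GPUs wanting 50 GB/s.
    network = link_gbytes_per_s(200.0)
    rate, stage = delivered_and_bottleneck(40.0, network, 50.0)
    util = estimated_utilization(40.0, network, 50.0)
    print(f"delivered {rate:.1f} GB/s, limited by {stage}, utilization ~{util:.0%}")
```

In this hypothetical case the storage could supply more than the single link can carry, so adding a second link would do more for utilization than faster storage would.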
GPU Orchestration
Once the infrastructure is in place, GPU orchestration and allocation tools greatly help teams to pool and allocate resources more efficiently, get visibility into GPU usage, provide a higher level of control of resources, reduce bottlenecks and increase utilization. These tools can only do all of this as intended if the underlying infrastructure allows the data to flow correctly in the first place.
The Role of Data in AI
In AI, the data is the input, so many of the great features that traditional enterprise flash storage offers for a business’s mission-critical applications, such as stock control database servers, email servers and backup servers, are simply not relevant. These solutions were built using legacy protocols, and while they have been repurposed for AI, their legacy foundations limit their performance for GPU and AI workloads, drive prices up and waste funds on expensive and unnecessary features.
With the current worldwide GPU shortage, combined with a burgeoning AI sector, it’s never been more important to find ways to maximize GPU performance, especially in the short term. These are a few of the key ways to keep costs down and output high as Deep Learning projects continue to flourish.
Stevie Lanigan is Partnerships Director at PEAK:AIO. Stevie is a highly skilled sales and business development leader with a wealth of experience in the global OEM & AI startup worlds. With a proven track record of success and a passion for delivering innovative solutions to customers, Stevie has built and managed several multi-million dollar partnerships, providing cutting-edge products and services. Through a deep understanding of customer needs and a keen eye for market trends, Stevie has helped drive growth for businesses across a range of industries.
Stevie’s success is built on a foundation of strong leadership skills, strategic thinking, and a collaborative approach to problem-solving. With a focus on building high-performing teams and empowering individuals to achieve their full potential, Stevie has created a culture of innovation and excellence that has propelled businesses to new heights.
Whether working with established global OEMs or fast-paced AI startups, Stevie brings a unique perspective and a deep knowledge of the industry to every project.