When the Chips Are Down

By Dairsie Latimer

January 11, 2018

In my last article (“The High Stakes Semiconductor Game that Drives HPC Diversity”), I alluded to the challenges facing the semiconductor industry and how they may affect the evolution of HPC systems over the next few years. I thought I’d lift the covers a little and look at some of the commercial challenges that affect the component technology we use in HPC.

The semiconductor market has experienced a sustained growth phase over the last twenty years, but there have been periods when year-on-year revenues have collapsed (in particular 2001 and 2009). Despite our almost insatiable appetite for new technology (almost all of it predicated on semiconductors), the commodity semiconductor industry is as vulnerable to boom and bust as any other market. What triggers one of these cycles usually comes down to one or more of the following:

• Capital spending and investment patterns
• Process migration challenges
• Change in demand (often due to disruptive technologies)

As with most forecasting, the only truly precise science is done in retrospect. It is notoriously hard to predict when a disruptive technology will fulfill its potential, as there are usually so many interdependencies that can disrupt its momentum. Falter for too long and the next ‘big thing’ comes along, and the ever-fickle market starts investing somewhere else.

Of the three main triggers, capital spending is probably the easiest to track and, like unemployment in traditional economics, it is responsible for hysteresis in the consumer semiconductor market. In other words, over- or under-investment in capital spending (for new fabs, process and technology transitions, etc.) has consequences that can only be addressed after an inevitable time lag.

Historically, when the semis are profitable and can see strong demand for a product, they invest in additional manufacturing capacity, which takes some time to come online (on the order of three or more years for a greenfield site). As this new capacity starts to ramp, and depending on how firm the demand for the commodity is, price competition starts to kick in and margins erode. If there is significant oversupply in the market, the price can collapse, with the attendant short-term cheer for consumers. However, a collapse like this interrupts the capital spending pattern, which then drives the cycle again as demand once more outstrips supply and prices rise.
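To make the lag mechanism concrete, here is a minimal toy sketch in the spirit of the classic ‘cobweb’ model from economics, not anything calibrated to real semiconductor data. It assumes, purely hypothetically, that demand grows at a steady rate, that a price index is simply demand divided by installed capacity, that manufacturers order new capacity in proportion to today’s shortage, and that orders only come online a fixed number of periods later. The function name `simulate_cycle` and every parameter value are illustrative assumptions.

```python
from collections import deque


def simulate_cycle(periods=60, build_lag=3, demand_growth=0.03,
                   reaction=1.5, depreciation=0.05):
    """Toy boom-bust model; every parameter here is a hypothetical choice.

    Returns a list of (period, demand, capacity, price_index) tuples, where
    price_index > 1 signals a shortage and < 1 signals a glut.
    """
    demand = 100.0
    capacity = 100.0
    # Capacity that has been ordered but is not yet producing.
    pipeline = deque([0.0] * build_lag)
    history = []
    for t in range(periods):
        price_index = demand / capacity
        history.append((t, demand, capacity, price_index))
        # Order enough to replace worn-out capacity, plus extra in proportion
        # to today's shortage; never order negative capacity in a glut.
        order = max(0.0, reaction * (demand - capacity) + depreciation * capacity)
        pipeline.append(order)
        # Capacity ordered build_lag periods ago comes online only now.
        capacity = capacity * (1.0 - depreciation) + pipeline.popleft()
        demand *= 1.0 + demand_growth
    return history


if __name__ == "__main__":
    for t, demand, capacity, price in simulate_cycle():
        state = "shortage" if price > 1.0 else "glut" if price < 1.0 else "balanced"
        print(f"{t:3d}  demand={demand:7.1f}  capacity={capacity:7.1f}  "
              f"price_index={price:5.2f}  {state}")
```

Even though demand in this toy grows smoothly, the delay between ordering and producing capacity makes the price index swing between shortage and glut. Experimenting with `build_lag` and `reaction` shows how the size of those swings depends on the lag and on how aggressively manufacturers respond.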

In the last decade there have been two significant changes in the way the semiconductor market has evolved. The first is the dramatic increase in the capital cost of developing smaller-geometry processes for high-capacity foundries. The second is a steady reduction in the number of companies that can afford to invest the $8B+ required per modern fab. That list probably stands at fewer than ten companies that can really afford it without direct state intervention.

That said, the strong growth in demand for semiconductors means current investment levels are at record highs. So will the market experience a bust cycle in the next few years? Probably not just yet, but that certainly doesn’t mean the HPC market is out of the woods.

We’re all aware of the precipitous climb in memory costs for HPC systems in the last twelve months. Why, you ask, is this happening in a market where investment in capacity is at record levels? Well, now we come back to the risks attached to relying on commodity, and therefore consumer, electronics to drive the development process.

The bulk of commodity DRAM is aimed at the mobile and consumer market, where a memory capacity of 4-8 GB is the sweet spot (roughly 0.5-1 GB per core). While memory requirements for HPC applications vary significantly, a typical installation aims for 1-2 GB per core, and for larger-scale installations that figure is creeping steadily upwards.

What this means is that market volume is biased towards commodity DDR4 densities, which simply aren’t designed to deliver 16/32/64 GB per DIMM (let’s ignore the insanity of 128 GB DRAM DIMMs for now). This goes back to my original thesis, which is that much of HPC’s ‘free lunch’ (thanks, Herb Sutter) is over and we will return to value-based rather than purely commodity-driven pricing.
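The per-DIMM arithmetic behind this can be sketched in a few lines. Assume, purely for illustration, a hypothetical dual-socket node with one DIMM in every memory channel (HPC systems usually populate all channels evenly to get full memory bandwidth, which ties total node capacity to per-DIMM density) and an assumed list of standard DIMM sizes. The hypothetical helper `dimm_size_needed` picks the smallest size that meets a GB-per-core target; the node configurations in the usage block are assumptions, not specific products.

```python
# Assumed list of standard DIMM capacities, in GB (illustrative only).
DIMM_SIZES_GB = (4, 8, 16, 32, 64, 128)


def dimm_size_needed(cores, gb_per_core, dimm_slots):
    """Smallest standard DIMM (GB) meeting gb_per_core with every slot filled."""
    per_dimm_gb = cores * gb_per_core / dimm_slots
    for size in DIMM_SIZES_GB:
        if size >= per_dimm_gb:
            return size
    raise ValueError(f"needs {per_dimm_gb:.1f} GB per DIMM; no standard size is large enough")


if __name__ == "__main__":
    # Hypothetical nodes: (label, cores, DIMM slots = channels, one DIMM each).
    nodes = [("2 x 20 cores, 12 channels", 40, 12),
             ("2 x 32 cores, 16 channels", 64, 16)]
    for label, cores, slots in nodes:
        for gb_per_core in (1, 2, 4):
            size = dimm_size_needed(cores, gb_per_core, slots)
            print(f"{label}: {gb_per_core} GB/core -> {slots} x {size} GB DIMMs")
```

With these assumed figures, the 40-core node at 4 GB/core needs 16 GB DIMMs in every channel, and a 64-core node at 8 GB/core would need 32 GB parts: densities well above what a 4-8 GB consumer device calls for.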

A not dissimilar pattern is emerging in the NAND market, where 3D NAND has apparently ridden to the rescue of NAND scaling. Sadly, it’s not just capacity that HPC wants but also low latency and endurance. This differs from what the hyperscalers and mobile device vendors typically want, which is capacity and low cost.

It also exposes one of the major problems associated with the shift to smaller process geometries, deep 3D structures, chip-stacking and complex packaging for ASICs: around 30 percent of the cost of a device now lies in the assembly and functional testing portion of the supply and value chain. That step adds no intrinsic value, but without it you may just have a pile of very expensive sand.

Deep 3D integration also has its own problems, predominantly the sheer number of process steps required for a 48-, 64- or 96-layer NAND device, each of which adds complexity and, more importantly, cost to the supply chain. As a result, the cost per bit is not declining as fast as many hoped, and according to recent reports NAND will not reach price parity with HDDs for bulk storage for most of the next decade. We’ll leave for another day the question of just how much more investment would be required to actually deliver to the market the same total capacity that HDDs can.

The issues around process migration have been well documented, as have the parlous state of Moore’s law, the slow ramp of EUV and 450 mm wafers as cost-reduction tools, and the lack of a clear process path past 7 nm. The semiconductor industry has surpassed even Harry Houdini’s talent for getting out of sticky situations, but it is arguably facing its greatest test yet: continuing to innovate and to meet consumer expectations of ever-increasing capacity and capability, while pretending that some things aren’t going to get more difficult and, as a result, more expensive.

This has profound implications for the HPC market. We long ago accepted that we couldn’t rely on Moore’s law for CPUs to get us to exascale (at least not within 20 MW), for all of the reasons outlined above and many related ones. So where does that leave us?

In the immediate term, it leaves us with higher DRAM prices for at least the next year, possibly longer, until new fab capacity arrives. Better hope that consumer devices double, or better yet quadruple, their typical DRAM load-out, as that would start to shift the DRAMurai’s focus to higher-density memory devices. That would somewhat ease the current situation for HPC, which is certainly feeling the effects of higher DRAM prices.

Of course, since we’re already looking at major changes to the memory hierarchy with the introduction of NVDIMMs (NAND or 3D XPoint), this may not turn out to be such a problem. That said, NAND prices also rose significantly in 2017, though there is some hope that supply will start to pick up later in 2018 now that the transition to 3D NAND is well under way for most suppliers.

We’ve covered the fact that there is now some real choice in the HPC CPU space, and this will certainly help mitigate the spiraling cost of components in other parts of HPC systems, but inevitably much of that cost will eventually be passed on to the customer. Couple that with a rather difficult childhood for the 10 nm process node and the expectation that the transition to 7 nm will be just as challenging, and we’re looking at a period of real pricing uncertainty.

Add in the uncertainties around the latest round of security exploits made public in the last few days, and the CPU vendors are going to have to do some hard thinking.

As an industry, we should also do some collective soul-searching. Do we really care about performance almost to the exclusion of all other imperatives? Or are we now sufficiently concerned about the implications of ‘Meltdown’ and ‘Spectre’ that we will pay a bit more than lip service to the mantra that security is important? The fact is that security isn’t sexy. Most of us don’t get excited about architectural approaches to security, especially not if we’re told that there is a performance penalty. Are we willing to pay the price, not least in the time taken, to fix these issues and protect against other similar escapes?

Remember what I said about change in demand?

As we are constantly assured by those promoting Brexit in the UK, “Where there’s uncertainty there is opportunity.” So it will be interesting to see how the CPU vendors respond. It will be even more interesting to see how we as consumers change our buying decisions based on what we learn over the next few months.

The changes that will inevitably be needed to address these sorts of vulnerabilities will create other opportunities and, I imagine, open up the field a little more than we expected even a few weeks ago.

About the Author

Dairsie Latimer has had an impeccable and diverse career in IT, having worked in a variety of roles on both the supplier and client sides, across the commercial and public sectors, as a consultant and software engineer. Following an early career in computer graphics, micro-architecture design and full stack software development, he has over twelve years’ specialist experience in the HPC sector, ranging from developing low-level libraries and software for novel computing architectures to porting complex HPC applications to a range of accelerators.

As Managing Consultant at Red Oak Consulting, Dairsie advises clients on strategy, technology futures, HPC procurements and managing challenging technical projects. For more information, visit www.redoakconsulting.co.uk.

HPCwire