For the past few years HPCwire and leaders of BioTeam, a research computing consultancy specializing in life sciences, have convened to examine the state of HPC (and now AI) use in life sciences.
Without HPC writ large, modern life sciences research would quickly grind to a halt. It’s true most life sciences research computing is less focused on tightly-coupled, low-latency processing (traditional HPC) and more dependent on data analytics and managing (and sieving) massive datasets. But there is plenty of both types of compute and disentangling the two has become increasingly difficult. Sophisticated storage schemes have long been de rigueur and recently fast networking has become important (no surprise given lab instruments’ prodigious output). Lastly, striding into this shifting environment is AI – deep learning and machine learning – whose deafening hype is only exceeded by its transformative potential.
This year’s discussion included Ari Berman, vice president and general manager of consulting services, Chris Dagdigian, one of BioTeam’s founders and senior director of infrastructure, and Aaron Gardner, director of technology. Including Dagdigian, who focuses largely on the enterprise, widened the scope of insights so there’s a nice blend of ideas presented about biotech and pharma as well as traditional academic and government HPC.
Because so much material was reviewed we are again dividing coverage into two articles. Part One, presented here, examines core infrastructure issues around processor choices, heterogeneous architecture, network bottlenecks (and solutions), and storage technology. Part Two, scheduled for next week, tackles AI’s trajectory in life sciences and the increasing use of cloud computing. In terms of the latter, you may be familiar with NIH’s STRIDES (Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability) program, which seeks to cut costs and ease cloud access for biomedical researchers.
HPCwire: Let’s tackle the core compute. Last year we touched on the potential rise of processor diversity (AMD, Intel, Arm, Power9), and certainly AMD seems to have come on strong. What’s your take on changes in the core computing landscape?
Chris Dagdigian: I can be quick and dirty. My view in the commercial and pharmaceutical and biotech space is that, aside from things like GPUs and specialized computing devices, there’s not a lot of movement away from the mainstream processor platforms. These are people moving in 3-to-5-year purchasing cycles. These are people who standardized on Intel after a few years of pain during the AMD/Intel wars and it would take something of huge significance to make them shift again. In commercial biopharmaceutical and biotech there’s not a lot of interesting stuff going on in the CPU set.
The only other thing that’s interesting that’s happening is as more and more of this stuff goes to the cloud or gets virtualized, a lot of the CPU stuff actually gets hidden from the user. So there’s a growing part of my community (biomedical researchers in enterprise) where the users don’t even know what CPU their code is running on. That’s particularly true for things like AWS Batch and AWS Lambda (serverless computing services) and that sort of stuff running in the cloud. I think I’ll stop here and say on the commercial side they are slow and conservative and it’s still an Intel world, and the cloud is hiding a lot of the true CPU stuff, particularly as people go serverless.
Aaron Gardner: That’s an interesting point. As more clouds have adopted the Epyc CPU, some people may not realize they are running on them when they start instances. I would say also that the rise of informatics as a service and workflows as a service is going to abstract things even more. It’s relatively easy today to run most code with some level of optimization across the Intel and AMD CPUs. But the gap widens a bit when you ask whether the code, or portions of it, is being GPU accelerated, or whether you’ve switched architectures from AMD64 to Power9 or something like that.
We talked last year about a transition from compute clusters being a hub fed by large-spoke data systems towards a data cluster where the hub is the data lake with its various moving pieces and storage tiers, but the spokes are all the different types of heterogeneous compute services that span and support the workload run on that system. We definitely have seen movement towards that model. If you look at all Cray’s announcements in the last few months, everything from what they are doing with Shasta and Slingshot, and work towards making the CS (cluster supercomputers) and XC (tightly coupled supercomputers) work seamlessly, interoperably, in the same infrastructure, we’re seeing companies like Cray and others gearing up for a heterogeneous future where they are going to support multiple processor architectures and optimize for multiple processor architectures as well as accelerators, CPUs and GPUs, and have it all work together in a coherent whole. That’s actually very exciting, because it’s not about betting on one particular horse or another; it’s about how well you are going to integrate across architectures, both traditional and non-traditional.
Ari Berman: Circling back to what Chris said. Life sciences historically has been sort of slow to jump in and adopt new stuff just to try it or to see if it will be three percent faster, because the differences gained in knowledge generation at this point in life science for those three percent are not groundbreaking – it’s fine to wait a little while. Those days, however, are dwindling because of the amount of data being generated and the urgency with which it has to be processed, and also the backlog of data that has to be processed.
So we are not at a point in life sciences where – other than the differentiation of GPUs – applications are being designed specifically for different system processors other than Intel. There are some caveats to that. Normally, as long as you can compile it and run it on one of the main system processors and it can run on a normal version of Linux, they are not optimizing further. The exceptions are some of the built-in math libraries that can be taken advantage of on the Intel platform, and some of the data offloading for moving data to and from CPUs, remotely or even internally – memory bandwidth really matters a lot – and some of those things are differentiated based on what kind of research you are doing.
Ari Berman: Well, we really like a lot of the future architectures AMD is coming out with for better memory bandwidth: handling things like PCIe links, new interconnects between CPUs, and also the connection to the motherboard. One of the big bottlenecks Intel still has to solve is how you get data to and from the machine from external sources. Internally they have optimized the bandwidth a whole lot, but if you have huge central sources of data on parallel file systems, you still have to get it in and out of that system, and there are bottlenecks there.
Aaron Gardner: With the Rome architecture moving forward, AMD has provided a much better approach to memory access, moving away from NUMA (non-uniform memory access) to a central memory controller with uniform latency across dies. This is really important when you have up to 64 cores per socket. Moving back towards a more favorable memory access model at the per-node design level I think is really going to help provide advantages to workloads in the life sciences, and that is certainly something we are looking at testing and exploring over the next year.
Ari Berman: I do think that for the first time in a while Power9 has some potential relevance, mostly because of Summit and Sierra (IBM-based supercomputers) coming into play and those machines being built on Power9. I think people are exploring it but I don’t know that it will make much of a play outside of just pure HPC. The other thing I meant to bring up is a place where I think AMD is ahead of Intel: fab technology. AMD is already manufacturing at 7nm versus Intel’s 14nm. I thought it was really innovative of AMD to do multiple-nanometer fabrication for their next release of processors, where the I/O die is 14nm and the processing cores are 7nm, just for power and distribution efficiency.
Aaron Gardner: In terms of market share, I think AMD has been extremely strategic over the last 18 months, because when you look at places that got burned by AMD in the past when it exited the server market, there were not enough benefits to warrant jumping back in fully right away. But AMD is really geared towards the economies-of-scale type plays such as in the cloud, where any advantage in efficiency is going to be appreciated. So I think they have been strategic [in choosing target markets] and we’ll see over the next couple of years how it plays out. I think we are at the moment not in a place where the client needs to specify a certain processor. We are going to see the integrators’ influence here – what they choose to put together in their heterogeneous HPC systems portfolio will influence what CPUs people get, and that may really affect the winners and losers over time.
Arm we see continuing to grow, but not explosively, and I’d say Power is certainly interesting. Having the large Power systems at the top of the TOP500 has really validated Power9 for use in capability supercomputing. How those are used, though, versus the GPUs for target workloads is interesting. In general we may be headed to a future where the CPU is used to turn on the GPU for certain workloads. Nvidia would probably favor that model. It’s just very interesting, the interplay between CPU and GPU; it really does have to do with whether you are accelerating a small number of codes to the nth degree or you are trying to have more diverse application support, which is where multiple CPU and GPU architectures are going to be needed.
Ari Berman: Using GPUs is still a huge thing for lots of different reasons. At the moment GPUs are hyped for AI and ML, but they have been used extensively in a lot of the simulation space – the Schrodinger suite, molecular modeling, quantum chemistry, those sorts of things – and also down into phylogenetic inference, special inheritance, things like that. There are many great applications for graphics processors, but really I would agree with others that it boils down to system processors and GPUs at the moment in life sciences. I did hear anecdotally from a couple of folks in the industry who were using the IBM Q cloud just to try quantum [computing], just to see how it worked with really high-level genomic alignment, and they kind of got it to work, and I’ll leave it at that.
HPCwire: We probably don’t devote enough coverage to networking given its importance driven by huge datasets and the rise of edge computing. What’s the state of networking in life sciences?
Chris Dagdigian: In pharmaceuticals and biotech, Ethernet rules the world. The high speed low latency interconnects are still in niche environments. When we do see non-Ethernet fabrics in the commercial world they are being used for parallel filesystems or in specialized HPC chemistry and molecular modeling application environments where MPI message passing latency actually matters. However, I will bluntly say networking speed is now the most critical issue in my HPC world. I feel that compute and storage at petascale are largely tractable problems. Moving data at scale within an organization, or outside the boundaries of your firewall to a collaborator or a cloud, is the single biggest rate-limiting bottleneck for HPC in pharma and biotech. Combine with that the fact that the cost of high speed Ethernet has not come down as fast as the cost of commodity storage and compute. So we are in this double-whammy world where we desperately need fast networks.
The corporate networking people are fairly smug about the 10 gig and 40 gig links they have in the datacenter core, whereas we need 100 gig networking going outside the datacenter, 100 gig going outside the building, sometimes 100 gig links to a particular lab. Honestly, the way that I handle this in enterprise is I help research organizations become a champion for the networking groups; they traditionally are under-budgeted and don’t typically have 40 gig and 100 gig and 400 gig on their radar, because they are looking at bandwidth graphs for their edge switches or their firewalls and they just don’t see the insane data movement that we have to do between the laboratory instrument and a storage system. The second thing, and I have utterly failed at it, is articulating that there are products other than Cisco in the world. That argument does not fly in enterprise because there is a tremendous installed base. So I am in the catch-22 where I pay a lot of money for Cisco 40 gig and 100 gig and I just have to live with it.
Ari Berman: I would agree networking is one of the major challenges. Depending on what granularity you are looking at, I think most of the HPCwire readers will care a lot about interconnects on clusters. Starting there, I would say we are seeing a fairly even distribution of pure Ethernet on the back end because of vendors like Arista for instance, which is producing more affordable 100 gig low latency Ethernet that can be put on the back end so you don’t have to do the whole RDMA versus TCP/IP dance necessarily. But most clusters are still using InfiniBand on their back end.
In life sciences I would say that we still see Mellanox predominantly as the back end. I have not seen life-science-directed organizations [use] a whole lot of Omni-Path (OPA). I have seen it at the NSF supercomputer centers, used to great effect, and they like it a lot, but not really so much in life sciences. I’d say the speed and diversity and the abilities of the Mellanox implementation could really outclass what is available in OPA today. I think the delays in OPA2 have hurt them. I do think the new interconnects like Shasta/Slingshot from Cray are paving the way to producing a reasonable competitor to where Mellanox is today.
Moving out from that, Chris is right. There are so many people using the cloud who don’t upgrade their internet connections to a wide enough bandwidth, or streamline their security, or optimize it enough so that people can effectively use the cloud for data-intensive applications, that getting the data there is impossible. You can use the cloud, but only if the data is already there. That’s a huge problem.
Internally, a lot of organizations have moved to hot spots of 100 gig to be able to move data effectively between datacenters and from external data sources, but a lot of 10 gig still predominates. I’d say that there are a lot of 25 gig and 50 gig implementations now. 40 gig sort of went by the wayside. That’s because 100 gig optical carriers are actually made up of four individual wavelengths, and so what vendors did was break those out, and the form factors have shrunk.
Going back to the cluster back end. In life sciences the reason high performance networking on the back end of a cluster is really important isn’t necessarily for inter-process communications, it’s for storage delivery to nodes. Almost every implementation has a large parallel distributed file system where all of the data are coming from at one point or another. You have to get them to the CPU and that backend network needs to be optimized for that traffic.
Aaron Gardner: That’s a common case in the life sciences. We primarily look at storage performance to bring data to nodes and even to move between nodes versus message passing for parallel applications. That’s starting to shift a little bit but that’s traditionally been how it is. We usually have looked at a single high performance fabric talking to a parallel files system. Whereas HPC as a whole has for a long time dealt with having a fast fabric for internode communications for large scale parallel jobs and then having a storage fabric that was either brought to all of the nodes or somehow shunted into the other fabric using IO router nodes.
One of the things that is very interesting with Cray announcing Slingshot is the ability to speak both an internal low latency HPC optimized protocol as well as Ethernet, which in the case of HPC storage removes the need for IO router nodes, instead allowing the HCAs (host channel adapters) and switching to handle the load and protocol translation and all of that. Depending on how transparent and easy it is to implement Slingshot at the small and mid-scale, I think that is a potential threat to the continued prevalence of traditional InfiniBand in HPC, which is essentially Mellanox today.
HPCwire: We’ve talked for a number of years about the revolution in life sciences instruments, and how the gush of data pouring from them overwhelms research IT systems. That has put stress on storage and data management. What’s your sense of the storage challenge today?
Chris Dagdigian: My sense is storing vast amounts of data is not particularly challenging these days. There are a lot of products on the market, very many vendors to choose from, and the actual act of storing the data is relatively straightforward. However, no one has really cracked how we manage it, how we understand what we’ve got on disk, how we carefully curate and maintain that stuff. The dominant storage pattern in my world, if people are not using a parallel file system for speed, is overwhelmingly scale-out network attached storage (NAS). But we are definitely in the era where some of the incumbent NAS vendors are starting to be seen as dinosaurs or being placed on a 3-year or 4-year upgrade cycle.
The other thing is there’s still a lot of interest in hybrid storage, storage that spans the cloud and can be replicated into the cloud. The technology is there but in many cases the pipes are not. So it is still relatively difficult to either synchronize or replicate and maintain a consistent storage namespace unless you are a really solid organization with really fast pipes to the outside world. We still see the problems of lots of islands of storage. The only other thing I will say is I am known for saying the future of scientific data at rest belongs in an object store, but that it’s going to take a long time to get there because we have so many dependencies on things that expect to see files and folders. I have customers that are buying petabytes of network attached storage but at the same time they are also buying petabytes of object storage. In some cases they are using the object storage natively; in other cases the object storage is their data continuity or backup target.
In terms of file system preference, the commercial world is not only conservative but also incredibly concerned with admin burden and value, so almost universally it is going to be a mainstream choice like GPFS supported by DDN or IBM. There are lots of really interesting alternatives like BeeGFS, but the issue really is the enterprise is nervous about fancy new technologies – not because of the technologies themselves, but because they have to bring new people in to do the care and feeding.
Aaron Gardner: One of the challenges with how we see storage deployed across life science organizations is how close to the bottom costs have been driven. With traditional supercomputing, you’re trying to get the fastest storage you can, and the most of it, for the least amount of money. The support needed is not the primary driver. In HPC as a whole, Lustre and GPFS/Spectrum Scale are still the predominant players in terms of parallel file systems. The interesting stuff over the last year or so has been Lustre trading hands (from Intel to DDN). With DDN leading the charge, the ecosystem is still being kept open and, I think, carefully crafted so other vendors can provide solutions independently from DDN. We do see IBM stepping up Spectrum Scale performance, and Spectrum Scale 5 offering a lot of good features proven out and demonstrated on the Summit and Sierra type systems, making Spectrum Scale every bit as relevant as it ever was.
As far as performant parallel file systems go, there are interesting alternatives. There is more presence and momentum behind BeeGFS than we have seen in prior years. We see some adoption and clients interested in trying and adopting it, but the number of deployments in production and at large scale is still pretty limited.
These days object storage is seen more like a tap that you turn on and you are getting your object storage through AWS or Azure or GCP. If you are buying it for on-premise, there’s little differentiation seen between object vendors. That’s the perception at least. We are seeing interest in what we call next generation storage systems and file systems – things like WekaIO that provide NVMe over fabrics (NVMeOF) on the front end and export their own NVMeOF native file system as opposed to block storage. This removes the need to use something like Spectrum Scale or Lustre to provide the file system and can drain cold data to object storage either on premise or in the cloud. We do see that as a viable model moving forward.
I would add, speaking to NVMe over fabrics in general, that it seems to be growing and becoming established, as most of the new storage vendors coming on the scene are currently architecting that way. That’s good in our book. We certainly see performance advantages, but it really matters how it’s done – it is important that the software stack driving the NVMe media has been purpose-built for NVMe over fabrics or at least significantly redesigned. Something built from the ground up like WekaIO or VAST will perform very well. On the other hand, you could choose NVMe over fabrics as the hardware topology for a storage system, but if you then layer on a legacy file system that hasn’t been updated for it you might not see much benefit.
A couple of other quick notes. It seems like storage benchmarking in HPC has been receiving more attention, both in terms of measuring throughput and metadata operations, with the latter being valued and seen as one of the primary bottlenecks that govern the absolute utility of a cluster. For projects like the IO500 we’ve seen an uptick in participation, both from national labs as well as vendors and other organizations. The last thing worth mentioning is data management. Scraping data for ML training data sets, for example, is one of the things driving us to understand the data we store better than we have in the past. One of the simple ways to do that is to tag your data, and we are seeing more file systems coming on the scene with a focus on tagging as a core built-in feature. So while they come at the problem from different angles, you could look at what companies like Atavium are doing for primary storage or Igneous for secondary storage: providing the ability to tag data on ingest and the ability to move data (policy-driven) according to tags. This is something that we have talked about for a long time and have helped a lot of clients tackle.
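The tag-on-ingest, policy-driven movement pattern Gardner describes can be sketched in a few lines. This is a minimal illustration only – the tag names, tier names, and the 30-day rule below are hypothetical examples, not the actual API or policies of Atavium, Igneous, or any vendor.

```python
import time
from dataclasses import dataclass, field

@dataclass
class DataObject:
    """A stored file with tags applied at ingest (hypothetical model)."""
    path: str
    ingested_at: float
    tags: set = field(default_factory=set)
    tier: str = "primary"  # starts on the fast primary tier

def ingest(path: str, tags: set) -> DataObject:
    """Tag data at ingest time so policies can act on it later."""
    return DataObject(path=path, ingested_at=time.time(), tags=tags)

def apply_policies(objects, now=None):
    """Move objects between tiers according to tag-driven rules.

    Example rules (made up for illustration):
      - anything tagged 'ml-training' stays hot on the primary tier
      - raw instrument output older than 30 days drains to object storage
    """
    now = now if now is not None else time.time()
    thirty_days = 30 * 24 * 3600
    for obj in objects:
        if "ml-training" in obj.tags:
            continue  # pinned hot for training-set scraping
        if "raw-instrument" in obj.tags and now - obj.ingested_at > thirty_days:
            obj.tier = "object-store"  # policy-driven demotion
    return objects
```

The point of the sketch is the separation of concerns: tagging happens once, at ingest, while movement decisions are made later by policies that only consult tags and age, never file contents.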
Link to Part Two (HPC in Life Sciences Part 2: Penetrating AI’s Hype and the Cloud’s Haze)