For the past few years HPCwire and leaders of BioTeam, a research computing consultancy specializing in life sciences, have convened to examine the state of HPC (and now AI) use in life sciences.
Without HPC writ large, modern life sciences research would quickly grind to a halt. It’s true most life sciences research computing is less focused on tightly-coupled, low-latency processing (traditional HPC) and more dependent on data analytics and managing (and sieving) massive datasets. But there is plenty of both types of compute and disentangling the two has become increasingly difficult. Sophisticated storage schemes have long been de rigueur and recently fast networking has become important (no surprise given lab instruments’ prodigious output). Lastly, striding into this shifting environment is AI – deep learning and machine learning – whose deafening hype is only exceeded by its transformative potential.
This year’s discussion included Ari Berman, vice president and general manager of consulting services, Chris Dagdigian, one of BioTeam’s founders and senior director of infrastructure, and Aaron Gardner, director of technology. Including Dagdigian, who focuses largely on the enterprise, widened the scope of insights so there’s a nice blend of ideas presented about biotech and pharma as well as traditional academic and government HPC.
Because so much material was reviewed we are again dividing coverage into two articles. Part One, presented here, examines core infrastructure issues around processor choices, heterogeneous architecture, network bottlenecks (and solutions), and storage technology. Part Two, scheduled for next week, tackles AI’s trajectory in life sciences and the increasing use of cloud computing. In terms of the latter, you may be familiar with NIH’s STRIDES (Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability) program, which seeks to cut costs and ease cloud access for biomedical researchers.
HPCwire: Let’s tackle the core compute. Last year we touched on the potential rise of processor diversity (AMD, Intel, Arm, Power9), and certainly AMD seems to have come on strong. What’s your take on changes in the core computing landscape?
Chris Dagdigian: I can be quick and dirty. My view in the commercial and pharmaceutical and biotech space is that, aside from things like GPUs and specialized computing devices, there’s not a lot of movement away from the mainstream processor platforms. These are people moving in 3-to-5-year purchasing cycles. These are people who standardized on Intel after a few years of pain during the AMD/Intel wars and it would take something of huge significance to make them shift again. In commercial biopharmaceutical and biotech there’s not a lot of interesting stuff going on in the CPU set.
The only other thing that’s interesting that’s happening is as more and more of this stuff goes to the cloud or gets virtualized, a lot of the CPU stuff actually gets hidden from the user. So there’s a growing part of my community (biomedical researchers in enterprise) where the users don’t even know what CPU their code is running on. That’s particularly true for things like AWS Batch and AWS Lambda (serverless computing services) and that sort of stuff running in the cloud. I think I’ll stop here and say on the commercial side they are slow and conservative and it’s still an Intel world, and the cloud is hiding a lot of the true CPU stuff, particularly as people go serverless.
Aaron Gardner: That’s an interesting point. As more clouds have adopted the Epyc CPU, some people may not realize they are running on them when they start instances. I would say also that the rise of informatics as a service and workflows as a service is going to abstract things even more. It’s relatively easy today to run most code with some level of optimization across the Intel and AMD CPUs. But the gap widens a bit when you ask whether the code, or portions of it, is being GPU accelerated, or whether you’ve switched architectures from AMD64 to Power9 or something like that.
We talked last year about a transition from compute clusters being a hub fed by large-spoke data systems towards a data cluster where the hub is the data lake with its various moving pieces and storage tiers, but the spokes are all the different types of heterogeneous compute services that span and support the workload run on that system. We definitely have seen movement towards that model. If you look at all Cray’s announcements in the last few months, everything from what they are doing with Shasta and Slingshot, and work towards making the CS (cluster supercomputers) and XC (tightly coupled supercomputers) work seamlessly, interoperably, in the same infrastructure, we’re seeing companies like Cray and others gearing up for a heterogeneous future where they are going to support multiple processor architectures and optimize for multiple processor architectures as well as accelerators, CPUs and GPUs, and have it all work together in a coherent whole. That’s actually very exciting, because it’s not about betting on one particular horse or another; it’s about how well you are going to integrate across architectures, both traditional and non-traditional.
Ari Berman: Circling back to what Chris said. Life sciences historically has been sort of slow to jump in and adopt new stuff just to try it or to see if it will be three percent faster, because the differences gained in knowledge generation at this point in life science for those three percent are not groundbreaking – it’s fine to wait a little while. Those days, however, are dwindling because of the amount of data being generated and the urgency with which it has to be processed, and also the backlog of data that has to be processed.
So we are not at a point in life sciences where – other than the differentiation of GPUs – applications are being designed specifically for different system processors other than Intel. There are some caveats to that. Normally, as long as you can compile it and run it on one of the main system processors and it can run on a normal version of Linux, they are not optimizing further. The exceptions are some of the built-in math libraries that can be taken advantage of on the Intel platform, and some of the data offloading for moving data to and from CPUs, remotely or even internally – memory bandwidth really matters a lot – and some of those things are differentiated based on what kind of research you are doing.
Ari Berman: Well, we really like a lot of the future architectures AMD is coming out with for better memory bandwidth: handling things like PCIe links, new interconnects between CPUs, and also the connection to the motherboard. One of the big bottlenecks Intel still has to solve is how you get data to and from the machine from external sources. Internally they have optimized the bandwidth a whole lot, but if you have huge central sources of data on parallel file systems, you still have to get it in and out of that system, and there are bottlenecks there.
Aaron Gardner: With the Rome architecture moving forward, AMD has provided a much better approach to memory access, moving away from NUMA (non-uniform memory access) to a central memory controller with uniform latency across dies. This is really important when you have up to 64 cores per socket. Moving back towards a more favorable memory access model at the per-node design level I think is really going to help provide advantages to workloads in the life sciences, and that is certainly something we are looking at testing and exploring over the next year.
Ari Berman: I do think that for the first time in a while Power9 has some potential relevance, mostly because of Summit and Sierra (IBM-based supercomputers) coming into play and those machines being built on Power9. I think people are exploring it but I don’t know that it will make much of a play outside of just pure HPC. The other thing I meant to bring up is a place where I think AMD is ahead of Intel: fab technology. AMD is already manufacturing at 7nm versus Intel’s 14nm. I thought it was really innovative of AMD to do multiple-nanometer fabrication for their next release of processors, where the I/O die is 14nm and the processing cores are 7nm, just for power and distribution efficiency.
Aaron Gardner: In terms of market share, I think AMD has been extremely strategic over the last 18 months, because when you look at places that got burned by AMD in the past when it exited the server market, there were not enough benefits to warrant jumping back in fully right away. But AMD is really geared towards the economies-of-scale type plays such as in the cloud, where any advantage in efficiency is going to be appreciated. So I think they have been strategic [in choosing target markets] and we’ll see over the next couple of years how it plays out. I think we are at the moment not in a place where the client needs to specify a certain processor. We are going to see the integrators’ influence here – what they choose to put together in their heterogeneous HPC systems portfolio will influence what CPUs people get, and that may really affect the winners and losers over time.
Arm we see continuing to grow, but not explosively, and I’d say Power is certainly interesting. Having the large Power systems at the top of the TOP500 has really validated Power9 for use in capability supercomputing. How those are used, though, versus the GPUs for target workloads is interesting. In general we may be headed to a future where the CPU is used to turn on the GPU for certain workloads. Nvidia would probably favor that model. It’s just very interesting, the interplay between CPU and GPU; it really does have to do with whether you are accelerating a small number of codes to the nth degree or you are trying to have more diverse application support, which is where multiple CPU and GPU architectures are going to be needed.
Ari Berman: Using GPUs is still a huge thing for lots of different reasons. At the moment GPUs are hyped for AI and ML, but they have been used extensively in a lot of the simulation space – the Schrodinger suite, molecular modeling, quantum chemistry, those sorts of things – and also down into phylogenetic inference, special inheritance, things like that. There are many great applications for graphics processors, but really I would agree with others that it boils down to system processors and GPUs at the moment in life sciences. I did hear anecdotally from a couple of folks in the industry who were using the IBM Q cloud just to try quantum [computing], just to see how it worked with really high-level genomic alignment, and they kind of got it to work, and I’ll leave it at that.
HPCwire: We probably don’t devote enough coverage to networking given its importance driven by huge datasets and the rise of edge computing. What’s the state of networking in life sciences?
Chris Dagdigian: In pharmaceuticals and biotech, Ethernet rules the world. The high speed low latency interconnects are still in niche environments. When we do see non-Ethernet fabrics in the commercial world they are being used for parallel filesystems or in specialized HPC chemistry and molecular modeling application environments where MPI message passing latency actually matters. However, I will bluntly say networking speed is now the most critical issue in my HPC world. I feel that compute and storage at petascale are largely tractable problems. Moving data at scale within an organization, or outside the boundaries of your firewall to a collaborator or a cloud, is the single biggest rate-limiting bottleneck for HPC in pharma and biotech. Combine with that the fact that the cost of high speed Ethernet has not come down as fast as the cost of commodity storage and compute. So we are in this double-whammy world where we desperately need fast networks.
The corporate networking people are fairly smug about the 10 gig and 40 gig links they have in the datacenter core, whereas we need 100 gig networking going outside the datacenter, 100 gig going outside the building, sometimes 100 gig links to a particular lab. Honestly, the way that I handle this in enterprise is I help research organizations become a champion for the networking groups; they traditionally are under-budgeted and don’t typically have 40 gig and 100 gig and 400 gig on their radar, because they are looking at bandwidth graphs for their edge switches or their firewalls and they just don’t see the insane data movement that we have to do between the laboratory instrument and a storage system. The second thing, and I have utterly failed at it, is articulating that there are products other than Cisco in the world. That argument does not fly in enterprise because there is a tremendous installed base. So I am in the catch-22 where I pay a lot of money for Cisco 40 gig and 100 gig and I just have to live with it.
Ari Berman: I would agree networking is one of the major challenges. Depending on what granularity you are looking at, I think most of the HPCwire readers will care a lot about interconnects on clusters. Starting there, I would say we are seeing a fairly even distribution of pure Ethernet on the back end because of vendors like Arista for instance, which is producing more affordable 100 gig low latency Ethernet that can be put on the back end so you don’t have to do the whole RDMA versus TCP/IP dance necessarily. But most clusters are still using InfiniBand on their back end.
In life sciences I would say that we still see Mellanox predominantly as the back end. I have not seen life-science-directed organizations [use] a whole lot of Omni-Path (OPA). I have seen it at the NSF supercomputer centers, used to great effect, and they like it a lot, but not really so much in life sciences. I’d say the speed and diversity and the abilities of the Mellanox implementation could really outclass what is available in OPA today. I think the delays in OPA2 have hurt them. I do think the new interconnects like Shasta/Slingshot from Cray are paving the way to producing a reasonable competitor to where Mellanox is today.
Moving out from that, Chris is right. There are so many people using the cloud who don’t upgrade their internet connections to a wide enough bandwidth, or streamline their security, or optimize it enough so that people can effectively use the cloud for data-intensive applications, that getting the data there is impossible. You can use the cloud, but only if the data is already there. That’s a huge problem.
Internally, a lot of organizations have moved to hot spots of 100 gig to be able to move data effectively between datacenters and from external data sources, but a lot of 10 gig still predominates. I’d say that there are a lot of 25 gig and 50 gig implementations now. 40 gig sort of went by the wayside. That’s because 100 gig optical carriers are actually made up of four individual wavelengths, and so what vendors did was break those out, and the form factors have shrunk.
Going back to the cluster back end. In life sciences the reason high performance networking on the back end of a cluster is really important isn’t necessarily for inter-process communications, it’s for storage delivery to nodes. Almost every implementation has a large parallel distributed file system where all of the data are coming from at one point or another. You have to get them to the CPU and that backend network needs to be optimized for that traffic.
Aaron Gardner: That’s a common case in the life sciences. We primarily look at storage performance to bring data to nodes and even to move between nodes versus message passing for parallel applications. That’s starting to shift a little bit but that’s traditionally been how it is. We usually have looked at a single high performance fabric talking to a parallel files system. Whereas HPC as a whole has for a long time dealt with having a fast fabric for internode communications for large scale parallel jobs and then having a storage fabric that was either brought to all of the nodes or somehow shunted into the other fabric using IO router nodes.
One of the things that is very interesting with Cray announcing Slingshot is the ability to speak both an internal low latency HPC optimized protocol as well as Ethernet, which in the case of HPC storage removes the need for IO router nodes, instead allowing the HCAs (host channel adapters) and switching to handle the load and protocol translation and all of that. Depending on how transparent and easy it is to implement Slingshot at the small and mid-scale, I think that is a potential threat to the continued prevalence of traditional InfiniBand in HPC, which is essentially Mellanox today.
HPCwire: We’ve talked for a number of years about the revolution in life sciences instruments, and how the gush of data pouring from them overwhelms research IT systems. That has put stress on storage and data management. What’s your sense of the storage challenge today?
Chris Dagdigian: My sense is storing vast amounts of data is not particularly challenging these days. There are a lot of products on the market, very many vendors to choose from, and the actual act of storing the data is relatively straightforward. However, no one has really cracked how we manage it, how we understand what we’ve got on disk, how we carefully curate and maintain that stuff. The dominant storage pattern in my world, if people are not using a parallel file system for speed, is overwhelmingly scale-out network attached storage (NAS). But we are definitely in the era where some of the incumbent NAS vendors are starting to be seen as dinosaurs or being placed on a 3-year or 4-year upgrade cycle.
The other thing is there’s still a lot of interest in hybrid storage, storage that spans the cloud and can be replicated into the cloud. The technology is there but in many cases the pipes are not. So it is still relatively difficult to either synchronize or replicate and maintain a consistent storage namespace unless you are a really solid organization with really fast pipes to the outside world. We still see the problems of lots of islands of storage. The only other thing I will say is I am known for saying the future of scientific data at rest belongs in an object store, but that it’s going to take a long time to get there because we have so many dependencies on things that expect to see files and folders. I have customers that are buying petabytes of network attached storage but at the same time they are also buying petabytes of object storage. In some cases they are using the object storage natively; in other cases the object storage is their data continuity or backup target.
In terms of file system preference, the commercial world is not only conservative but also incredibly concerned with admin burden and value, so almost universally it is going to be a mainstream choice like GPFS supported by DDN or IBM. There are lots of really interesting alternatives like BeeGFS, but the issue really is the enterprise is nervous about fancy new technologies – not because of the technologies themselves, but because they have to bring new people in to do the care and feeding.
Aaron Gardner: One of the challenges with how we see storage deployed across life science organizations is how close to the bottom costs have been driven. With traditional supercomputing, you’re trying to get the fastest storage you can, and the most of it, for the least amount of money. The support needed is not the primary driver. In HPC as a whole, Lustre and GPFS/Spectrum Scale are still the predominant players in terms of parallel file systems. The interesting stuff over the last year or so has been Lustre trading hands (from Intel to DDN). With DDN leading the charge, the ecosystem is still being kept open and, I think, carefully crafted so other vendors can provide solutions independently from DDN. We do see IBM stepping up Spectrum Scale performance, and Spectrum Scale 5 offering a lot of good features proven out and demonstrated on the Summit and Sierra type systems, making Spectrum Scale every bit as relevant as it ever was.
As far as performant parallel file systems go, there are interesting alternatives. There is more presence and momentum behind BeeGFS than we have seen in prior years. We see some adoption and clients interested in trying and adopting it, but the number of deployments in production and at large scale is still pretty limited.
These days object storage is seen more like a tap that you turn on and you are getting your object storage through AWS or Azure or GCP. If you are buying it for on-premise, there’s little differentiation seen between object vendors. That’s the perception at least. We are seeing interest in what we call next generation storage systems and file systems – things like WekaIO that provide NVMe over fabrics (NVMeOF) on the front end and export their own NVMeOF native file system as opposed to block storage. This removes the need to use something like Spectrum Scale or Lustre to provide the file system and can drain cold data to object storage either on premise or in the cloud. We do see that as a viable model moving forward.
I would add, speaking to NVMe over fabrics in general, that it seems to be growing and becoming established, as most of the new storage vendors coming on the scene are currently architecting that way. That’s good in our book. We certainly see performance advantages, but it really matters how it’s done – it is important that the software stack driving the NVMe media has been purpose-built for NVMe over fabrics or at least significantly redesigned. Something built from the ground up like WekaIO or VAST will perform very well. On the other hand, you could choose NVMe over fabrics as the hardware topology for a storage system, but if you then layer on a legacy file system that hasn’t been updated for it you might not see much benefit.
A couple of other quick notes. It seems like storage benchmarking in HPC has been receiving more attention, both in terms of measuring throughput and metadata operations, with the latter being valued and seen as one of the primary bottlenecks that govern the absolute utility of a cluster. For projects like the IO500 we’ve seen an uptick in participation, both from national labs as well as vendors and other organizations. The last thing worth mentioning is data management. Scraping data for ML training data sets, for example, is one of the things driving us to understand the data we store better than we have in the past. One of the simple ways to do that is to tag your data, and we are seeing more file systems coming on the scene with a focus on tagging as a core built-in feature. So while they come at the problem from different angles, you could look at what companies like Atavium are doing for primary storage or Igneous for secondary storage: providing the ability to tag data on ingest and the ability to move data (policy-driven) according to tags. This is something that we have talked about for a long time and have helped a lot of clients tackle.
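The tag-on-ingest, policy-driven movement pattern Gardner describes can be sketched in a few lines. This is a minimal illustration only – the tag names, tier names, and the 30-day rule below are hypothetical examples, not the actual API or policies of Atavium, Igneous, or any vendor.

```python
import time
from dataclasses import dataclass, field

@dataclass
class DataObject:
    """A stored file with tags applied at ingest (hypothetical model)."""
    path: str
    ingested_at: float
    tags: set = field(default_factory=set)
    tier: str = "primary"  # starts on the fast primary tier

def ingest(path: str, tags: set) -> DataObject:
    """Tag data at ingest time so policies can act on it later."""
    return DataObject(path=path, ingested_at=time.time(), tags=tags)

def apply_policies(objects, now=None):
    """Move objects between tiers according to tag-driven rules.

    Example rules (made up for illustration):
      - anything tagged 'ml-training' stays hot on the primary tier
      - raw instrument output older than 30 days drains to object storage
    """
    now = now if now is not None else time.time()
    thirty_days = 30 * 24 * 3600
    for obj in objects:
        if "ml-training" in obj.tags:
            continue  # pinned hot for training-set scraping
        if "raw-instrument" in obj.tags and now - obj.ingested_at > thirty_days:
            obj.tier = "object-store"  # policy-driven demotion
    return objects
```

The point of the sketch is the separation of concerns: tagging happens once, at ingest, while movement decisions are made later by policies that only consult tags and age, never file contents.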
Link to Part Two (HPC in Life Sciences Part 2: Penetrating AI’s Hype and the Cloud’s Haze)