2017 was not necessarily the best year to build a large HPC system for life sciences, say Ari Berman, VP and GM of consulting services, and Aaron Gardner, director of technology, at research computing consultancy BioTeam. Perhaps that’s true more generally as well. The reason is that enough new technology options were entering the market or expected soon – think AMD’s EPYC processor line, Intel Skylake, and IBM’s Power9 chip – that choosing wisely among them could seem premature. The jolt of Spectre-Meltdown in early 2018 hasn’t helped settle the waters.
In Part One of HPCwire’s examination of 2018 HPC trends in life sciences, published last week, Berman and Gardner talked about AI trends and cloud use in life sciences. In Part Two, presented here, they consider the prospect of a real challenge to Intel’s dominance in the processor landscape, the proliferation of storage technology options, and the rising need for fast networking in life sciences.
HPCwire: The processor market hasn’t been this frothy for a long time with AMD showing real traction and IBM hoping for the same. Are we looking at real change?
Ari Berman: I agree in 2017 and 2018 diversity is the name of the game. For a long time, Intel was the only game in town with the Xeons and then they tried to break in with Xeon Phi and co-processing. I think the really interesting issue now is that Intel took a gamble by going more to a platform model for CPU in Skylake and in some ways, that’s paid off, and in some ways it hasn’t. I think that particularly in the HPC space, independent of Spectre-Meltdown, there were some performance problems with the Skylake architecture. It’s pretty common in version one of any new product. But it took some of the major HPC centers to figure out that there were problems that the chip designer didn’t anticipate. Intel is sort of running to catch up in a lot of ways, down to microcode which is really hard to catch up on. At the same time, AMD has surged out and EPYC has gained some footing in this market too.
It’s a really interesting time for core processing, and at the same time Power9 has come out. No one knows what kind of an impact that is going to have. IBM is definitely making a power play to this space and suddenly the dome has crumbled a little bit on the Intel monopoly.
Aaron Gardner: We see a lot of good things happening on the horizon. When the Ryzen CPUs came out I bought one immediately just because it was something new. Early testing is looking really good with the EPYC family on the server side, based on Naples (architecture). I think we are even more interested in the forthcoming Rome architecture and what we’ll see come out of that. I think the industry as a whole in the last year has been cautiously optimistic about what it means to have AMD involved again. I think we are going to see a lot of rejuvenation in the CPU space.
Our advice to folks is this is not a time to just use what you have always used; instead look at the playing field and look at all options. We certainly are taking that approach with the organizations we work with. I do think interoperability across architectures is going to become more and more important for many clients. Some people do like to pick a particular architecture and ultra-optimize, but I think we are seeing the whole landscape move toward multiple architectures being in play, and that is important.
HPCwire: Betting against Intel is not often a good idea. Are we seeing a real and persistent change in the processor landscape?
Ari Berman: That’s a great question. The safe answer is, it depends. What it depends upon is the ability of both AMD and IBM to deliver. That was the problem in the past and has remained the problem. IBM’s additional problem is cost. It’s always much more expensive to go with an IBM processor than the other two. At scale that matters immensely. [For example] the exascale folks are going to work to minimize processor cost and power consumption, and Power9 is not a cheap processor both in power utilization and in raw cost. There may be some benefits to it but we still have not seen any real penetration of the Power platform in life sciences.
However, we are seeing clusters being built with EPYC and the Intel family of processors. It really depends. Arm is also right in there. The problem with Arm is, much like GPUs, a lot of the algorithms have to be refactored and recompiled and tested to see if they work the same because it is not an x86 architecture. So, is the activation energy required by the scientific use cases worth the effort to convert to an all Arm system?
Aaron Gardner: And people need to be prepared to move back if necessary. There absolutely has been a transition in the last year, but the staying power of that transition is an important question. That’s why we are looking towards the Rome architecture with AMD. We are seeing some interesting signals from vendors around them diversifying their CPU architecture. How much that leads to further change and lasting change in the industry is going to have to do with the lift that is required to realize that change. From what we are seeing now, the lift is not much on AMD platforms to optimize for life sciences workloads. That’s not to say that Arm doesn’t have a play there, but it is definitely not as quick a lift and not as generalized a solution. IBM has done a lot with Power to make it something that is tenable, but we’ll have to wait and see there.
HPCwire: Are you hearing worries about AMD’s commitment to the market?
Aaron Gardner: Earlier on, we definitely heard that story repeated by OEMs, vendors, and partners, but we are hearing less of that now. Again, we’ll check back in a year. I think that would be a knee jerk reaction if there are bumps in the road again. I think everyone wants to see a diversified playing field in the CPU space but people have memories of that previous pull back from AMD and also Intel’s ability to engineer their way back into dominance. Both of those stories are deeply ingrained in the industry psyche.
HPCwire: Talking about scarring the psyche. The thunderbolt of Spectre-Meltdown is stirring serious worry in both the user and vendor communities. What’s your take?
Aaron Gardner: It’s true. The Spectre and Meltdown security vulnerabilities are still reverberating and in play. We’ll see how it all plays out. One of the interesting things, going back to cloud computing (then I will get back to CPU), is that the cloud providers, due to the nature of those vulnerabilities, had an obligation to mitigate them quickly. Very quickly mitigation measures were put in place across cloud providers. But there are performance implications with those patches. If you have local infrastructure, you can kind of choose your approach and your stance. But in the cloud, you must accept the providers’ approaches.
We saw messages from the high-performance storage community pointing out that applying the Spectre and Meltdown patches to storage clients had a tremendous impact on storage performance. We also had many letters coming out from the storage vendors speaking to their stances on how to approach the vulnerability. The summation there is that having a diversified CPU portfolio, especially if you are on-premise and off premise, as well as multi-cloud, just gives people some hedges and adaptability to navigate the CPU waters. I think Spectre-Meltdown really showed the industry that there can be speed bumps along the way with different CPUs, so being able to move workloads across architectures can become an important consideration.
Ari Berman: I’ll add to that. Just as in a multi-cloud environment, it’s important to understand what different processors and different platforms bring to you. Take the time to understand [your needs]. Does your workload require a lot of PCI links? [Then] AMD is the thing for you. If you still need a lot of high-powered integer calculations, maybe Skylake is still the thing for you.
HPCwire: Let’s shift gears and talk about storage.
Ari Berman: BioTeam vacillates on what our biggest issue in life sciences is, and it vacillates between networking on one hand and storage and data management on the other. In the last year or so, the problem has shifted back to storage from networking as a major issue. The main issue is there’s a lot of hype and a lot of diversity in the storage market, and people are realizing that the storage market is incredibly overpriced for what it delivers. Also, the available technologies that are coming out – file systems, the dropping prices of flash, and PCIe switching – all have the potential to transform this entire space.
At the same time the need for multi-petabyte storage by almost everybody has really driven life sciences. We are literally almost to the point that any laboratory with significant data generation capability needs to be peta-capable, because over the course of a year or two they have a single device that generates a petabyte of data. That was a major challenge a couple of years ago. Today, having to manage one or ten petabytes is commonplace. But the power costs and the management costs haven’t changed at all on scale. That’s one of the major challenges.
A few years ago, we did see a shift away from a scale-out NAS to parallel distributed file systems in life sciences because of the scalability. That [coincided] with speed improvements. That shift continues on some fronts but there’s a bit of disillusionment on what those parallel distributed file systems capabilities are and vendors are actually being pushed to deliver what they say they can.
One other thing is that cloud has become a semi-major player, at least in long-term storage and data sharing. The thing is, as per usual, you have to use those cloud resources very carefully because if you are sharing ten petabytes of genomics data in Amazon, that’s going to cost you a whole lot of money. Same thing in Google; Google storage is very expensive as well. Again, the interesting thing is, if the data you’re storing in the cloud is something you need to access a lot outside of the cloud environment, and this is the challenge of using a cloud environment, you are going to find yourself paying tens of thousands of dollars each time you move that data out of the cloud. That’s the opportunity cost that clouds charge and their business model is to lock you into using their stuff, otherwise you pay a lot for it.
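As a rough illustration of the egress arithmetic Berman describes, the sketch below multiplies a data volume by a flat per-gigabyte charge. The function name and the rate are assumptions made for this example; the rate is a placeholder rather than any provider's quoted price, and real pricing is usually tiered and changes over time.

```python
def egress_cost_usd(data_tb: float, rate_usd_per_gb: float) -> float:
    """Cost of moving data_tb terabytes out of a cloud at a flat per-GB rate."""
    gb = data_tb * 1000  # decimal units: 1 TB = 1000 GB
    return gb * rate_usd_per_gb


# Hypothetical flat rate, chosen only for illustration; not a quoted price.
ASSUMED_RATE_USD_PER_GB = 0.05

if __name__ == "__main__":
    for size_tb in (100, 500, 10_000):  # 10,000 TB = 10 PB
        cost = egress_cost_usd(size_tb, ASSUMED_RATE_USD_PER_GB)
        print(f"{size_tb:>6} TB out: ${cost:,.0f}")
```

Under that assumed rate, pulling a few hundred terabytes out once already lands in the tens of thousands of dollars, and the cost recurs every time the same data is moved out again.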
Aaron Gardner: One of the big things we’ve seen over the past couple of years is people moving to tiered storage systems and creating storage workflows that facilitate movement through storage tiers and to move data out through the right network segments. One of the things that is driving movement to tiered storage is that the cost of storage changes depending upon the context of the workload, so the idea of hot storage and cold storage and everything in between becomes important. At a certain scale, you can no longer have a one-size fits all storage solution.
I think where people are now is they are realizing that all of these different types of storage they have bought with all of the orchestration, middleware layers and software for moving data around has mitigated the cost over the last few years, but added so much complexity that when new things are introduced – such as doing analytics across all of the data you have to get value, maybe driving training data sets, or for doing a data commons and things like that – all of a sudden you are putting all of your storage into action in a way it wasn’t before.
You’ll notice a lot of storage vendors are talking about their read access patterns and read requirements with deep learning and how that is different from what they did with HPC workloads in the past. Workload access patterns are changing. More and more of the written data is being accessed later, but we are also seeing now that we are getting more [into] analytics and that people are sobering up a lot about, “we have stored everything until now [but] we actually haven’t accessed a lot of it; is it still valuable?” People are trying to do those exercises more and more. We are certainly seeing an increase in data commons efforts within organizations to organize and make accessible all of the actionable data an organization has.
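The "have we actually accessed it?" exercise Gardner mentions can be approximated by bucketing stored bytes by time since last access. The sketch below is one possible way to do that; the tier names, the 30-day and 365-day boundaries, and the function names are all hypothetical choices for illustration, not a BioTeam method.

```python
from collections import defaultdict

DAY = 86_400  # seconds per day

# Hypothetical tier boundaries, in days since last access.
ASSUMED_TIERS = [(30, "hot"), (365, "warm"), (float("inf"), "cold")]


def tier_for(age_days: float) -> str:
    """Return the first tier whose upper bound exceeds the given age."""
    for limit_days, name in ASSUMED_TIERS:
        if age_days < limit_days:
            return name
    return ASSUMED_TIERS[-1][1]


def bytes_by_tier(records, now: float) -> dict:
    """Sum sizes per tier for (size_bytes, last_access_epoch) records."""
    totals = defaultdict(int)
    for size_bytes, last_access in records:
        age_days = (now - last_access) / DAY
        totals[tier_for(age_days)] += size_bytes
    return dict(totals)
```

The records could be gathered from `os.stat` results (`st_size`, `st_atime`), with the caveat that access times are unreliable on file systems mounted with options such as `noatime`.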
HPCwire: Broadly, is there a rush to new storage architectures to accommodate things like greater analytics demands?
Aaron Gardner: Not holistically. I would say bifurcation is a trend in the sense that proven strategies are still working as well as new approaches. We’ve been involved in some very large traditional HPC buildouts, architectures, and specification type work recently, and actually we still found that going with tried and true platforms with a best practices design in distributed parallel storage architecture is providing a reliable means of storage. So, in some ways it’s not like everything we have been doing for the last five years or decade is getting thrown out. On the other hand, I would say traditional storage hasn’t changed much, while there continue to be improvements. There’s the evolution of Lustre. There’s all of the work on the CORAL program done for GPFS. Those are two stable file systems that are still prevalent in HPC and so there is very much sustaining innovation happening.
We also have things like cloud, distributed infrastructure, all these things pushing the new software defined storage paradigm that’s been growing from marketing hype to reality. There’s absolutely a sea of storage offerings in that space. Some of it is chasing performance to harness the capabilities of NAND storage and other new storage mediums like 3D XPoint/Optane type stuff; other parts of it are trying to chase the realities of what has been realized at hyperscale. In terms of economies of scale, a lot of the practices that have been present in web scale and hyperscale customers are now becoming common in software defined storage offerings, and they definitely are being consumed by researchers in the life sciences. So, we’re seeing a one-two punch. Tried and true methods are the most reliable but we also are seeing absolutely a shift into next generation file systems and storage.
HPCwire: What’s happening with Lustre? Last year, Ari was not especially high on it.
Aaron Gardner: Lustre continues to improve. If you look at the leading edge of the releases, it is becoming more tenable for life sciences workloads. The challenge this year is the rapidity at which those features are adopted and supported in the vendor space. I still see it being a couple of years before you can rely on the latest changes and adaptations being present in whatever offering you buy. You really have to be aware of where a particular Lustre vendor stands with respect to all the work that is being done. We are still seeing more adoption of GPFS as far as life sciences workloads. It continues to improve and get better as well, but has some similar challenges in the vendor space. Also, if you drill down into some of the emerging benchmarks with BeeGFS, it does really well with metadata which is becoming more and more important. We are seeing people picking up on BeeGFS in Europe but still waiting to see critical mass stateside.
HPCwire: Object storage has enjoyed a fair amount of buzz this year. What are you seeing in terms of use cases and adoption trends?
Ari Berman: The interesting thing about object storage is that it still remains the tactic that hyperscalers are using to manage their sprawling data web services on top of, but it still isn’t penetrating that far, at least in the life sciences space, into everyday utilization with our customer base. Everybody is talking about it. People have proofs of concept. Some folks even have extensive distributions. But what is still missing is the adoption to use it in [its] native data format. Folks are still using emulators and translators in front of object storage. My prediction for this coming year, because of the increased cloud adoption and some algorithm generation, is that folks are going to start seeing native object support as a part of algorithms [in applications]. But the kicker is that most object storage has been sold as an archive solution and doesn’t have the speed built into it that real analytics would need. [The result] is everyone sees object storage as a cheap and deep type of solution, rather than a scaling analytics solution, and that’s going to be a hard thing to overcome in the market because you can’t do it at speed.
Aaron Gardner: My sense is this past year is when object storage just became commonplace. It’s not new and shiny anymore. It’s just an accepted part of the landscape. I would say there are a couple different sides to object storage. One is that more and more of the foundational bedrock of storage right above the media level is addressed at the object level; even Lustre storage has had object as part of the inherent design. So, there’s that back-end architectural consideration. We’re seeing more next generation efforts from the storage vendors continue to assume an object storage back end, which is really helpful in the cloud.
At the front end, it purely is a protocol consideration. One of the things we’re seeing more of, and this is to Ari’s point about the adoption of object storage at the application level, is that while object storage is one access method that works really well for web scale type use, folks are starting to now make object more of a protocol use case consideration, that your storage knows both POSIX and object. It’s just understood that for modern storage you are going to have a good object front end as well. A POSIX front end is needed sometimes and sometimes it’s an object name space being represented.
Relating to Ari’s comment on analytics performance, the new and shiny thing is NVMe over fabric. Vendors are pushing that through their proprietary implementations or taking more open standards approaches. That’s absolutely something that is taking hold and we will see more adoption moving forward as a way to take care of the concern of needing low latency access to IOPS over the network.
HPCwire: High speed networking has always been important in HPC but perhaps less so in life sciences. That seems to be changing. Could you give us a brief overview of the state of networking in life sciences HPC?
Ari Berman: The most obvious statement I can make with networking is that the byproduct of producing a lot of data is that you have to move it. Everyone talks about delivering the compute to the data and that’s great but you still have to move the data there and you still have to back it up, and manage it, and lifecycle it, and share it, and a lot of these are things that folks aren’t doing. All of those things require high performance or at least performance-minded networking built underneath the systems.
Many organizations have 10Gb or 40Gb capability, and in some cases 100Gb capability, but the security that’s put in front of it limits everything down to 20Gb batches. High performance security is something that’s lagging way behind. The whole Science DMZ network architecture traded a more distributed security model around data intensive science that enabled performant networking and sort of solved that problem, but typical security shops just haven’t adopted and/or don’t understand or aren’t comfortable without a physical firewall and many levels of abstraction.
What’s really interesting, and I am going to quote Chris Dagdigian (BioTeam co-founder and Senior Director of Infrastructure), is that we have organizations that are still smug about having 10Gb, which was released 16 years ago. It’s an old technology at this point. It’s just reached the point of wide adoption and cost effectiveness such that you actually cannot have an enterprise network without some 10Gb paths in it, especially in scientific environments. As such 100Gb is quickly becoming the standard on a scientific enterprise scale.
The 100Gb stuff scares people, but it is at the point where it is easily manageable and easily deployable. It doesn’t necessarily cost a ridiculous amount. You do have to relocate some of your optics with 10Gb and 40Gb to be able to make it work well. But, mostly it’s pretty easy to deploy. Then there’s the 400Gb standard, which was featured at SC17 this year as a part of SCinet. This is something that’s really happening, and vendors have released a 400Gb Ethernet standard. So, that moves the ball forward again towards a terabit, which is probably coming out in the next year or so. This is an innovation curve that is needed in the industry.
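To put the link speeds under discussion in perspective, the sketch below estimates how long a petabyte-scale data set would take to move at a given line rate. The function name and the `efficiency` parameter are assumptions for illustration; the default of 1.0 gives an idealized lower bound that real transfers will not reach.

```python
def transfer_days(data_pb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move data_pb petabytes over a link_gbps link.

    efficiency is an assumed fraction of line rate actually sustained
    (1.0 = ideal; real transfers are lower).
    """
    bits = data_pb * 1e15 * 8                      # decimal petabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100, 400):
        print(f"1 PB at {gbps:>3} Gb/s: {transfer_days(1, gbps):7.2f} days")
```

At the ideal rate, one petabyte takes roughly 93 days at 1Gb, a little over 9 days at 10Gb, and under a day at 100Gb, which is why labs that generate a petabyte a year or two cannot treat 10Gb as fast.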
A fast HPC connected at 1 gigabit is not going to work. Build a fast HPC connected with a 100Gb and it might work. The challenge remains that moving data at speed is a hard thing to do. Having a single server move data at a 100Gb is almost an impossibility, even though 100Gb cards exist for those machines – you have to balance the IO with the PCI bus perfectly to even approach those speeds on a single machine. We’ve done a couple of tests where we have reached 72Gb, but it was with sample data. It’s really, really hard to do. Just to drive the point home, PCI Express is the bottleneck right now for single server transfer speeds to work well. Very clearly, a lot of people have been screaming for PCIe4. It would be great for a big player like Intel to release it for a lot of reasons. If that standard were to come out sooner, that would allow us to shift the conversation again.
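The PCI Express ceiling Berman refers to can be estimated from the published per-lane signaling rates: 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0, both with 128b/130b encoding. The sketch below computes per-direction slot bandwidth from those figures. The `protocol_efficiency` parameter is an assumption standing in for packet and flow-control overhead, which varies by system and is not modeled here.

```python
# Per-lane signaling rate in GT/s for each PCIe generation.
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING_128B_130B = 128 / 130  # line-encoding efficiency for Gen3 and Gen4


def pcie_gbps(gen: int, lanes: int, protocol_efficiency: float = 1.0) -> float:
    """Approximate per-direction bandwidth of a PCIe slot in Gb/s.

    protocol_efficiency is an assumed factor (1.0 = encoding overhead only).
    """
    return PCIE_GT_PER_LANE[gen] * ENCODING_128B_130B * lanes * protocol_efficiency


if __name__ == "__main__":
    for gen, lanes in ((3, 8), (3, 16), (4, 16)):
        print(f"PCIe {gen}.0 x{lanes}: {pcie_gbps(gen, lanes):6.1f} Gb/s")
```

On these numbers a Gen3 x8 slot tops out near 63 Gb/s and cannot feed a 100Gb card at all, while a Gen3 x16 slot offers about 126 Gb/s before protocol overhead, leaving little headroom. Doubling the per-lane rate with PCIe 4.0 is what would change that balance.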
HPCwire: What’s your take on the jockeying between Ethernet and InfiniBand for sway in HPC?
Ari Berman: We have seen this interesting phase in the market where 100Gb Ethernet is similarly priced with EDR InfiniBand. What that means is everyone says, “well we can get 100Gb either way, so let’s use something that’s easy to use and we don’t have to have RDMA or we can have a system use RDMA over Converged Ethernet or something like that if we use these specialized switches.” The truth with HPC, my personal opinion, is Ethernet is still a wonky protocol to use on the back-end of a truly scaling HPC system. It still causes issues for TCP, even within a contained system, that reduces some of the performance gains you get by having a low latency, lossless networking protocol like OmniPath or InfiniBand. The interesting thing within life sciences is that low latency isn’t really tremendously needed for the analytics for parallelism, but it is really needed for the delivering and absorption of data to and from the storage systems at scale. If you have 100,000 cores or 10,000 nodes trying to access a single file system, you have to have an incredible network, not to mention an incredible storage system, to be able to deliver that data to the cluster as a whole. Those are the challenges we are seeing.
Here’s another challenge. We have designed and we have seen designs where the back end of a cluster is entirely InfiniBand or OmniPath, because you can build those to support TCP/IP and the native protocols work well for analytics. The problem then is how do you scale that out into the Ethernet space. So, you have to have gateways, and InfiniBand routers, and all these interesting rather convoluted things that sometimes work well and sometimes don’t work well but also have to operate at speed. There’s a lot of kluges in the market there.
We are seeing Arista 100Gb Ethernet installed in the middle of clusters now because they have hit a price point that is similar to Mellanox and Intel and those work fine, and it’s something folks know and doesn’t require extra knowledge to operate. We’ve seen some of the NSF supercomputing centers put in OmniPath. They really like it. It works well. But we haven’t seen wide adoption across life sciences. At least in life sciences, Mellanox is still the king here, and the interesting thing about Mellanox is they have responded to the competition of OmniPath by quadrupling their capabilities.
Aaron Gardner: Many people are always trying to guess when InfiniBand is going to die out and Ethernet will take hold. We still see InfiniBand demand being strong going forward while we note that Mellanox continues to sell more and more Ethernet. That pushes a healthy diversified ecosystem and you have players like Arista in the mix as well. On OPA, we’ve seen it used to good effect when people want to adopt Intel’s platform strategy.
I think the thing that would perhaps slow Intel, and why we haven’t seen OPA translate to the general market at the same time it has with the leadership class, is what we are seeing with CPU diversification in terms of maybe you want some Arm, maybe you want some AMD in part of your environment too. People then crave a uniformity at the networking layer which is why I think InfiniBand and Ethernet will still stay in HPC and hold their lines and even increase market share. I don’t see a single emerging solution for the network fabric space.
HPCwire: Thank you both for your time and the overview. Let’s check the scorecard next year.