Messaging around HPC-big data convergence, which had been ramping up all year, reached new heights at SC15 — and you’d be hard-pressed to find a bigger champion of the unified platform strategy than American supercomputer-maker Cray. HPCwire met with Cray’s Barry Bolding at the show in Austin last month to discuss the company’s latest customer wins, its take on OpenHPC, and its plans to coalesce its product lines, over time, into a single flexible infrastructure.
Although the company didn’t have major product refreshes this year, it still had plenty to talk about. “The show for us is focused on a couple of really interesting customer wins that we did press releases on,” said Bolding. “They are interesting because they are indicative of some of the trends we are seeing.”
He’s talking about the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, to which Cray provided an XC40 system, and the fact that ICM plans to run a mix of traditional HPC and analytics on it.
This is what Cray is seeing more and more of – HPC users that want to bring analytics workloads into their workflows. It’s something that Cray has been working on with another partner and customer, the National Energy Research Scientific Computing Center (NERSC), which has been developing ways to incorporate analytics-intensive workloads on its systems.
For the last couple of years, Cray has been teaming with NERSC through their joint Center of Excellence to develop Shifter, which enables Docker containers to run on Cray systems.
“That’s probably the sleeper announcement that I think is most important,” said Bolding.
“Our systems are already the most productive systems in the world for highly-scalable applications and for the physics simulation type workloads that are compiled,” he continued. “For Crays, Docker allows users to be more productive for a very wide range of applications because they are able to containerize any application and bring that onto the Cray system regardless of what OS is employed or what libraries it needs.
“NERSC is really interested because they think that their XC systems are actually more productive than clusters and that they can consolidate some of their infrastructure — it’s an attempt at server consolidation to bring those workflows right onto the same machine. And it also provides access to new languages on the Cray: things like Python, R, you can now containerize those and run those on a Cray system.”
“Remember,” Bolding elaborated, “our Cray XC systems were pretty stingy about allowing modifications to the system because they can affect scalability. We do run Linux, and we offer cluster compatibility mode. We’ve tried to make them as open as possible, but we don’t want to give up scale. Docker is a really nice way to now provide a complete infrastructure that’s compatible with the rest of the world and not have to compromise on the big jobs.”
Shifter is needed to make Docker compatible with Cray’s Lustre-based high-performance storage infrastructure. Docker was created in commodity environments where Lustre doesn’t exist, Bolding explained, so one of the things Shifter provides is the ability to run Docker where Lustre is the file system.
The other customer that Cray announced was the Alfred Wegener Institute in Germany. This is Cray’s first CS announcement with the Omni-Path 100 network in it – which Bolding characterized as “an alternative to InfiniBand today for doing cluster solutions.”
“We’re supporting InfiniBand and the Omni-Path 100 in our CS line,” Bolding shared. “The design of Omni-Path is a good design, a high-performance network much like InfiniBand. It has similar characteristics in many ways, so it will really come down to customer choice on that.”
Upon further prodding, Bolding added: “If you really want scale you need Aries. The Aries interconnect has features that do not exist in InfiniBand or Omni-Path 100, and those features are all built around the scalability, the Dragonfly topology, the adaptive routing, the packet decomposition, where we’re able to break packets and spread them across the Dragonfly network. Those types of features do not exist in any network on the planet and will not exist in any other network for years to come.”
Recall that Cray transferred its interconnect program and expertise to Intel in 2012, though Aries remained exclusive to Cray under the agreement.
Some of the folks who worked on the circuit design went over to Intel as did some of the IP of the future technologies that could be used in future designs, Bolding acknowledged. “We have an agreement to work on that roadmap. The first system you will see with a network that is truly philosophically derivative of that is going to be the Aurora system that we announced earlier this year, which will feature the next-gen Omni-Path.”
Bolding is of course referring to the final piece of the CORAL triumvirate, which has Cray and Intel teaming up to provide Argonne National Laboratory with a 180-petaflops supercomputer in the 2018 timeframe. This will be the first system built on Cray’s “Shasta” architecture, a follow-on to its current XC series. Shasta is really “an unannounced product,” Bolding commented, but Cray released the name ahead of schedule, a nod no doubt to the significance of this “exascale-oriented” leadership-class system.
“In 2016, we’re going to be announcing products around some of the new processors that are coming out,” Bolding continued. “We’re going to be delivering some Knights Landing-based systems. It’s going to be a really exciting year.”
Naturally, Cray has engineering samples of Knights Landing processors and is doing a lot of testing. The company is also working with partners on getting codes ported over to Knights Landing. They have a lot of codes ported, Bolding noted, but Cray won’t disclose performance numbers until they get closer to launching product.
“We have a good relationship with Intel and we’re a key partner so we can do one thing that many partners cannot which is test samples at scale,” Bolding commented. “And based on that, we are confident that we are going to deliver great systems, including Trinity at Los Alamos National Laboratory (LANL) and Cori at NERSC.”
When Three Become One
The big takeaway from our conversation with Bolding is that Cray is laser focused on a converged analytics/big data/HPC roadmap. He sees wide agreement from the community that this will happen. “They don’t know how it will play out but they all agree, that’s where things are going,” he said.
“To be honest, I don’t even want to call it converged,” he continued. “We believe that the supercomputing infrastructure of the future is a big data infrastructure. They are synonymous. They don’t separate. If you want to be a supercomputing company in 2020, you better have an infrastructure that’s able to do those workloads and do them well.”
Cray believes that with its DataWarp technology and tight interconnects, plus software flexibility like Docker, it has all the ingredients necessary to achieve this coming paradigm.
“Short-term, we don’t need just Moore’s law to innovate,” Bolding said. “If you can optimize the workflow, you can push off the Moore’s law wall a little bit. We’re still going to face a tremendous disruption, but I don’t see it as a short-term issue. It’s way out in the next decade.”
“To some extent in the roadmaps of Intel and the other chip providers, you are seeing a lot more flexibility in the way that cores are used,” he added. “And that will push the Moore’s law wall farther out. Because productivity is what it’s about right now, and that’s exactly where Cray has always been positioned. It’s about the productive supercomputer, not about more cores or faster clock speeds. It’s about the least expensive per unit of work, not least expensive per LINPACK cycle.”
The Promise of OpenHPC; the Significance of DataWarp
The OpenHPC play to unify the HPC software stack aligns with Cray’s strategy, with the caveat that the effort must truly be processor agnostic. The Linux Foundation’s involvement was a big factor in Cray signing on, but having seen similar missions thwarted before, Cray’s approach is one of cautious optimism. The supercomputing company wants to focus on the places where it can differentiate, Bolding shared. “For Cray that’s at scale, but there are other places in the stack that aren’t differentiators, they are rites of passage, and if we can collaborate with the community to streamline that, it means more R&D money can be spent on what’s high value to the customer.”
When it comes to differentiation, for Cray it’s about moving data in and out and moving data around the stack, which is why DataWarp is such a significant investment.
Bolding said that the DataWarp burst buffer technology is fundamental to Cray’s roadmap going far out into the future.
He highlighted two sites that have pretty large installations with hundreds of SSD cards – the Trinity system at LANL, where Cray has already delivered the Xeon Haswell portion, and King Abdullah University of Science and Technology (KAUST).
“People have historically been buying bandwidth by buying more spindles, and at systems where we sold ten or twenty thousand spinning disks to build a big Lustre file system, that’s great – but you don’t necessarily want to buy more disks to get more bandwidth,” he said. “You want to buy more disks to get more storage, and then you want to be able to get bandwidth from something that is less expensive and maybe not as high on capacity. That’s what DataWarp does. So we’re seeing higher bandwidth from this KAUST installation than we’re getting from the biggest Lustre installation we’ve done, and very high IOPS too, whereas spinning disks aren’t great at IOPS.”
Cray doesn’t build its own storage; rather, it works with Seagate on a version of its ClusterStor storage for Lustre. The Cray-branded Sonexion storage system is a derivative of the ClusterStor line, part of an agreement originally struck with Xyratex before it was acquired by Seagate in March 2014.
One place where DataWarp is not a replacement for Lustre is for small installations, Bolding noted. “At certain sizes, you need to have that capacity, you can get that balance, and it doesn’t make sense to only have an SSD layer, which is still pretty expensive relative to disk,” he explained.
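Bolding’s bandwidth-versus-capacity argument is easy to make concrete with back-of-envelope arithmetic. The per-device figures below are illustrative assumptions, not Cray, KAUST, or Seagate specifications:

```python
# Back-of-envelope comparison: how many devices are needed to hit an
# aggregate bandwidth target with spinning disks vs. SSD cards.
# All per-device numbers are illustrative assumptions.

HDD_BW_GBS = 0.15   # ~150 MB/s sustained per spinning disk (assumed)
SSD_BW_GBS = 2.0    # ~2 GB/s per PCIe SSD card (assumed)
HDD_CAP_TB = 4.0    # capacity per spinning disk (assumed)
SSD_CAP_TB = 1.0    # capacity per SSD card (assumed)

def devices_for_bandwidth(target_gbs, per_device_gbs):
    """Smallest device count whose aggregate bandwidth meets the target."""
    return int(-(-target_gbs // per_device_gbs))  # ceiling division

target = 1000.0  # 1 TB/s aggregate, roughly burst-buffer scale

hdds = devices_for_bandwidth(target, HDD_BW_GBS)
ssds = devices_for_bandwidth(target, SSD_BW_GBS)

print(f"HDDs needed: {hdds} ({hdds * HDD_CAP_TB:.0f} TB of capacity)")
print(f"SSDs needed: {ssds} ({ssds * SSD_CAP_TB:.0f} TB of capacity)")
```

The point of the exercise is the mismatch: buying thousands of spindles for bandwidth forces you to buy far more capacity than you need, which is the gap a burst buffer like DataWarp is meant to fill.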
Big Data Detour?
Given Cray’s current unification mantra, it’s worth noting that its big data spinoff, YarcData, ended with the division being reabsorbed into Cray one year ago. Others in the HPC space that took similar circuitous routes to embrace big data are having the same awakening. I asked Bolding what he makes of it.
“You have to consider the emergence of text-based search as the history of big data. There’s always been BI, databases, structured databases, and it’s still there,” said Bolding. “But the innovation that Google and others brought with MapReduce – to be able to just search, and open up search as a way to sift through data – is tremendously powerful, but they designed it for text-based search. Then the community decided to use it for cognition, for graphs. Suddenly you had this explosion of use cases that sprung up around text-based search which we call analytics, but analytics was always there. It’s not new, it’s a new way of doing it.
“And now we’re looking at relational databases, MapReduce, Spark and big simulations – wow these are all really powerful. How can we get them all to work together? But you don’t want to have three systems.”
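The MapReduce pattern Bolding references is simple to sketch. The classic word-count example below is a single-process illustration of the map, shuffle, and reduce phases – not a distributed implementation, and not any specific vendor’s framework:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data is big", "search is powerful"]
pairs = list(chain.from_iterable(map_phase(d) for d in documents))
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 2, 'data': 1, 'is': 2, 'search': 1, 'powerful': 1}
```

In a real cluster the map and reduce functions run in parallel across many nodes, and the shuffle moves data between them – which is exactly why the quality of the interconnect and storage underneath matters to this workload.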
Cray still has its Urika-XA Hadoop platform and Urika-GD, its graph product. Bolding said these have been successful, but because Urika provides such a competitive advantage, customers haven’t been willing to disclose those wins like they have with XC and CS.
But more to the point for the future roadmap: customers don’t want to buy two infrastructures. Cray is working on bringing its analytics and big data technologies together.
Cray wasn’t ready to disclose actual product details yet, but its strategy is crystal clear: working towards a united platform. It’s what makes sense and it’s what their customers want, noted Bolding.
“Urika is going to continue,” Bolding affirmed, “It’s a big investment for us, but can we marry technologies between these two lines?”
Cray says that more will be revealed soon, with announcements around partnerships and technologies rolling out in the first quarter and into the second quarter. “From what we can say today: we are fully dedicated to the convergence of analytics and big data, and the evolution of Urika is a big part of that.”
Big Growth Supercomputing
Cray has seen strong commercial growth in the last year. It anticipates over 15 percent of its 2015 revenue to come from commercial customers, more than double last year’s share. This strong growth is driven by multiple segments — namely manufacturing, energy, oil and gas, financial services and life sciences — and by very complex algorithms, Bolding commented. “Many industries are driving high bandwidth, low-latency networks,” he added. “They need productivity, all the things that Cray’s been focused on for the last few years.”
This doubling of share came even as total revenue grew significantly. Cray’s last earnings call projected $715 million in revenue for 2015, roughly 20 percent top-line growth over the previous year. The stock price has reflected the positive projections, rising from the low 20s to the mid 30s.
Cray has been averaging about 20 percent growth the last few years, bucking the common wisdom that says that there’s no money to be made in supercomputing. According to Bolding, Cray sets its targets internally by aiming for about twice the market rate. “So if IDC says it’s 8 percent, we want to grow at 16 percent – that’s the way we look at it – it means that you’re taking market share and you’re growing,” Bolding stated.
“And we can do that for a number of reasons at the high-end,” he continued. “We are more complementary to cloud than competitive, so cloud is not eating our lunch today, and we are focused on keeping that from happening in the future. Two, there is chaos in the competitive landscape. BlueGene isn’t in the marketplace, POWER is not doing very well, and Lenovo is uncertain for some. It provides an opportunity for us and a few other vendors.”