Cray Lays Out Vision for HPC-Big Data Convergence

By Tiffany Trader

December 3, 2015

Messaging around HPC-big data convergence, which had been ramping up all year, reached new heights at SC15, and you’d be hard-pressed to find a bigger champion of the unified platform strategy than American supercomputer-maker Cray. HPCwire met with Cray’s Barry Bolding at the show in Austin last month to discuss the company’s latest customer wins, its take on OpenHPC and its plans to coalesce its product lines over time into a single flexible infrastructure.

Although the company didn’t have major product refreshes this year, it still had plenty to talk about. “The show for us is focused on a couple of really interesting customer wins that we did press releases on,” said Bolding. “They are interesting because they are indicative of some of the trends we are seeing.”

He’s talking about the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, which Cray provided with an XC40 system and which is interested in running a mix of traditional HPC and analytics on it.

This is what Cray is seeing more and more: HPC users who want to bring analytics workloads into their workflows. It’s something that Cray has been working on with another partner and customer, the National Energy Research Scientific Computing Center (NERSC), which has been developing ways to incorporate analytics-intensive workloads on its systems.

For the last couple of years, Cray has been teaming with NERSC through their joint Center of Excellence to develop Shifter, which brings Docker containers onto Cray systems.

“That’s probably the sleeper announcement that I think is most important,” said Bolding.

“Our systems are already the most productive systems in the world for highly-scalable applications and for the physics simulation type workloads that are compiled,” he continued. “For Crays, Docker allows users to be more productive for a very wide range of applications because they are able to containerize any application and bring that onto the Cray system regardless of what OS is employed or what libraries it needs.

“NERSC is really interested because they think that their XC systems are actually more productive than clusters and that they can consolidate some of their infrastructure — it’s an attempt at server consolidation to bring those workflows right onto the same machine. And it also provides access to new languages on the Cray: things like Python, R, you can now containerize those and run those on a Cray system.”

“Remember,” Bolding elaborated, “our Cray XC systems were pretty stingy about allowing modifications to the system because they can affect scalability. We do run Linux, and we offer cluster compatibility mode. We’ve tried to make them as open as possible, but we don’t want to give up scale. Docker is a really nice way to now provide a complete infrastructure that’s compatible with the rest of the world and not have to compromise on the big jobs.”

The reason Shifter is needed is to make Docker compatible with the Lustre-based high-performance infrastructure. Docker was created in commodity environments where Lustre doesn’t exist, Bolding explained, so one of the things Shifter allows is the ability to run Docker where Lustre is the file system.

The other customer that Cray announced was the Alfred Wegener Institute in Germany. This is Cray’s first CS announcement with the Omni-Path 100 network in it – which Bolding characterized as “an alternative to InfiniBand today for doing cluster solutions.”

“We’re supporting InfiniBand and the Omni-Path 100 in our CS line,” Bolding shared. “The design of Omni-Path is a good design, a high-performance network much like InfiniBand. It has similar characteristics in many ways, so it will really come down to customer choice on that.”

Upon further prodding, Bolding added: “If you really want scale you need Aries. The Aries interconnect has features that do not exist in InfiniBand or Omni-Path 100, and those features are all built around the scalability, the Dragonfly topology, the adaptive routing, the packet decomposition, where we’re able to break packets and spread them across the Dragonfly network. Those types of features do not exist in any network on the planet and will not exist in any other network for years to come.”

Recall that Cray transferred its interconnect program and expertise to Intel in 2012, but Aries was kept exclusive to Cray under a protected agreement.

Some of the folks who worked on the circuit design went over to Intel, as did some of the IP for technologies that could be used in future designs, Bolding acknowledged. “We have an agreement to work on that roadmap. The first system you will see with a network that is truly philosophically derivative of that is going to be the Aurora system that we announced earlier this year, which will feature the next-gen Omni-Path.”

Bolding is of course referring to the final piece of the CORAL triumvirate, which has Cray and Intel teaming up to provide Argonne National Laboratory with a 180-petaflops supercomputer in the 2018 timeframe. This will be the first system built on Cray’s “Shasta” architecture, which is a follow-on to its current XC series. Shasta is really “an unannounced product,” Bolding commented, but Cray released the name ahead of schedule, a nod no doubt to the significance of this “exascale-oriented” leadership-class system.

“In 2016, we’re going to be announcing products around some of the new processors that are coming out,” Bolding continued. “We’re going to be delivering some Knights Landing-based systems. It’s going to be a really exciting year.”

Naturally, Cray has engineering samples of Knights Landing processors and is doing a lot of testing. The company is also working with partners on getting codes ported over to Knights Landing. They have a lot of codes ported, Bolding noted, but Cray won’t disclose performance numbers until they get closer to launching product.

“We have a good relationship with Intel and we’re a key partner so we can do one thing that many partners cannot which is test samples at scale,” Bolding commented. “And based on that, we are confident that we are going to deliver great systems, including Trinity at Los Alamos National Laboratory (LANL) and Cori at NERSC.”

When Three Become One

The big takeaway from our conversation with Bolding is that Cray is laser focused on a converged analytics/big data/HPC roadmap. He sees wide agreement from the community that this will happen. “They don’t know how it will play out but they all agree, that’s where things are going,” he said.

“To be honest, I don’t even want to call it converged,” he continued. “We believe that the supercomputing infrastructure of the future is a big data infrastructure. They are synonymous. They don’t separate. If you want to be a supercomputing company in 2020, you better have an infrastructure that’s able to do those workloads and do them well.”

Cray believes that with its DataWarp technology, tight interconnects and software flexibility like Docker, it has all the ingredients necessary to achieve this coming paradigm.

“Short-term, we don’t need just Moore’s law to innovate,” Bolding said. “If you can optimize the workflow, you can push off the Moore’s law wall a little bit. We’re still going to face a tremendous disruption, but I don’t see it as a short-term issue. It’s way out in the next decade.”

“To some extent in the roadmaps of Intel and the other chip providers, you are seeing a lot more flexibility in the way that cores are used,” he added. “And that will push the Moore’s law wall farther out. Because productivity is what it’s about right now, which is perfectly situated where Cray has always been. It’s about the productive supercomputer, not about more cores or faster clock speeds. It’s about the least expensive per unit of work, not least expensive per LINPACK cycle.”

The Promise of OpenHPC; the Significance of DataWarp

The OpenHPC play to unify the HPC software stack aligns with Cray’s strategy with the caveat that the effort must truly be processor agnostic. The Linux Foundation involvement was a big factor in Cray signing on, but having seen similar missions thwarted, their approach is one of cautious optimism. The supercomputing company wants to focus on the places where it can differentiate, Bolding shared. “For Cray that’s at scale, but there are other places in the stack that aren’t differentiators, they are rites of passage and if we can collaborate with the community to streamline that, it means more R&D money can be spent on the high value to the customer.”

When it comes to differentiating – for Cray it’s about moving data in and out, moving data around the stack, and this is why DataWarp is such a significant investment.

Bolding said that the DataWarp burst buffer technology is fundamental to Cray’s roadmap going far out into the future.

He highlighted two sites that have pretty large instantiations with hundreds of SSD cards – the Trinity system at LANL, where Cray delivered the Xeon Haswell portion already, and King Abdullah University of Science and Technology (KAUST).

“People have been historically buying bandwidth by buying more spindles and at systems where we sold ten or twenty thousand spinning disks to build a big Lustre file system, that’s great – but you don’t necessarily want to buy more disks to get more bandwidth,” he said. “You want to buy more disks to get more storage, and then you want to be able to get bandwidth from something that is less expensive and maybe not as high on capacity. That’s what DataWarp does. So we’re seeing higher bandwidth from this KAUST installation than we’re getting from the biggest Lustre installation we’ve done and very high IOPS too, whereas spinning disks aren’t great at IOPS.”
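
Bolding’s capacity-versus-bandwidth argument can be made concrete with a rough sizing sketch. The Python below is illustrative only: the function names and every number in it (the capacity and bandwidth targets, the per-disk and per-SSD figures) are hypothetical placeholders chosen for round arithmetic, not Cray, KAUST or Lustre specifications. It compares a disk-only file system, where the disk count must satisfy whichever requirement is larger, with a tiered design in which disks are sized for capacity and an SSD layer is sized for bandwidth.

```python
import math


def disks_only(capacity_tb, bandwidth_mbs, disk_tb, disk_mbs):
    """Disk count when spinning disks must meet both capacity and bandwidth."""
    for_capacity = math.ceil(capacity_tb / disk_tb)
    for_bandwidth = math.ceil(bandwidth_mbs / disk_mbs)
    # The larger of the two requirements dictates how many disks are bought.
    return max(for_capacity, for_bandwidth)


def tiered(capacity_tb, bandwidth_mbs, disk_tb, ssd_mbs):
    """Return (disks, ssds): disks sized for capacity, SSDs for bandwidth."""
    disks = math.ceil(capacity_tb / disk_tb)
    ssds = math.ceil(bandwidth_mbs / ssd_mbs)
    return disks, ssds


# Hypothetical placeholder requirements and device figures (not from the article).
NEED_TB = 10_000          # capacity target, in TB
NEED_MBS = 1_000_000      # bandwidth target, in MB/s
DISK_TB, DISK_MBS = 4, 100
SSD_MBS = 2_000

print("disk-only:", disks_only(NEED_TB, NEED_MBS, DISK_TB, DISK_MBS), "disks")
print("tiered (disks, ssds):", tiered(NEED_TB, NEED_MBS, DISK_TB, SSD_MBS))
```

With these placeholder inputs, bandwidth rather than capacity sets the disk count in the disk-only case, which is the situation Bolding describes. Different inputs could reverse that, which is consistent with his later point that an SSD layer alone does not make sense at every size.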

Cray doesn’t build its own storage; rather, it works with Seagate on a version of their ClusterStor storage for Lustre. The Cray-branded Sonexion storage system is a derivative of the ClusterStor line, part of an agreement originally started with Xyratex before it was acquired by Seagate in March 2014.

One place where DataWarp is not a replacement for Lustre is for small installations, Bolding noted. “At certain sizes, you need to have that capacity, you can get that balance, and it doesn’t make sense to only have an SSD layer, which is still pretty expensive relative to disk,” he explained.

Big Data Detour?

Given Cray’s current unification mantra, it’s worth noting that its big data bifurcation, YarcData, ended when the division was reabsorbed by Cray one year ago. Others in the HPC space that took similar circuitous routes to embrace big data are having the same awakening. I asked Bolding what he makes of it.

“You have to consider the emergence of text-based search being the history of big data. There’s always been BI, databases, structured databases, and it’s still there,” said Bolding. “But the innovation that Google and others brought with MapReduce to be able to just search and open up search as a way to sift through data is tremendously powerful, but they designed it for text-based search. Then, the community decided to use it for cognition, for graphs. Suddenly you had this explosion of use cases that sprung up around text-based search which we call analytics, but analytics always was there. It’s not new, it’s a new way of doing it.

“And now we’re looking at relational databases, MapReduce, Spark and big simulations – wow these are all really powerful. How can we get them all to work together? But you don’t want to have three systems.”

Cray still has its Urika-XA Hadoop platform and Urika-GD, its graph product. Bolding said these have been successful, but because Urika provides such a competitive advantage, customers haven’t been willing to disclose those wins like they have with XC and CS.

But more to the point in terms of the future roadmap, customers don’t want to buy two infrastructures. Cray is working on bringing the analytics and big data technologies together.

Cray wasn’t ready to disclose actual product details yet, but its strategy is crystal clear: working towards a united platform. It’s what makes sense and it’s what their customers want, noted Bolding.

“Urika is going to continue,” Bolding affirmed, “It’s a big investment for us, but can we marry technologies between these two lines?”

Cray says that more will be revealed soon. The company will be rolling out announcements around partnerships and technologies in the first quarter and into the second quarter. “From what we can say today, we are fully dedicated to the convergence of analytics and big data and Urika is a big part of that and the evolution of Urika is a big part of that.”

Big Growth Supercomputing

Cray has seen strong commercial growth in the last year. It anticipates that more than 15 percent of its 2015 revenue will come from commercial customers, more than double last year’s share. This strong growth is driven by multiple segments (namely manufacturing, energy, oil and gas, financial services and life sciences) and by very complex algorithms, Bolding commented. “Many industries are driving high bandwidth, low-latency networks,” he added. “They need productivity, all the things that Cray’s been focused on for the last few years.”

This doubling of the commercial share happened while overall revenue grew significantly. Cray’s last earnings call projected $715 million in revenue for 2015, which is top-line growth of about 20 percent over the previous year. The stock price has reflected the positive projections, rising from the low 20s to the mid 30s.

Cray has been averaging about 20 percent growth the last few years, bucking the common wisdom that says that there’s no money to be made in supercomputing. According to Bolding, Cray sets its targets internally by aiming for about twice the market rate. “So if IDC says it’s 8 percent, we want to grow at 16 percent – that’s the way we look at it – it means that you’re taking market share and you’re growing,” Bolding stated.
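
The link between growing at twice the market rate and taking market share is simple compounding. As a minimal sketch in Python, assume the 8 percent market rate from Bolding’s example and a starting share of 10 percent. The starting share and the function name are hypothetical, chosen only for illustration, and are not Cray figures.

```python
def share_after(years, market_growth, company_growth, initial_share):
    """Market share after both growth rates compound for `years` years."""
    # Each year the company's revenue grows by (1 + company_growth) while the
    # whole market grows by (1 + market_growth), so share scales by their ratio.
    ratio = (1 + company_growth) / (1 + market_growth)
    return initial_share * ratio ** years


market_growth = 0.08                 # the IDC-style market rate in the quote
target_growth = 2 * market_growth    # stated target: twice the market rate
initial_share = 0.10                 # hypothetical starting share (assumption)

for year in range(4):
    share = share_after(year, market_growth, target_growth, initial_share)
    print(f"year {year}: share = {share:.4f}")
```

Any company growth rate above the market rate makes the ratio greater than one, so the share rises every year. Doubling the market rate simply makes that gain faster.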

“And we can do that for a number of reasons at the high-end,” he continued. “We are more complementary to cloud than competitive, so cloud is not eating our lunch today, and we are focused on keeping that from happening in the future. Two, there is chaos in the competitive landscape. BlueGene isn’t in the marketplace, POWER is not doing very well, and Lenovo is uncertain for some. It provides an opportunity for us and a few other vendors.”
