Power8 with NVLink Coming to the Nimbix Cloud

By Tiffany Trader

October 6, 2016

Starting later this month, HPC professionals and data scientists wishing to try out NVLink’d Nvidia Pascal P100 GPUs won’t have to spend upwards of $100,000 on Nvidia’s DGX-1 server or fork over about half that for IBM’s Power8 server with NVLink and four Pascal GPUs. Soon they’ll be able to get the power of Pascal in a public cloud.

On Wednesday (Oct. 5), the Dallas, Texas-based cloud provider Nimbix revealed that it was adding IBM Power S822LC for HPC systems (codenamed “Minsky”) to its heterogeneous HPC cloud platform. Target markets include high-performance computing, data analytics, in-memory databases, and machine learning.

“We are definitely the first public cloud to deploy the Minsky technology and one of the first to deploy Power8 in a high-performance, high-scalability setting,” said Leo Reiter, CTO and vice president of software engineering at Nimbix. “Obviously what was really interesting on Minsky in addition to Power8 was the Pascal GPUs and we’ve integrated that with our JARVICE platform so it’s a seamless experience for both end users and developers.”

Unveiled by IBM last month, the new Power8 with NVLink processor features 10 cores running up to 3.26 GHz. The processors have higher memory bandwidth than x86 CPUs at 115 GB/s and can have as much as half a terabyte of system memory per socket. There are larger caches per core inside the Power8 processor, and this coupled with the faster cores and memory bandwidth leads to higher application performance and throughput.

The Nvidia Tesla P100 for NVLink-optimized servers is Nvidia’s most performant GPU yet, delivering a whopping 5.3 teraflops of double-precision performance, 10.6 teraflops of single-precision, and 21 teraflops of half-precision. The accelerator card includes 16 gigabytes of HBM2 stacked memory with an on-GPU memory bandwidth of 720 GB/s. The Tesla P100 with NVLink in the SXM2 (mezzanine) form factor, currently shipping only in the DGX-1 and the Minsky platform, delivers 13 percent more raw compute performance than the PCIe variant due to its higher TDP (300 watts versus 250 watts).
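The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above; the function names are illustrative, and the PCIe values it prints are merely what the quoted 13 percent advantage implies, not published specifications.

```python
# Peak figures quoted in the article for the SXM2 (NVLink) Tesla P100, in teraflops.
SXM2_TFLOPS = {"double": 5.3, "single": 10.6, "half": 21.0}

# The article's "13 percent more raw compute" than the PCIe variant.
SXM2_ADVANTAGE = 0.13

# TDP figures quoted in the article, in watts.
SXM2_TDP_W = 300
PCIE_TDP_W = 250


def implied_pcie_tflops(sxm2_tflops, advantage=SXM2_ADVANTAGE):
    """Back out the PCIe figure implied by the quoted SXM2 advantage."""
    return sxm2_tflops / (1.0 + advantage)


def precision_ratio(numerator, denominator):
    """Ratio between two of the quoted SXM2 peak rates."""
    return SXM2_TFLOPS[numerator] / SXM2_TFLOPS[denominator]


if __name__ == "__main__":
    for precision, tflops in SXM2_TFLOPS.items():
        pcie = implied_pcie_tflops(tflops)
        print(f"{precision:>6}: SXM2 {tflops:5.1f} TF, implied PCIe {pcie:5.2f} TF")
    print(f"single/double ratio: {precision_ratio('single', 'double'):.2f}")
    print(f"half/single ratio:   {precision_ratio('half', 'single'):.2f}")
    print(f"TDP ratio:           {SXM2_TDP_W / PCIE_TDP_W:.2f}")
```

The quoted rates roughly double with each halving of precision, and the 20 percent higher power budget buys a 13 percent gain in peak throughput.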

Crucially for many users that Nimbix is targeting with the new hardware, the Minsky platform provides high-bandwidth NVLink connections between the CPU and the GPUs and from GPU to GPU. IBM says the NVLink-optimized Power8 servers “enable data to flow 5x faster” than on a comparable x86-based system.

Along with its support for HPC and deep learning workflows, Nimbix says adoption of GPU-accelerated databases is advancing quickly. “Accelerated analytics, like in-memory databases, benefit so much from having the Pascal GPUs as well as the high performance link between them to the point where some customers are getting multiple times performance boost for advanced queries,” says Reiter.

Nimbix is working with Kinetica and MapD to facilitate the use of the NVLink-optimized Power servers for database acceleration. Reiter says Kinetica can scale horizontally across multiple systems, so it can take advantage not only of the GPUs in a single box but also of a high-performance fabric like the one at Nimbix to scale out.

“So you have both the high density in each chassis where you are going to have four Pascal GPUs with this high performance link, a lot of host memory but then also being able to scale that horizontally to dozens of machines at the same time to be able to do these accelerated databases,” says Reiter.

From its start in 2010, Nimbix has focused on high-performance heterogeneous cloud computing. “While it’s true that the market is heavily tilted toward Intel in terms of the system architecture, we already have the capabilities of running heterogeneous compute, both accelerators as well as central processors, so it wasn’t a technology challenge for us to deploy a non-Intel architecture into our existing cloud,” explains Reiter.

“We’re not requiring that people embrace the system architecture change. We’re not selling architecture here, what we’re selling is turnkey workflows that happen to run in a more optimized way when they’re hooked into the right resources.”

Reiter notes that they’ve been working with Nvidia for a while now and they have multiple datacenters outfitted with Tesla K80s and other Nvidia chips.

Now they’re also partnering with IBM, who they say has provided and continues to provide a lot of support. “They are extremely motivated to speed Power,” says Reiter, “not just in HPC, but in cloud specifically. And there are a couple reasons. One is that the traction in public cloud for Power is almost non-existent, relatively speaking, but more importantly, even a lot of their customers who are buying Power clusters are asking them for an actual cloud bursting strategy. It’s not just for capacity, but it’s for a lot of these bids now, people are looking at emerging technology in the bids.”

The secret sauce of the Nimbix cloud is the JARVICE container runtime. “The native execution model for JARVICE is containers running on bare metal, so there are no virtual machines,” says Reiter. “There are containers but these are custom-built containers, not Docker. There was too much overhead and complexity and performance loss with getting Docker to run, especially for tightly-coupled, high performance workloads, so we designed our container technology from the ground up, and [with our PushtoCompute technology] we accept ordinary Docker containers as input and convert them on the fly to run natively on the JARVICE platform.”

Asked if they would ever productize JARVICE outside of the Nimbix cloud, Reiter said they are happy to discuss that with anyone who is interested but do not have an immediate play to offer it as an off-the-shelf software product. That said, Nimbix does have select datacenter customers who are trying out the software.

Nimbix’s specialty is enabling turnkey cloud via a software-as-a-service delivery model. It’s not for the user looking to spin up virtual clouds.

“IaaS public cloud is fine for dev-tests of single machine instances,” says Reiter. “Sometimes you want to test out some code and see if it works and that’s great. But when we’re talking about deploying tightly-coupled workflows at scale, deploying the software and tuning the software is extremely complicated.”

“What customers enjoy on Nimbix is they look for the workflow they want to run, they click on it, they specify whatever parameters are relevant to that workflow and they click submit and then their data comes back processed the way they want it to without having to care about ‘How am I going to scale this? How am I going to install it? Am I running the right version? Do I have the right libraries installed?’ JARVICE takes care of all of that – and it’s extensible through technologies like PushtoCompute to enable the onboarding of more and more functionality.”

Every machine in the Nimbix cloud is connected over InfiniBand, with a Mellanox EDR InfiniBand spine and FDR InfiniBand links to the compute nodes. JARVICE also employs distributed block storage and distributed storage over InfiniBand, plus 20 Gb/s Ethernet (bonded 10 Gb/s links) for accessing the internet.
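As a rough illustration of what the bonded Ethernet figure means in practice, the sketch below converts link speeds to transfer times. It assumes the Ethernet figures are gigabits per second (the usual convention for link speeds), that two 10 Gb/s links make up the bond, and that the full aggregate is usable. Real transfers see protocol overhead, and a single flow often cannot exceed one member link of a bond. The function names are illustrative.

```python
def bonded_gbps(link_gbps, links):
    """Aggregate line rate of a bond, assuming perfect load balancing."""
    return link_gbps * links


def gbps_to_gigabytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def transfer_seconds(size_gigabytes, gbps):
    """Idealized time to move a payload at the given line rate, ignoring overhead."""
    return size_gigabytes / gbps_to_gigabytes_per_s(gbps)


if __name__ == "__main__":
    # Assumption: the bond consists of two 10 Gb/s links.
    uplink = bonded_gbps(10, 2)
    print(f"Bonded uplink: {uplink} Gb/s = {gbps_to_gigabytes_per_s(uplink)} GB/s")
    print(f"100 GB upload at line rate: {transfer_seconds(100, uplink)} s")
```

Under these idealized assumptions, a 100 GB dataset would take about 40 seconds to move over the bonded link.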

Nimbix expects to have customers using the Minsky platform publicly by the end of this month and prior to that will be conducting benchmarking tests with early-access customers. Initially, demand will likely outstrip availability, and Nimbix says it’s already planning the next step of the expansion, essentially taking orders as soon as IBM can ship.

“The line is there and it’s only going to get bigger,” observes Reiter. “We’re excited to be able to service these customers and position ourselves to service more in the future.”

Pricing has not yet been disclosed, but Minsky will be offered in single-GPU and quad-GPU units – via subscription or pay-as-you-go.
