The Cost of HPC in the Cloud

By Scott Clark

May 17, 2010

Continuing where we left off in the last blog, this time we are trying to understand cost. When we start considering cloud, the primary driver seems to be economic (cost), so we need to make sure we address any cost-related barriers to adoption of cloud, as well as ensure that our expectations are honest and appropriate.

Given that we are talking about HPC, compute is important to the business in some significant way. So whether your business is in media, EDA, oil & gas, biosciences, pharmaceuticals, financial analysis, or some other computationally intensive field, figuring out how to provide HPC services more efficiently will have an impact on the delivery of the core business product and on the bottom line. Because cloud is a combination of people, process, and technology, we should hold off talking about reducing the cost of hardware until later in the blog, and focus first on having the appropriate amount of hardware and on increasing the efficiency of usage, provisioning, management, etc. We should also focus on linking into the core business. Using EDA as an example, the cost of EDA licenses is a higher-order driver that is facilitated by the infrastructure. And finally, we should ask what the consumers of the infrastructure are sacrificing because they don’t have enough capacity or resources, or lack access to a specific technology or capability.

In the big picture, cloud computing is outsourcing significant portions of what were once IT functions and resources. Some companies have been able to outsource very successfully, and have been happy with their decision and relationship. Others, however, have not had such good experiences, and we want to address those negative legacy perceptions. For people who survived a bad outsource experience, those bad memories become hurdles that we as an industry must navigate if we intend cloud computing to be successful. Those perceptions come in the form of additional complexity, higher costs, and a disconnect in responsibility.

When we talk about additional complexity, we are commonly referring to complexity inserted by contract structures. One of the smartest people I know once said that managing anything through contracts is no way to run a business. It needs to be about developing relationships, and if we resort to debating points in the contract, the relationship has already broken down, and we should look to mutually vacate that relationship. That is not to say that a contract serves no purpose. It is a statement of intent, and a testimony to the seriousness of the risk in which one or both parties are placing their business, reputation, and livelihood. Contracts are like tactical nukes: it is very important to have them as a mutual deterrent for bad behavior, but using them means we have all lost. Many times in outsource scenarios, the provider wants to charge for any effort not specifically defined in the contract. But technology is a fast-paced, evolving world, and things are going to change. In this scenario, focus turns to the contract, or the “rules” and policy, and that gets in the way of what it should be about: delivering value and results to the end user.

When we talk about higher costs, we are really talking about additional cost and reduced control with little or no benefit. In an outsource model, the resources are often the same as or similar to what was originally in-house, only there is less direct control over those resources, and we now have to pay for an additional layer of management from the outsource provider. One of the benefits of an outsource model should be that the provider is part of a larger ecosystem, and can deliver a larger variety of resources or different skillsets than the consumer company could normally provide on its own. If the provider does not deliver on this, then are we really gaining anything in exchange for what we gave up? The outsource supplier needs to BE a larger ecosystem. And the arrangement must always pass the filter of “could this be done better, cheaper, or faster in-house?” We will discuss this point more in the blog on “changes to organizational structure”. Being part of a larger ecosystem allows benefits that would be difficult for an isolated entity to achieve:

– Consuming top-tier processors for all workloads, and trading them for next-generation processors when they become available

– Access to domain experts at fractional cost in non-dedicated fashion

– Access to very large capacity, paying based on use

– Ability to share cost burden of special resources across multiple customers at different times

The final element is the disconnect in responsibility, where the outsourcer is focused on the business of outsourcing, or the responsibilities of IT, and loses sight of the core business of the customer. Responsibility to the business is lost in translation. For customers to feel comfortable relinquishing control of their infrastructure, they need to know that the service provider feels responsible for the success of the customer’s business and knows the intimate details of how the business works well enough to properly determine how technology could help, and then proactively drive technology to the benefit of the customer. This is an important point. It is not sufficient for the provider to know technology and IT really well while assuming applications and implementations are less relevant. If customers need to know their business AND drive the service provider for what they need from technology to help the business, then they will continue to own IT in-house, because that is cheaper, easier, and faster for them. In this new world order (cloud), the service provider needs to be a trusted domain expert advisor to the customer about the customer’s business, which means not just IT. Service providers should not bring in additional distractions, and need to have responsibility as part of the equation:

– Maintenance is a series of constant upgrades. Design for change.

– Infrastructure cost should be established as a run rate, not a series of one-time buys

– Infrastructure needs to have an equal seat at the company table (see future blog on control)

– A proper implementation of cloud computing implies changes to organizational structure for control purposes (see future blog on control)

Once we have addressed historical barriers to adoption based on cost, we also need to make sure we appropriately represent the cost benefits that cloud computing promises. Cost savings associated with cloud computing may not be what you think. Cloud is an opportunity to change people, process, and technology. If you are not open to changes on all three fronts, you will not achieve the best value proposition. Changing technology alone will produce the same results we have seen in the past (a large source of frustration with IT). We should not expect to get what we use today for less money, but the process of getting and using the resource can be made more efficient, and therefore the overall solution can be less expensive.

There is a long-term opportunity for cost reduction, but it entails some re-education. The hardware industry has been delivering to the Moore’s Law equation (“2X more every 2 years for the same price” or “1/2 the price every 2 years for the same thing”) for the last 30 years, so customers expect solutions related to hardware to fall on that curve as well. As software becomes a larger part of the solution, we have some correcting to do to match the expectations of the customers (and probably not in a direction that will make the software industry happy).
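To make that expectation concrete, here is a minimal sketch of the curve customers have in mind. It only encodes the two rules of thumb quoted above (double the capacity, or half the price, every two years). The function names and the starting figures are illustrative assumptions, not data from any vendor.

```python
def expected_price(initial_price: float, years: float,
                   halving_period: float = 2.0) -> float:
    """Price a customer expects to pay for the same thing after `years`,
    assuming the price halves every `halving_period` years."""
    return initial_price * 0.5 ** (years / halving_period)


def expected_capacity(initial_capacity: float, years: float,
                      doubling_period: float = 2.0) -> float:
    """Capacity a customer expects for the same price after `years`,
    assuming capacity doubles every `doubling_period` years."""
    return initial_capacity * 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    # Hypothetical starting point: a 100,000 system delivering 1 unit of capacity.
    for year in (0, 2, 4, 6):
        print(year, expected_price(100_000.0, year), expected_capacity(1.0, year))
```

Any part of a solution whose price stays flat over the same period drifts further from this curve every year, which is the gap in expectations that needs correcting.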

Talking about HPC implies that we are working with a growth-oriented workload, so our goal should be to get more value for the same money as the infrastructure grows and evolves out of necessity. In addition, there are lots of things that we are not getting to today that we would like to do, or should do. Having access to more capacity and different resources would go a long way toward meeting that need. Leveraging cloud can eliminate operational activities, making time for design activities. And if we standardize those designs, we improve our negotiating position. The focus should be on differentiating the business and addressing customer issues and customer-of-customer issues. More efficient execution in any portion of the process frees up resources to do more in other areas.

Delivery on cloud computing will generate another long-term cost benefit. Cost reductions will not be immediate, though. Costs will come down over time as the consumption process matures and the market evolves. There will also be a “smushing” effect, much like what happened in the hardware industry, where components were at one time priced individually, but then became parts of a single system, and the component vendors had to compete for their piece of the pie that was the total system cost. Cloud consumption will drive commoditization of component resources in a similar way. Over time, cloud will reduce the cost of solutions, but not by a simplistic equation. With cloud, we will see costs abstracted to a requirements level, with consumers agreeing to pay for an expected result, and service providers absorbing the responsibility of delivering that expected result or paying a pre-negotiated indemnity. The service provider will then be in a position to provide that service with any combination of technologies they choose, and will be held accountable for meeting the agreed-to performance metrics (what, not how). Things like the OS, virtualization technologies, monitoring, and provisioning will become part of the end solution. Customers will be buying services, not products, so name brands start to mean less. Features and capabilities are up to the cloud supplier to provide however they like, as long as the service level (requirement) is met. Customers will want a service, and a price appropriate to the service level. This will take a little time because it is essentially building competitors to what exists today, and existing vendors will also begin competing at these new price points.
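As a rough illustration of what paying at a requirements level could look like, the sketch below settles one billing period against a single agreed metric. Everything in it is hypothetical: the `ServiceAgreement` fields, the jobs-per-day metric, and the flat indemnity are assumptions chosen for illustration, not terms from any real contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAgreement:
    """Hypothetical requirements-level agreement: pay for a result, not for parts."""
    price_per_period: float      # what the consumer agrees to pay each period
    target_jobs_per_day: float   # the agreed performance metric (the "what")
    indemnity_per_period: float  # pre-negotiated payment if the target is missed


def amount_due(agreement: ServiceAgreement, measured_jobs_per_day: float) -> float:
    """Net amount the consumer owes for one period.

    Nothing here refers to the OS, hypervisor, or hardware the provider used;
    only the delivered result is compared with the agreed target.
    """
    if measured_jobs_per_day >= agreement.target_jobs_per_day:
        return agreement.price_per_period
    return agreement.price_per_period - agreement.indemnity_per_period


if __name__ == "__main__":
    agreement = ServiceAgreement(price_per_period=10_000.0,
                                 target_jobs_per_day=500.0,
                                 indemnity_per_period=2_500.0)
    print(amount_due(agreement, 520.0))  # target met: full price is owed
    print(amount_due(agreement, 480.0))  # target missed: indemnity is deducted
```

The settlement takes no input describing the provider's technology choices, which is the "what, not how" idea in miniature.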

We will eventually achieve the cost benefit that is envisioned as the advantage of cloud computing. But we will need to make that a reality by creating the market through demand, and then holding the suppliers accountable for delivering the necessary solutions at appropriate prices.
