Alces Tries Flight on AWS and Life as an HPC SaaS Provider

By John Russell

June 9, 2016

Today, the community of HPC-in-the-cloud solution providers is fairly limited. Giant hyperscalers – notably Amazon (AWS), Google (GCP) and Microsoft (Azure) – remain at the core and keep expanding their HPC resources. Circling around them are HPC ‘services specialists’ plugging clients into cloud providers and striving to ease delivery of HPC-in-the-cloud to make good on the promise of improved efficiencies and cost reductions. Last week a new HPC Software as a Service (SaaS) player popped up on the AWS Marketplace – Alces Flight, a UK-based company with roots as an HPC integrator.

Predictably, the company’s HPC background and the obvious opportunity drove the initiative, explains Wil Mayers, director of research and development for Alces. “Organizations are starting to pivot from an on-premise first attitude to a much more cost-sensitive way of working and thinking. That means looking at public clouds. During this transition period we wanted to have a product that could give end users – scientists and engineers – the same sort of look and feel of an HPC cluster irrespective of the platform.”

For the last twelve months Alces has been working with AWS, tweaking its offering, which remains modest – maximum cluster deployment of around 1,000 cores (more details below) – but with plans for significant compute capacity growth. As with many such companies, the idea is to present a front-end that simplifies HPC application and workflow deployment and plays nicely with Amazon’s many powerful features. Indeed, Alces boasts its HPC cluster provisioning and SaaS offering already has a library of 750 scientific applications available for AWS users.

There are the big hitters you’d expect, GROMACS and NAMD, for example, “but also many others, particularly if you look at something like bioinformatics where there are literally hundreds and hundreds of different libraries and tools and utilities that people may want to use. A gene sequencing workflow might include 10 or 20 or 30 different tools, all run in sequence,” says Mayers.

The Alces Flight offering has a typical set of provisioning and SaaS components such as a job scheduler, shared file system, and management tools. Its large library of applications and HPC familiarity are important differentiators. So too, says Mayers, is Gridware, its repository and orchestrator described in company literature as “applications, libraries and services you need to get working, packaged with care and attention, favoring convention over configuration.”

“The Gridware project is something we host on GitHub. It’s a big library of applications (about 800) and what Gridware is designed to do is when you invoke it and run it on a cluster, it looks at the cluster, looks at the tool you want to install, and you can tell it to go find the source code for the application and dynamically compile it for the environment that you have. It’s aware of dependencies including things like the operating system,” he says. It’s even aware of the particular MPI, interconnect types, and compilers you have chosen.

One addition made to the most recent Gridware release is “a binary option as well for using those lists of instructions and there are compilation options for different packages. We have actually precompiled a lot of the applications for their instance types that are available on AWS,” he says. “So rather than having to compile on the fly – compilation gives you a lot of flexibility but it can also take a little bit of time – we can, on an Alces Flight cluster, go and grab the applications and use them directly from an Amazon S3 bucket that lives on AWS.”
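The dependency-aware, binary-or-compile behavior Mayers describes can be pictured with a minimal Python sketch. It is not Gridware code: the dependency map, the set of prebuilt packages, the instance type and the `plan_install` helper are all hypothetical, chosen only to illustrate ordering installs by dependency and preferring a precompiled binary when one exists for the instance type.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: package -> packages that must be installed first.
DEPENDENCIES = {
    "gromacs": {"openmpi", "fftw"},
    "fftw": {"gcc"},
    "openmpi": {"gcc"},
    "gcc": set(),
}

# Hypothetical (package, instance type) pairs that have a precompiled binary.
PREBUILT = {("gcc", "c4.8xlarge"), ("openmpi", "c4.8xlarge")}


def plan_install(package, instance_type):
    """Return (package, method) steps in dependency order."""
    # Collect only the packages reachable from the requested one.
    needed, stack = {}, [package]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = DEPENDENCIES[name]
            stack.extend(DEPENDENCIES[name])
    # static_order() yields each package after all of its dependencies.
    order = TopologicalSorter(needed).static_order()
    return [
        (name, "binary" if (name, instance_type) in PREBUILT else "compile")
        for name in order
    ]


plan = plan_install("gromacs", "c4.8xlarge")
for name, method in plan:
    print(f"{name}: {method}")
```

In this sketch the compiler is installed first and the requested application last; anything without a matching prebuilt binary falls back to compilation from source.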

There are management tools including a GUI that allows multiple users to connect to the same cluster at the same time. There are also storage management tools, which not only link to an S3 account if a user has one, but also support back ends such as Dropbox and Google Drive.

While it’s certainly early days, Alces reports already running on the order of 20 clusters on AWS around the world. Given the Euro-centricity of its customer base, complying with privacy requirements is critical, especially in the bioscience and health care sectors. During the AWS beta testing a couple of UK National Health Service clients were working with anonymized data and running only on systems located in the UK – that’s because of strict NHS requirements that health data remain on “sovereign” ground.

Even in basic research this is the case, Mayers points out. “While there’s a lot of collaborative work, they want the collaboration to be done in a controlled way. What we have found is European users usually want to stay within their region,” he says, adding that AWS’s plans to ramp up capabilities in London later this year will likely open the doors to do more business.

Given the cost pressures on hospitals there, for example, Mayers thinks the cloud will be very attractive. “Using AWS will sometimes mean not only can they reduce their costs by a factor of three or four, they can also get much bigger clusters and get diagnostic work done much faster.”

The opportunities seem enormous and worldwide, thinks Alces. Others do as well. There are a few players already in the HPC-in-the-cloud market, each with its own flavor of services and value proposition. Cycle Computing and UberCloud are two that come to mind. It will be interesting to watch Alces navigate the new waters. Scaling is currently modest on the AWS Marketplace version of Flight Compute, but a higher ceiling is in the works, and users have other options to achieve higher core counts.

“We have a product today that scales to just over 1,000 cores, so that’s 32 compute nodes. That limits the market,” agrees Mayers. The reason for the constraint, he says, is “We have a fairly straightforward storage deployment option at the moment. We are working with the Intel team for Lustre and the BeeGFS team in Germany and that will allow us to scale above 32 nodes up to 64, and up to some hundreds of nodes within a single cluster environment.”

Mayers stresses that it is just the AWS Marketplace version of Flight Compute which is limited to 32 nodes. The company has tested launching clusters up to 256 nodes in the AWS spot market (which is around 9,000 cores).
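The two figures quoted, 32 nodes for "just over 1,000 cores" and 256 nodes for "around 9,000 cores", are consistent with each other if each node contributes roughly 36 cores. The article does not state a per-node core count, so the 36 in this short Python check is an assumption made purely to show the arithmetic.

```python
# Assumption: 36 cores per compute node. The article gives only node counts
# and approximate core totals, not the per-node figure.
CORES_PER_NODE = 36


def total_cores(nodes, cores_per_node=CORES_PER_NODE):
    """Total cores in a cluster of identical compute nodes."""
    return nodes * cores_per_node


print(total_cores(32))   # 1152, i.e. "just over 1,000 cores"
print(total_cores(256))  # 9216, i.e. "around 9,000 cores"
```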

“The Marketplace is all about instant-access for users who want to ‘learn by doing’ rather than interacting with a vendor, so we’ve limited that to 32 nodes for the first release to ensure that new users get a good first-time experience,” explains Mayers. “For future Marketplace releases, we’re hoping to package up larger solutions with an appropriate choice of storage technologies to deliver scalable performance in all directions. Our roadmap has the 2016.3 version of Flight Compute available in Marketplace for September this year, but users can launch a Flight Cluster of any size with BeeGFS or Lustre today using appropriate AWS CloudFormation templates.”

Alces supports both Spot Instances and On-Demand instances, giving users needed choice. The company has also leveraged Amazon’s flexibility here: “So the obvious one is automatic scaling. The job scheduler feeds into AWS and instructs the infrastructure how big it needs to be in terms of compute nodes.” It shrinks and expands the environment as needed.
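A scheduler-driven sizing rule of the kind described can be sketched in a few lines of Python. This is an illustration only, not the Flight implementation: the `desired_nodes` function, its parameters and the default ceiling of 32 nodes (borrowed from the Marketplace limit mentioned above) are assumptions.

```python
import math


def desired_nodes(queued_cores, running_cores, cores_per_node,
                  min_nodes=0, max_nodes=32):
    """Compute nodes needed to cover running plus queued work.

    All names and defaults are hypothetical; max_nodes=32 simply mirrors
    the Marketplace ceiling quoted in the article.
    """
    needed = math.ceil((queued_cores + running_cores) / cores_per_node)
    # Clamp so the cluster never shrinks below or grows beyond its limits.
    return max(min_nodes, min(max_nodes, needed))


print(desired_nodes(queued_cores=100, running_cores=0, cores_per_node=36))
print(desired_nodes(queued_cores=5000, running_cores=0, cores_per_node=36))
print(desired_nodes(queued_cores=0, running_cores=0, cores_per_node=36))
```

With an empty queue the target drops to the minimum, which is how such a rule would shrink the environment when work runs out.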

“We try to give people the option with Flight when you launch a cluster you can request it to launch on-demand, which gives you a guarantee that these instances are going to be permanent. You can also choose to launch a spot instance and Flight will allow you to enter the bid price you want. You can choose how much you want to spend. If a node does get killed because it’s blocked because your bid is too low, the job will return to a queue state on the cluster and can be submitted when the spot price falls again to a level you’re happy with,” says Mayers.
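The bid-and-requeue behavior can be illustrated with a toy Python simulation. The hourly prices, the bid and the `simulate_spot_job` helper are invented for the example, and the sketch assumes an interrupted job restarts from scratch (no checkpointing), which the article does not specify.

```python
def simulate_spot_job(hourly_prices, bid, job_hours):
    """Walk through hourly spot prices and track one job's state.

    Returns (finished, states), where states holds "running" or "queued"
    for each hour simulated. All inputs are illustrative values.
    """
    progress = 0
    states = []
    for price in hourly_prices:
        if progress == job_hours:
            break
        if price <= bid:
            progress += 1
            states.append("running")
        else:
            # Price above the bid: the node is lost and the job is requeued.
            progress = 0  # assumption: no checkpointing, so work restarts
            states.append("queued")
    return progress == job_hours, states


finished, states = simulate_spot_job([0.10, 0.30, 0.10, 0.10], bid=0.20, job_hours=2)
print(finished)  # True
print(states)    # ['running', 'queued', 'running', 'running']
```

A higher bid reduces the chance of interruption at the cost of a higher potential spend, which is the trade-off Mayers leaves to the user.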

How all of this cloud-based HPC capability will be used is still evolving. There are, of course, the usual workflows. There is bursting to the cloud when in-house capacity is strained. That said, Mayers expects HPC in the cloud will be used in several ways. For example, it might be used as a cost-effective HPC training tool in a university setting. Parking on-premise HPC capability temporarily in the cloud while the actual on-premise infrastructure undergoes change or maintenance is another. There’s no shortage of ideas.

Currently, Alces Flight’s traditional HPC integration remains by far the largest piece of its business, accounting for around 90 percent. Its integration business model, says Mayers, differs from many HPC integration players in that it doesn’t buy and sell equipment; instead it works with “mostly tier one” vendors and connects customers directly with them. The revenue comes from services (installation, integration, life cycle servicing, etc.).
