AWS ParallelCluster

By Mark Duffield

November 13, 2018

Orchestration software has played a key role in cluster bring-up and management for decades. From solutions like SunCluster and PSSP to community tools such as CFEngine, the ability to launch many resources together to enable large parallel applications remains a vital part of the High Performance Computing (HPC) environment. AWS offers many cloud native approaches to running clustered workloads, but recreating an environment similar or nearly identical to the one you currently run in your data center may be a necessary first step in moving workloads to AWS.

What if you could build a familiar cluster environment using AWS cloud native resources?

Today we announce AWS ParallelCluster, an AWS supported, open source cluster management tool that makes it easy for scientists, researchers, and IT administrators to deploy and manage High Performance Computing (HPC) clusters in the AWS cloud. With AWS ParallelCluster, many AWS cloud native products are used to launch a cluster environment that should be familiar to those running HPC workloads. These include AWS CloudFormation, AWS Identity and Access Management (IAM), Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), Amazon Elastic Compute Cloud (Amazon EC2), Amazon EC2 Auto Scaling, Amazon Elastic Block Store (Amazon EBS), Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.

AWS ParallelCluster is released through the Python Package Index (PyPI) and can be installed with pip. It is available at no additional cost; you pay only for the AWS resources needed to run your applications. ParallelCluster uses CloudFormation to build out your cluster environment. This is the same CloudFormation you can use to launch a single instance, a VPC, or an S3 bucket, but now you’re using it to launch an entire HPC cluster environment.

Many of you will be familiar with CfnCluster. ParallelCluster is built on the CfnCluster code base, which we extended with new features, new functionality, and (of course) bug fixes. If you are a CfnCluster user, we encourage you to move to ParallelCluster when you can and to create all new clusters with ParallelCluster going forward. You can use your existing CfnCluster config files with ParallelCluster. (Although you can still use CfnCluster, it will no longer be developed.)

Some key features in the initial release of ParallelCluster that were not in CfnCluster are:

  • AWS Batch integration
  • Multiple EBS volumes
  • Better scaling performance: faster, with Auto Scaling updates made all at once
  • Support for custom AMIs (“bring your own AMI”)
  • Private clusters using a proxy

And we’re not even close to done! We will continue to iterate on ParallelCluster based on customer requests and feedback.

Getting Started

Grab a cup of caffeine, and let’s get to it!

You will need:

Decision time #1. You can use ParallelCluster anywhere you can access the internet, but to launch the necessary resources for your cluster you will need either your AWS API keys or an IAM Role assigned to an instance. For this post, I’ll assume you are using Linux or macOS, you have admin access, and you have access to your API keys. Please reach out to an AWS Solutions Architect if you have questions about using an IAM Role instead.

Before I install ParallelCluster, I’ll make sure I can access my AWS account using the AWS CLI. To install the AWS CLI, you can follow the steps in Installing the AWS Command Line Interface, or, to install it in a Python virtual environment, follow Install the AWS Command Line Interface in a Virtual Environment. I’ll be using a Python virtual environment for everything.

An optional first step for those wanting to use a Python virtual environment:

[duff]$ virtualenv ~/Envs/pcluster-virtenv
[duff]$ source ~/Envs/pcluster-virtenv/bin/activate
(pcluster-virtenv) [duff]$ 

Now let’s install the AWS CLI and verify functionality by creating a bucket:

(pcluster-virtenv) [duff]$ pip install --upgrade awscli
(pcluster-virtenv) [duff]$ aws configure
AWS Access Key ID []: <aws_access_key>
AWS Secret Access Key []: <aws_secret_access_key>
Default region name []: us-east-1
Default output format []: json
(pcluster-virtenv) [duff]$ aws s3 mb s3://duff-parallelcluster
make_bucket: duff-parallelcluster 

I’ve installed, set up, and verified the AWS CLI. Let’s install ParallelCluster now.

Decision time #2. The VPC that ParallelCluster uses must have DNS Resolution = yes and DNS Hostnames = yes. It should also have DHCP options with the correct domain-name for the region, as defined in the docs: VPC DHCP Options. The subnet you use needs access to the internet, and there are several ways to enable this. For this blog, I will use a public subnet (a subnet with a route to an internet gateway, or IGW), but you can use a private subnet as long as it routes to the internet (e.g., through a NAT Gateway or a proxy server).
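If you prefer to check the two DNS attributes from code, here is a minimal sketch built on the EC2 `describe_vpc_attribute` call in boto3. It assumes boto3 is installed and credentials are configured as above. The helper name `vpc_dns_ready` is hypothetical, and the client is passed in so the logic can be exercised without touching AWS. The sketch does not check the DHCP options or the subnet's route to the internet.

```python
def vpc_dns_ready(ec2_client, vpc_id):
    """Return True if the VPC has both DNS resolution and DNS hostnames enabled.

    `ec2_client` is assumed to be a boto3 EC2 client (boto3.client("ec2")),
    or any object exposing a compatible describe_vpc_attribute method.
    """
    # DNS Resolution in the console corresponds to the enableDnsSupport attribute.
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )["EnableDnsSupport"]["Value"]
    # DNS Hostnames corresponds to the enableDnsHostnames attribute.
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )["EnableDnsHostnames"]["Value"]
    return bool(support) and bool(hostnames)
```

For example, `vpc_dns_ready(boto3.client("ec2", region_name="us-east-1"), "vpc-abcdefghigjlmnopq")` should return True for a VPC that ParallelCluster can use.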

You can verify these VPC settings in the console. The VPC’s configuration should show both DNS resolution and DNS hostnames enabled.

Now I’ll install ParallelCluster using the virtual environment I set up:

(pcluster-virtenv) [duff]$ pip install aws-parallelcluster
... output snipped...
Successfully installed aws-parallelcluster-2.0.0rc1 ...

Before I can launch a cluster I’ll need to configure ParallelCluster. Note that I leave “AWS Access Key ID” and “AWS Secret Access Key ID” blank, as I already configured this with the AWS CLI setup. Also, because we really want to make this easy on you, we’ll display possible values from your account:

(pcluster-virtenv) [duff]$ pcluster configure
Cluster Template [default]:
AWS Access Key ID []: <blank>
AWS Secret Access Key ID []: <blank>
Acceptable Values for AWS Region ID:
    ap-south-1
    eu-west-3
    eu-west-2
    eu-west-1
    ap-northeast-2
    ap-northeast-1
    sa-east-1
    ca-central-1
    ap-southeast-1
    ap-southeast-2
    eu-central-1
    us-east-1
    us-east-2
    us-west-1
    us-west-2
AWS Region ID []: us-east-1
VPC Name [public]:
Acceptable Values for Key Name:
    duff_key_us-east-1
Key Name []: duff_key_us-east-1
Acceptable Values for VPC ID:
    vpc-12345678901234567
    vpc-abcdefghigjlmnopq
VPC ID []: vpc-abcdefghigjlmnopq
Acceptable Values for Master Subnet ID:
    subnet-abcdefghigjlmnop1
    subnet-abcdefghigjlmnop2
    subnet-abcdefghigjlmnop3
    subnet-abcdefghigjlmnop4
    subnet-abcdefghigjlmnop5
    subnet-abcdefghigjlmnop6
Master Subnet ID []: subnet-abcdefghigjlmnop1

Okay, let’s see what that did. It created the file ~/.parallelcluster/config; let’s cat it and have a look.

(pcluster-virtenv) [duff]$ cat ~/.parallelcluster/config
[aws]
aws_region_name = us-east-1

[cluster default]
vpc_settings = public
key_name = duff_key_us-east-1

[vpc public]
master_subnet_id = subnet-abcdefghigjlmnop1
vpc_id = vpc-abcdefghigjlmnopq

[global]
update_check = true
sanity_check = true
cluster_template = default

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
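The [aliases] entry above is a command template. Here is a minimal sketch of how such a template could be expanded. It assumes the placeholders are filled the way Python's `str.format` fills them, which is an assumption for illustration and not a description of pcluster's internals. The values come from the cluster output shown later in this post, and `expand_alias` is a hypothetical helper.

```python
def expand_alias(template, cfn_user, master_ip, args=""):
    """Fill the {CFN_USER}, {MASTER_IP} and {ARGS} placeholders of an alias template."""
    command = template.format(CFN_USER=cfn_user, MASTER_IP=master_ip, ARGS=args)
    return command.strip()  # drop the trailing space left when ARGS is empty


# Copied from the [aliases] section of ~/.parallelcluster/config.
SSH_ALIAS = "ssh {CFN_USER}@{MASTER_IP} {ARGS}"
```

With the first cluster's values, `expand_alias(SSH_ALIAS, "ec2-user", "35.153.251.20")` yields `ssh ec2-user@35.153.251.20`.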

ParallelCluster uses the file ~/.parallelcluster/config by default for all configuration parameters. You can see an example configuration file, site-packages/aws-parallelcluster/examples/config, in the GitHub repo. The config file has several sections (if you’re a Python programmer, we’re using ConfigParser), and each section has a set of parameters that are used to launch the cluster. If I accidentally put a config parameter in the wrong section, it is silently ignored and I’m left wondering what happened, so refer to the ParallelCluster Configuration docs for more info. If a parameter is not specified in the config file, its default value is used.
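A misplaced parameter is silently ignored, so a quick self-check can save some head-scratching. The sketch below uses Python's standard configparser to flag keys that appear in an unexpected section type. The `KNOWN_KEYS` table is hypothetical and deliberately incomplete: it was built only from the parameters used in this post. The authoritative list is in the ParallelCluster Configuration docs.

```python
import configparser

# Hypothetical, incomplete map of section type -> keys, built only from this post.
KNOWN_KEYS = {
    "aws": {"aws_region_name"},
    "cluster": {"vpc_settings", "key_name", "compute_instance_type", "scheduler"},
    "vpc": {"master_subnet_id", "vpc_id", "ssh_from"},
    "global": {"update_check", "sanity_check", "cluster_template"},
}


def misplaced_keys(config_text):
    """Return (section, key) pairs whose key is not expected in that section type."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        kind = section.split()[0]  # "cluster default" -> "cluster"
        expected = KNOWN_KEYS.get(kind)
        if expected is None:
            continue  # e.g. [aliases]: this sketch has no rules for it
        for key in parser[section]:
            if key not in expected:
                problems.append((section, key))
    return problems
```

For example, putting ssh_from under [cluster default] instead of [vpc public] would be reported as `("cluster default", "ssh_from")`.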

Currently, ParallelCluster supports three traditional schedulers: sge, torque, and slurm (plus AWS Batch, covered later in this post). The default is sge, and that’s what I’ll be using.

For now, the only changes I will make to the config file are to add the SSH source location, ssh_from, in the VPC section and to change compute_instance_type in the cluster section.

By default, we will allow SSH inbound from any source IP (0.0.0.0/0), and I want to restrict this to just my IP address. I recommend that you do something similar by adding your IP address or trusted CIDR block (e.g. 10.10.0.0/16). I updated my [vpc public] section:

[vpc public]
master_subnet_id = subnet-abcdefghigjlmnop1
vpc_id = vpc-abcdefghigjlmnopq
ssh_from = 11.22.33.44/32
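Before you set ssh_from, it can help to confirm how many addresses a CIDR block actually admits. Here is a short sketch with Python's standard ipaddress module; the helper name `describe_cidr` is illustrative only.

```python
import ipaddress


def describe_cidr(cidr):
    """Return (normalized network, number of addresses) for an ssh_from value."""
    # strict=True raises ValueError if host bits are set beyond the prefix length.
    network = ipaddress.ip_network(cidr, strict=True)
    return str(network), network.num_addresses
```

A /32 such as 11.22.33.44/32 admits exactly one address. The trusted block 10.10.0.0/16 admits 65,536 addresses, and the default 0.0.0.0/0 admits every IPv4 address.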

And I will also update the [cluster default] section, and change the compute instance type to c4.large, rather than using the default instance t2.micro:

[cluster default]
vpc_settings = public
key_name = duff_key_us-east-1
compute_instance_type = c4.large
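The same two edits can also be scripted. This is a minimal sketch using configparser, and `apply_settings` is a hypothetical helper. Note that ConfigParser.write does not preserve comments, so editing by hand, as above, may be preferable for a commented file.

```python
import configparser
import io


def apply_settings(config_text, updates):
    """Return config_text with each (section, key, value) in `updates` applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for section, key, value in updates:
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)  # comments in the original text are not preserved
    return out.getvalue()
```

The edits in this post would be `[("vpc public", "ssh_from", "11.22.33.44/32"), ("cluster default", "compute_instance_type", "c4.large")]`.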

Now that we understand a bit about the config file and we know how to add configuration parameters, let’s launch our first cluster with the create command:

(pcluster-virtenv) [duff]$ pcluster create hello-cluster1

When cluster creation starts, we’ll see status updates as the resources are brought up. Because I’m interested in how long it takes to launch a cluster, I’ll prefix the command with time:

(pcluster-virtenv) [duff]$ time pcluster create hello-cluster1
Beginning cluster creation for cluster: hello-cluster1
Creating stack named: parallelcluster-hello-cluster1
Status: parallelcluster-hello-cluster1 - CREATE_IN_PROGRESS

When cluster creation has completed, I have the public and private IP addresses of the master instance and the username for login. And because I used time, I can see that creating the cluster took 8 minutes and 33 seconds:

MasterPublicIP: 35.153.251.20
ClusterUser: ec2-user
MasterPrivateIP: 172.31.0.14

real	8m33.425s
user	0m2.620s
sys	    0m0.353s
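If you script cluster creation, the key/value lines printed at the end of `pcluster create` and the `real` line from `time` are easy to pick apart. Here is a minimal sketch, assuming the output format shown above, which may change between releases. `parse_create_output` and `real_seconds` are hypothetical helpers.

```python
import re


def parse_create_output(text):
    """Collect single-token 'Key: value' lines (e.g. MasterPublicIP) into a dict."""
    info = {}
    for line in text.splitlines():
        # Status lines and other free-text lines contain spaces and are skipped.
        match = re.match(r"^(\w+): (\S+)$", line.strip())
        if match:
            info[match.group(1)] = match.group(2)
    return info


def real_seconds(time_line):
    """Convert a bash `time` line such as 'real\t8m33.425s' to seconds."""
    match = re.search(r"(\d+)m([\d.]+)s", time_line)
    return int(match.group(1)) * 60 + float(match.group(2))
```

For the run above, `real_seconds("real\t8m33.425s")` gives roughly 513.4 seconds.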

Let’s log in with the built-in ssh alias that ParallelCluster provides, pcluster ssh <cluster_name>, and see what cluster resources are already available.

(pcluster-virtenv) [duff]$ pcluster list
hello-cluster1

(pcluster-virtenv) [duff]$ pcluster ssh hello-cluster1
The authenticity of host '35.153.251.20 (35.153.251.20)' can't be established.
ECDSA key fingerprint is SHA256:u9+A0i6Y94JcRGYW8eyi5e4N+iiNtpPTPAwPY5PQcWk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '35.153.251.20' (ECDSA) to the list of known hosts.
Last login: Sun Nov 11 20:12:12 2018

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/

[ec2-user@ip-172-31-0-14 ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-10-95         lx-amd64        2    1    1    2  0.02    3.7G  156.2M     0.0     0.0
ip-172-31-13-199        lx-amd64        2    1    1    2  0.02    3.7G  156.8M     0.0     0.0

From the output above, you can see that I already have two compute instances running. By default, ParallelCluster uses t2.micro as the compute instance type, but I configured this cluster to use c4.large. Because hyper-threading is on, qhost reports two CPUs and one core for each instance.

Let’s submit a simple hostname job with the mpirun command to show the Auto Scaling feature of ParallelCluster.

[ec2-user@ip-172-31-0-14 ~]$ echo /usr/lib64/openmpi/bin/mpirun hostname | qsub -pe mpi 16
Your job 1 ("STDIN") has been submitted
[ec2-user@ip-172-31-0-14 ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.00000 STDIN      ec2-user     qw    11/11/2018 20:25:38                                   16

Now I have a job requesting more slots than the cluster currently has, which kicks off a scaling action. The job runs once enough instances are available. In this case, with two slots per instance, the 16-slot job needs 8 instances in total. A few minutes later, I have the resources and the job has already run to completion:

[ec2-user@ip-172-31-0-14 ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-0-72          lx-amd64        2    1    1    2  0.11    3.7G  189.0M     0.0     0.0
ip-172-31-10-65         lx-amd64        2    1    1    2  0.29    3.7G  189.2M     0.0     0.0
ip-172-31-14-49         lx-amd64        2    1    1    2  0.11    3.7G  189.1M     0.0     0.0
ip-172-31-2-78          lx-amd64        2    1    1    2  0.06    3.7G  189.4M     0.0     0.0
ip-172-31-3-226         lx-amd64        2    1    1    2  0.11    3.7G  185.5M     0.0     0.0
ip-172-31-4-248         lx-amd64        2    1    1    2  0.11    3.7G  186.2M     0.0     0.0
ip-172-31-5-112         lx-amd64        2    1    1    2  0.08    3.7G  188.9M     0.0     0.0
ip-172-31-5-50          lx-amd64        2    1    1    2  0.08    3.7G  189.0M     0.0     0.0
[ec2-user@ip-172-31-0-14 ~]$ qstat
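The scale-out above follows from simple arithmetic. Each c4.large reports NCPU 2, so a 16-slot request needs ceil(16 / 2) = 8 instances. The sketch below parses qhost output like the listing above and performs that calculation. It assumes one slot per reported CPU, which matches this example but depends on how the SGE parallel environment is configured. The function names are illustrative.

```python
import math


def host_cpus(qhost_text):
    """Map each execution host in `qhost` output to its NCPU value."""
    cpus = {}
    for line in qhost_text.splitlines():
        fields = line.split()
        # Keep only rows whose NCPU column is a number. This skips the header,
        # the dashed separator line, and the 'global' pseudo-host.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        cpus[fields[0]] = int(fields[2])
    return cpus


def instances_needed(slots, cpus_per_instance):
    """Instances required if every CPU provides one slot."""
    return math.ceil(slots / cpus_per_instance)
```

Summing `host_cpus(...)` over the eight-host listing above gives 16 CPUs, which matches the `qsub -pe mpi 16` request.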

Now that the job has run and these instances are sitting there doing nothing, what happens? If an instance has been running for more than 10 minutes but is not running a job, we will terminate it for you. So after 10 minutes, I look at qhost again:

[ec2-user@ip-172-31-0-14 ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
[ec2-user@ip-172-31-0-14 ~]$ qstat

The instances have been terminated, and I’m not being charged for idle instances. The scaling features are configurable.
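The idle-termination behavior described above reduces to a simple rule. Here is a hypothetical sketch of that rule as the text states it; it is not ParallelCluster's actual scale-down code. The 10-minute default mirrors the text, and because the scaling features are configurable, the limit is a parameter.

```python
from datetime import datetime, timedelta


def should_terminate(launch_time, running_jobs, now, idle_limit=timedelta(minutes=10)):
    """True for an instance that has run longer than idle_limit and has no jobs."""
    return running_jobs == 0 and (now - launch_time) > idle_limit


def instances_to_terminate(instances, now, idle_limit=timedelta(minutes=10)):
    """Return the IDs of instances (dicts with id, launch_time, running_jobs) meeting the rule."""
    return [
        inst["id"]
        for inst in instances
        if should_terminate(inst["launch_time"], inst["running_jobs"], now, idle_limit)
    ]
```

An instance launched 15 minutes ago with no jobs would be selected. One launched 5 minutes ago, or one still running a job, would be kept.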

Okay. I have launched what looks and acts like a traditional HPC environment using many AWS cloud native resources, including an Auto Scaling cluster that terminates instances when they are not being used. What about an environment without the scheduler overhead?

Say hello to AWS Batch.

AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.

With AWS Batch, there is no need to install and manage batch computing software or server clusters that you use to run your jobs, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

So now I’ll launch an AWS Batch environment and let ParallelCluster do all of the work for me. When launching an AWS Batch environment, ParallelCluster uses even more AWS resources, such as AWS CodeBuild and Amazon Elastic Container Registry (Amazon ECR), and it brings up an NFS server on the master instance.

I’ll start by editing my config file: ~/.parallelcluster/config, and add this section using some of the same parameters from the [cluster default] section.

[cluster awsbatch]
scheduler = awsbatch
key_name = duff_key_us-east-1
vpc_settings = public
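Each cluster template is simply a [cluster <name>] section. Here is a minimal sketch of the lookup, assuming it works as this post describes: an explicit --cluster-template name is used when given, and otherwise the cluster_template value from [global] applies. The helper `cluster_section` is hypothetical and does not reproduce pcluster's own parsing.

```python
import configparser


def cluster_section(config_text, template=None):
    """Return the settings of the [cluster <template>] section as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if template is None:
        # No explicit template: fall back to the default named in [global].
        template = parser.get("global", "cluster_template", fallback="default")
    section = "cluster " + template
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in config")
    return dict(parser[section])
```

With the config in this post, `cluster_section(text)` returns the [cluster default] settings, and `cluster_section(text, "awsbatch")` returns the section with `scheduler = awsbatch`.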

Now that I have a separate cluster template defined, I can launch a separate master instance that will be both the NFS server for my Batch jobs and the submit host for them. I’ll create the cluster now, specifying the awsbatch cluster template:

(pcluster-virtenv) [duff]$ pcluster create awsbatch --cluster-template awsbatch
Beginning cluster creation for cluster: awsbatch
Creating stack named: parallelcluster-awsbatch
Status: parallelcluster-awsbatch - CREATE_COMPLETE
MasterPublicIP: 54.158.75.19
ClusterUser: ec2-user
MasterPrivateIP: 172.31.15.217
ResourcesS3Bucket: parallelcluster-awsbatch-6wjsibr8elx9km0r

From the output above, you can see I’ve successfully created an AWS Batch submit host. I’ll log in and see what’s there:

(pcluster-virtenv) [duff]$ pcluster ssh awsbatch
The authenticity of host '54.158.75.19 (54.158.75.19)' can't be established.
ECDSA key fingerprint is SHA256:/K8LQYyLliS0+Q7+BZtkhe6ChyM9Oz/RZz0aTCKJ3KQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.158.75.19' (ECDSA) to the list of known hosts.
Last login: Tue Nov 13 00:46:30 2018

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/
[ec2-user@ip-172-31-15-217 ~]$ awsbhosts
ec2InstanceId        instanceType    privateIpAddress    publicIpAddress      runningJobs
-------------------  --------------  ------------------  -----------------  -------------
i-05af380e4950366d4  c4.xlarge       172.31.4.66         18.209.11.53                   0

I see that I have a c4.xlarge instance ready to run jobs. I’ll test with hello world.

[ec2-user@ip-172-31-15-217 ~]$ awsbsub echo hello world
Job 2387b7f5-14c7-41c1-bbf8-c5e50017580a (echo) has been submitted.

The job is submitted, and it should go from RUNNABLE to STARTING to RUNNING, and then to either SUCCEEDED or FAILED.
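AWS Batch job statuses advance in a fixed order. The sketch below encodes the path described here, plus the SUBMITTED and PENDING states that precede RUNNABLE in AWS Batch. It checks that a sequence of statuses observed by polling awsbstat only moves forward. This is a simplified model for illustration; for example, it does not account for job retries.

```python
ORDER = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING"]
TERMINAL = {"SUCCEEDED", "FAILED"}


def is_forward_progress(statuses):
    """True if observed statuses never move backwards and nothing follows a terminal state."""
    last_rank = -1
    for i, status in enumerate(statuses):
        if status in TERMINAL:
            # A terminal status must be the last observation.
            return i == len(statuses) - 1
        rank = ORDER.index(status)  # raises ValueError for an unknown status
        if rank < last_rank:
            return False
        last_rank = rank
    return True
```

The polls in this post (RUNNABLE, STARTING, RUNNING, SUCCEEDED) pass the check. Repeated polls that return the same status are also accepted.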

[ec2-user@ip-172-31-15-217 ~]$ awsbstat
jobId                                 jobName    status    startedAt    stoppedAt    exitCode
------------------------------------  ---------  --------  -----------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       RUNNABLE  -            -            -
[ec2-user@ip-172-31-15-217 ~]$ awsbstat
jobId                                 jobName    status    startedAt    stoppedAt    exitCode
------------------------------------  ---------  --------  -----------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       STARTING  -            -            -

[ec2-user@ip-172-31-15-217 ~]$ awsbstat
jobId                                 jobName    status    startedAt            stoppedAt    exitCode
------------------------------------  ---------  --------  -------------------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       RUNNING   2018-11-13 00:52:31  -            -

Now I see that my job is running, and I can check its output with the awsbout command:

[ec2-user@ip-172-31-15-217 ~]$ awsbout 2387b7f5-14c7-41c1-bbf8-c5e50017580a
2018-11-13 00:52:31: Starting Job 2387b7f5-14c7-41c1-bbf8-c5e50017580a
2018-11-13 00:52:31: hello world

After my job has completed, I can check the status with the awsbstat command:

[ec2-user@ip-172-31-15-217 ~]$ awsbstat -s SUCCEEDED
jobId                                 jobName    status     startedAt            stoppedAt              exitCode
------------------------------------  ---------  ---------  -------------------  -------------------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       SUCCEEDED  2018-11-13 00:52:31  2018-11-13 00:53:02           0
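The startedAt and stoppedAt columns give the job's wall-clock runtime directly: 00:53:02 minus 00:52:31 is 31 seconds. Here is a small sketch that parses an awsbstat row like the one above. It assumes the whitespace-separated column layout shown here and a job name without spaces; `job_runtime_seconds` is a hypothetical helper.

```python
from datetime import datetime


def job_runtime_seconds(row):
    """Return stoppedAt - startedAt in seconds for one awsbstat table row."""
    fields = row.split()
    # Assumed layout: jobId jobName status startedAt(date time) stoppedAt(date time) exitCode
    started = datetime.strptime(" ".join(fields[3:5]), "%Y-%m-%d %H:%M:%S")
    stopped = datetime.strptime(" ".join(fields[5:7]), "%Y-%m-%d %H:%M:%S")
    return (stopped - started).total_seconds()
```

For the hello world job, this returns 31.0.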

With AWS ParallelCluster, you can take advantage of the AWS Cloud while maintaining a familiar cluster environment. We’re excited about ParallelCluster and we look forward to hearing from you!

Cheers,
Mark
