AWS ParallelCluster

By Mark Duffield

November 13, 2018

Orchestration software has played a key role in cluster bring-up and management for decades. Dating back to solutions like SunCluster, PSSP, and community solutions such as CFEngine, the need to launch many resources together to enable large parallel applications continues to be a vital part of the High Performance Computing (HPC) environment. AWS has many cloud native approaches to running your clustered workloads on AWS, but the need to recreate or replicate an environment similar or nearly identical to what you are currently running in your data center may be a necessary first step in moving workloads to AWS.

What if you could build a familiar cluster environment using AWS cloud native resources?

Today we announce AWS ParallelCluster, an AWS supported, open source cluster management tool that makes it easy for scientists, researchers, and IT administrators to deploy and manage High Performance Computing (HPC) clusters in the AWS cloud. With AWS ParallelCluster, many AWS cloud native products are used to launch a cluster environment that should be familiar to those running HPC workloads. These include AWS CloudFormation, AWS Identity and Access Management (IAM), Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), Amazon Elastic Compute Cloud (Amazon EC2), Amazon EC2 Auto Scaling, Amazon Elastic Block Store (Amazon EBS), Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.

AWS ParallelCluster is released via the Python Package Index (PyPI) and can be installed via pip. It is available at no additional cost, and you only pay for the AWS resources needed to run your applications. ParallelCluster leverages CloudFormation to build out your cluster environment. This is the same CloudFormation that you can use to launch just one instance, or a VPC, or an S3 bucket, but now you’re using it to launch an entire HPC cluster environment.

Many of you will be familiar with CfnCluster. ParallelCluster started from the code base that CfnCluster was built upon, and then we extended it to include new features, functionality, and (of course) improvements and bug fixes. If you are a previous user of CfnCluster, we encourage you to start using ParallelCluster when you can, and going forward to create new clusters only with ParallelCluster. You can use your existing CfnCluster config files with ParallelCluster. (Although you can still use CfnCluster, it will no longer be developed.)

Some key features in the initial release of ParallelCluster that were not in CfnCluster are:

  • AWS Batch integration
  • Multiple EBS volumes
  • Better scaling performance – faster, with Auto Scaling updates applied all at once
  • Support for custom (“bring your own”) AMIs
  • Private cluster using proxy

And we’re not even close to done! We will continue to iterate ParallelCluster based on customer requests and feedback.

Getting Started

Grab a cup of caffeine, and let’s get to it!

You will need:

Decision time #1. You can use ParallelCluster anywhere you can access the internet, but you will need your AWS API keys, or you will need to set up an IAM Role and assign that to an instance to launch the necessary resources for your cluster. For this post, I’ll assume you are using either a Linux or macOS operating system, you have admin access, and you have access to your API keys. Please reach out to an AWS Solutions Architect if you have questions about using an IAM Role instead.

Before I install ParallelCluster, I’ll make sure I can access my account using the AWS CLI. To install the AWS CLI you can follow the steps in Installing the AWS Command Line Interface, or to install in a Python virtual environment you can follow Install the AWS Command Line Interface in a Virtual Environment. I’ll be using a Python virtual environment for everything.

An optional first step for those wanting to use a Python virtual environment:

[duff]$ virtualenv ~/Envs/pcluster-virtenv
[duff]$ source ~/Envs/pcluster-virtenv/bin/activate
(pcluster-virtenv) [duff]$ 

Now let’s install the AWS CLI and verify functionality by creating a bucket:

(pcluster-virtenv) [duff]$ pip install --upgrade awscli
(pcluster-virtenv) [duff]$ aws configure
AWS Access Key ID []: <aws_access_key>
AWS Secret Access Key []: <aws_secret_access_key>
Default region name []: us-east-1
Default output format []: json
(pcluster-virtenv) [duff]$ aws s3 mb s3://duff-parallelcluster
make_bucket: duff-parallelcluster 

I’ve installed, set up, and verified functionality of the AWS CLI. Let’s install ParallelCluster now.

Decision time #2. The VPC that ParallelCluster will use must have DNS Resolution = yes and DNS Hostnames = yes. It should also have DHCP options with the correct domain-name for the region, as defined in the docs: VPC DHCP Options. The subnet that will be used will need to have access to the internet, and there are several ways to enable this. For this blog, I will use a Public subnet (a subnet that has an IGW attached and routes to the internet), but you can use a Private subnet as long as the subnet routes to the internet (e.g. through a NAT Gateway or a proxy server).

The VPC settings can be verified by going to the Console and looking at the VPC configuration, where DNS resolution and DNS hostnames should both show yes.
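If you would rather script that check than click through the Console, something like the following could work. This is only a minimal sketch: it assumes the boto3 library is installed and that credentials are already configured, the helper name vpc_dns_ready is my own invention for illustration, and the VPC ID is the placeholder used throughout this post. It relies on the EC2 DescribeVpcAttribute call, which returns one attribute per request.

```python
def vpc_dns_ready(ec2_client, vpc_id):
    """Return True if DNS resolution and DNS hostnames are both enabled.

    ec2_client is expected to be a boto3 EC2 client, or any object that
    exposes the same describe_vpc_attribute method (hypothetical helper,
    not part of ParallelCluster).
    """
    # DescribeVpcAttribute accepts a single attribute per call.
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )["EnableDnsSupport"]["Value"]
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )["EnableDnsHostnames"]["Value"]
    return bool(support and hostnames)


# Example usage (assumes boto3 is installed and credentials are set up):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(vpc_dns_ready(ec2, "vpc-abcdefghigjlmnopq"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching a real account.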

Now I’ll install ParallelCluster using the virtual environment I set up:

(pcluster-virtenv) [duff]$  pip install aws-parallelcluster
... output snipped...
Successfully installed aws-parallelcluster-2.0.0rc1 ...

Before I can launch a cluster I’ll need to configure ParallelCluster. Note that I leave “AWS Access Key ID” and “AWS Secret Access Key ID” blank, as I already configured this with the AWS CLI setup. Also, because we really want to make this easy on you, we’ll display possible values from your account:

(pcluster-virtenv) [duff]$ pcluster configure
Cluster Template [default]:
AWS Access Key ID []:  <blank>
AWS Secret Access Key ID []: <blank>
Acceptable Values for AWS Region ID:
    ap-south-1
    eu-west-3
    eu-west-2
    eu-west-1
    ap-northeast-2
    ap-northeast-1
    sa-east-1
    ca-central-1
    ap-southeast-1
    ap-southeast-2
    eu-central-1
    us-east-1
    us-east-2
    us-west-1
    us-west-2
AWS Region ID []: us-east-1
VPC Name [public]:
Acceptable Values for Key Name: <blank>
    duff_key_us-east-1
Key Name []: duff_key_us-east-1
Acceptable Values for VPC ID:
    vpc-12345678901234567
    vpc-abcdefghigjlmnopq
VPC ID []: vpc-abcdefghigjlmnopq
Acceptable Values for Master Subnet ID:
    subnet-abcdefghigjlmnop1
    subnet-abcdefghigjlmnop2
    subnet-abcdefghigjlmnop3
    subnet-abcdefghigjlmnop4
    subnet-abcdefghigjlmnop5
    subnet-abcdefghigjlmnop6
Master Subnet ID []: subnet-abcdefghigjlmnop1

Okay, let’s see what that did. It created the file ~/.parallelcluster/config, so let’s cat that and have a look.

(pcluster-virtenv) [duff]$ cat ~/.parallelcluster/config
[aws]
aws_region_name = us-east-1

[cluster default]
vpc_settings = public
key_name = duff_key_us-east-1

[vpc public]
master_subnet_id = subnet-abcdefghigjlmnop1
vpc_id = vpc-abcdefghigjlmnopq

[global]
update_check = true
sanity_check = true
cluster_template = default

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

ParallelCluster uses the file ~/.parallelcluster/config by default for all configuration parameters. You can see an example configuration file, site-packages/aws-parallelcluster/examples/config, in the GitHub repo. The config file has several sections (if you’re a Python programmer, we’re using ConfigParser). Each section has a set of parameters that are used to launch the cluster. If I’m not careful, and I accidentally put a config parameter in the wrong section, it will be silently ignored and I’ll be stuck wondering what happened. If a parameter is not specified in the config file, then the default value is used. Refer to the ParallelCluster Configuration docs for more info.
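Because the file is plain INI, Python’s standard configparser module can read it, which is handy for checking which section a parameter actually landed in. The following is a minimal sketch, not ParallelCluster’s own loader: the sample text is a trimmed copy of the file shown above, and the helper names load_config and sections_containing are hypothetical.

```python
import configparser

# Trimmed copy of the generated config shown above. To read the real file,
# call parser.read(os.path.expanduser("~/.parallelcluster/config")) instead.
SAMPLE = """\
[aws]
aws_region_name = us-east-1

[cluster default]
vpc_settings = public
key_name = duff_key_us-east-1

[vpc public]
master_subnet_id = subnet-abcdefghigjlmnop1
vpc_id = vpc-abcdefghigjlmnopq

[global]
update_check = true
sanity_check = true
cluster_template = default
"""


def load_config(text):
    """Parse INI text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def sections_containing(parser, option):
    """List every section in which the given parameter appears."""
    return [name for name in parser.sections() if parser.has_option(name, option)]


config = load_config(SAMPLE)
print(config.sections())
# ['aws', 'cluster default', 'vpc public', 'global']
print(sections_containing(config, "key_name"))
# ['cluster default']
```

If sections_containing reports a section you did not expect, that parameter is probably one of the silently ignored ones.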

Currently, ParallelCluster supports three schedulers: sge, torque, and slurm. The default is sge, and that’s what I’ll be using.

For now, the only changes I will make in the config file are to add the SSH source location, ssh_from, in the VPC section, and to change the compute_instance_type in the cluster section.

By default, we will allow SSH inbound from any source IP (0.0.0.0/0), and I want to restrict this to just my IP address. I recommend that you do something similar by adding your IP address or trusted CIDR block (e.g. 10.10.0.0/16). I updated my [vpc public] section:

[vpc public]
master_subnet_id = subnet-abcdefghigjlmnop1
vpc_id = vpc-abcdefghigjlmnopq
ssh_from = 11.22.33.44/32
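If you want to sanity-check a value before pasting it into ssh_from, Python’s standard ipaddress module can do the validation. This is a small sketch under my own assumptions: the helper names to_ssh_from and is_open_to_world are hypothetical, and the addresses are the example values from this post.

```python
import ipaddress


def to_ssh_from(value):
    """Normalise an address or CIDR block into CIDR notation.

    A bare address such as 11.22.33.44 becomes 11.22.33.44/32, and a block
    such as 10.10.0.0/16 is returned unchanged. Malformed input, or a block
    with host bits set (e.g. 10.10.1.0/16), raises ValueError.
    """
    return str(ipaddress.ip_network(value, strict=True))


def is_open_to_world(value):
    """Return True if the block matches every address (prefix length 0)."""
    return ipaddress.ip_network(value).prefixlen == 0


print(to_ssh_from("11.22.33.44"))     # 11.22.33.44/32
print(to_ssh_from("10.10.0.0/16"))    # 10.10.0.0/16
print(is_open_to_world("0.0.0.0/0"))  # True
```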

And I will also update the [cluster default] section, and change the compute instance type to c4.large, rather than using the default instance t2.micro:

[cluster default]
vpc_settings = public
key_name = duff_key_us-east-1
compute_instance_type = c4.large

Now that we understand a bit about the config file and we know how to add configuration parameters, let’s launch our first cluster with the create command:

(pcluster-virtenv) [duff]$ pcluster create hello-cluster1

When we start the cluster create, we’ll see a status update as the resources are being brought up. And because I’m interested to see how long it takes to launch a cluster, I’ll be using time:

(pcluster-virtenv) [duff]$ time pcluster create hello-cluster1
Beginning cluster creation for cluster: hello-cluster1
Creating stack named: parallelcluster-hello-cluster1
Status: parallelcluster-hello-cluster1 - CREATE_IN_PROGRESS

When the cluster creation has completed, I have both the public and private IP addresses and the username for login. And because I used time, I see that it took 8 minutes and 33 seconds to create the cluster:

MasterPublicIP: 35.153.251.20
ClusterUser: ec2-user
MasterPrivateIP: 172.31.0.14

real	8m33.425s
user	0m2.620s
sys	    0m0.353s

Let’s log in with the built-in ssh alias we give you with ParallelCluster, pcluster ssh <cluster_name>, and see what cluster resources are already available.

(pcluster-virtenv) [duff]$ pcluster list
hello-cluster1

(pcluster-virtenv) [duff]$ pcluster ssh hello-cluster1
The authenticity of host '35.153.251.20 (35.153.251.20)' can't be established.
ECDSA key fingerprint is SHA256:u9+A0i6Y94JcRGYW8eyi5e4N+iiNtpPTPAwPY5PQcWk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '35.153.251.20' (ECDSA) to the list of known hosts.
Last login: Sun Nov 11 20:12:12 2018

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/

[ec2-user ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-10-95         lx-amd64        2    1    1    2  0.02    3.7G  156.2M     0.0     0.0
ip-172-31-13-199        lx-amd64        2    1    1    2  0.02    3.7G  156.8M     0.0     0.0

From the output above, you can see that I already have a cluster of instances running. By default, we’re going to use t2.micro for the compute instance type, but I configured this cluster to use the c4.large, and because hyper-threading is on, we see two CPUs and one core for each instance.

Let’s submit a simple hostname job that will show the AutoScaling feature of ParallelCluster using the mpirun command.

[ec2-user ~]$ echo /usr/lib64/openmpi/bin/mpirun hostname | qsub -pe mpi 16
Your job 1 ("STDIN") has been submitted
[ec2-user ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.00000 STDIN      ec2-user     qw    11/11/2018 20:25:38                                   16

Now I have a job requesting more instances than I have, which kicks off a scaling action. When I have enough instances (in this case I’ll need 8 total instances), the job will run. A few minutes later, I have the resources and the job has already run to completion:

[ec2-user ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-0-72          lx-amd64        2    1    1    2  0.11    3.7G  189.0M     0.0     0.0
ip-172-31-10-65         lx-amd64        2    1    1    2  0.29    3.7G  189.2M     0.0     0.0
ip-172-31-14-49         lx-amd64        2    1    1    2  0.11    3.7G  189.1M     0.0     0.0
ip-172-31-2-78          lx-amd64        2    1    1    2  0.06    3.7G  189.4M     0.0     0.0
ip-172-31-3-226         lx-amd64        2    1    1    2  0.11    3.7G  185.5M     0.0     0.0
ip-172-31-4-248         lx-amd64        2    1    1    2  0.11    3.7G  186.2M     0.0     0.0
ip-172-31-5-112         lx-amd64        2    1    1    2  0.08    3.7G  188.9M     0.0     0.0
ip-172-31-5-50          lx-amd64        2    1    1    2  0.08    3.7G  189.0M     0.0     0.0
[ec2-user ~]$ qstat
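The count of 8 instances follows from simple arithmetic: the job asked for 16 slots, and qhost reports 2 CPUs per c4.large. Here is a tiny sketch of that calculation, assuming the scheduler offers one slot per CPU that qhost reports (the function name instances_needed is hypothetical):

```python
def instances_needed(requested_slots, slots_per_instance):
    """Round up to the number of whole instances that covers the request."""
    if requested_slots < 0 or slots_per_instance <= 0:
        raise ValueError("slots must be non-negative and per-instance slots positive")
    # Integer ceiling division: -(-a // b) rounds up without using floats.
    return -(-requested_slots // slots_per_instance)


print(instances_needed(16, 2))  # 8, matching the eight hosts listed by qhost
```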

Now that the job has run and I have these instances just sitting there doing nothing, what happens now? If the instances have been running for more than 10 minutes, but are not running a job, we will terminate those instances for you. So after 10 minutes I look at qhost again:

[ec2-user ~]$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
[ec2-user ~]$ qstat

The instances have been terminated, and I’m not being charged for idle instances. The scaling features are configurable.
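To make the idle rule concrete, here is a toy sketch of the decision described above. It is purely an illustration under my own assumptions, not the code ParallelCluster actually runs: the function name should_terminate and its parameters are hypothetical, and the 10-minute default simply mirrors the behavior I observed.

```python
from datetime import datetime, timedelta


def should_terminate(launched_at, running_jobs, now, idle_limit=timedelta(minutes=10)):
    """Return True when an instance has outlived the limit and has no jobs.

    launched_at and now are datetime objects; running_jobs is the number of
    jobs currently running on the instance.
    """
    return running_jobs == 0 and (now - launched_at) > idle_limit


launched = datetime(2018, 11, 11, 20, 30)
print(should_terminate(launched, 0, launched + timedelta(minutes=11)))  # True
print(should_terminate(launched, 1, launched + timedelta(minutes=11)))  # False
```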

Okay. I have launched what looks and acts like a traditional HPC environment using many AWS cloud native resources, including an Auto Scaling cluster that will terminate instances when they are not being used. What about using an environment without the scheduler overhead?

Say hello to AWS Batch.

AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.

With AWS Batch, there is no need to install and manage batch computing software or server clusters that you use to run your jobs, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

So now I’ll launch a Batch environment and let ParallelCluster do all of the work for me. When launching an AWS Batch environment, we’ll leverage even more AWS resources, for example AWS CodeBuild and Amazon Elastic Container Registry (Amazon ECR), and an NFS server will be brought up on the master instance.

I’ll start by editing my config file, ~/.parallelcluster/config, and adding this section, which reuses some of the parameters from the [cluster default] section.

[cluster awsbatch]
scheduler = awsbatch
key_name = duff_key_us-east-1
vpc_settings = public

Now that I have a separate cluster template defined, I can launch a separate master instance that will be both the NFS server for my Batch jobs, and will also be the submit host for my batch jobs. I’ll create a cluster now, specifying my awsbatch cluster.

(pcluster-virtenv) [duff]$ pcluster create awsbatch --cluster-template awsbatch
Beginning cluster creation for cluster: awsbatch
Creating stack named: parallelcluster-awsbatch
Status: parallelcluster-awsbatch - CREATE_COMPLETE
MasterPublicIP: 54.158.75.19
ClusterUser: ec2-user
MasterPrivateIP: 172.31.15.217
ResourcesS3Bucket: parallelcluster-awsbatch-6wjsibr8elx9km0r

From the output above, you can see I’ve successfully created an AWS Batch submit host. I’ll log in and see what’s there:

(pcluster-virtenv) [duff]$ pcluster ssh awsbatch
The authenticity of host '54.158.75.19 (54.158.75.19)' can't be established.
ECDSA key fingerprint is SHA256:/K8LQYyLliS0+Q7+BZtkhe6ChyM9Oz/RZz0aTCKJ3KQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.158.75.19' (ECDSA) to the list of known hosts.
Last login: Tue Nov 13 00:46:30 2018

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/
[ec2-user ~]$ awsbhosts
ec2InstanceId        instanceType    privateIpAddress    publicIpAddress      runningJobs
-------------------  --------------  ------------------  -----------------  -------------
i-05af380e4950366d4  c4.xlarge       172.31.4.66         18.209.11.53                   0

I see that I have a c4.xlarge instance ready to run jobs. I’ll test with hello world.

[ec2-user ~]$ awsbsub echo hello world
Job 2387b7f5-14c7-41c1-bbf8-c5e50017580a (echo) has been submitted.

The job is submitted, and should go from RUNNABLE to STARTING to RUNNING, and then to either SUCCEEDED or FAILED.

[ec2-user ~]$ awsbstat
jobId                                 jobName    status    startedAt    stoppedAt    exitCode
------------------------------------  ---------  --------  -----------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       RUNNABLE  -            -            -
[ec2-user ~]$ set -o vi
[ec2-user ~]$ awsbstat
jobId                                 jobName    status    startedAt    stoppedAt    exitCode
------------------------------------  ---------  --------  -----------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       STARTING  -            -            -

[ec2-user ~]$ awsbstat
jobId                                 jobName    status    startedAt            stoppedAt    exitCode
------------------------------------  ---------  --------  -------------------  -----------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       RUNNING   2018-11-13 00:52:31  -            -

Now I see that my job is running, and I can also check with the awsbout command:

[ec2-user ~]$ awsbout 2387b7f5-14c7-41c1-bbf8-c5e50017580a
2018-11-13 00:52:31: Starting Job 2387b7f5-14c7-41c1-bbf8-c5e50017580a
2018-11-13 00:52:31: hello world

After my job has completed, I can check the status with the awsbstat command:

[ec2-user ~]$ awsbstat -s SUCCEEDED
jobId                                 jobName    status     startedAt            stoppedAt              exitCode
------------------------------------  ---------  ---------  -------------------  -------------------  ----------
2387b7f5-14c7-41c1-bbf8-c5e50017580a  echo       SUCCEEDED  2018-11-13 00:52:31  2018-11-13 00:53:02           0
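Rather than re-running the status command by hand, you could poll until the job reaches a terminal state. The sketch below is one way to do that under my own assumptions: wait_for_job is a hypothetical helper, and get_status is any zero-argument callable you supply that returns the current status string (for instance, a thin wrapper around the AWS Batch DescribeJobs API). AWS Batch reports a failed job with the status FAILED.

```python
import time

# Terminal AWS Batch job states; anything else means the job is still in flight.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until the job finishes.

    Returns the distinct states observed, in order. Raises TimeoutError if
    the job has not reached a terminal state after max_polls checks.
    """
    seen = []
    for _ in range(max_polls):
        status = get_status()
        # Record a state only when it differs from the previous observation.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state within the polling budget")


# Example with canned statuses standing in for a real status lookup:
canned = iter(["RUNNABLE", "RUNNABLE", "STARTING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(canned), sleep=lambda seconds: None))
# ['RUNNABLE', 'STARTING', 'RUNNING', 'SUCCEEDED']
```

Injecting the sleep function keeps the example instant to run; in real use you would leave the default in place.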

With AWS ParallelCluster you can leverage the benefits of the AWS Cloud while maintaining a familiar cluster environment. We’re excited about ParallelCluster and we look forward to hearing from you!

Cheers,
Mark
