Simulating 44-Qubit quantum circuits using AWS ParallelCluster

By Amazon Web Services

September 20, 2022

Dr. Fabio Baruffa, Sr. HPC & QC Solutions Architect
Dr. Pavel Lougovski, Pr. QC Research Scientist
Tyson Jones, Doctoral researcher, University of Oxford

Introduction

Currently, an enormous effort is underway to develop quantum computing hardware capable of scaling to hundreds, thousands, and even millions of physical (non-error-corrected) qubits. The ultimate goal is to build fault-tolerant quantum computers. Classically simulating systems with a large number of qubits is key to understanding how physical quantum systems behave under varying noise conditions as they scale.

Simulations are also invaluable to understand the noise resilience of quantum algorithms. Because the noise characteristics of today’s hardware prototypes often defy analytic treatment, they are instead investigated through small-scale experiments and intensive numerical modelling. Even performance evaluations of perfect noise-free quantum algorithms typically require some form of classical emulation.

Unsurprisingly, such emulation tasks are computationally demanding and memory intensive, so researchers must use high performance computing (HPC) strategies, such as distributing data and algorithms, when modelling even modestly sized present-day quantum experiments. HPC simulators of quantum computers are therefore an indispensable tool for advancing experimental and algorithmic research.

In this blog post, we describe how to perform large-scale quantum circuit simulations using AWS ParallelCluster with QuEST, the Quantum Exact Simulation Toolkit. We demonstrate a simple and rapid deployment of up to 4,096 compute instances to simulate random quantum circuits with up to 44 qubits.

Background

Quantum computing has the potential to accelerate current computational capabilities using the principles of quantum physics, and possibly to solve specific complex problems that are difficult to address with conventional computers. It is a major field of research in which new hardware and software still need to be developed. Classical simulations of quantum computers currently play a crucial role in demonstrating and validating new ideas, and in experimenting before a production environment exists.

Classical simulations

Quantum computers can be classically simulated using a variety of algorithmic paradigms, each with their own costs and performance trade-offs. The choice of the simulation algorithm is often determined by the nature of the questions asked about the emulated quantum device, such as the probability of a particular error occurring, or the expected value of an observable. We will introduce two ubiquitous paradigms: state-vector (SV) and tensor-network (TN) simulation.

SV simulators, also known as “full-state”, “brute-force” and “Schrödinger-style” simulators, maintain a complete numerical description of the evolving quantum state of a quantum circuit. As such, they require memory that scales exponentially with the number of qubits in the circuit, but their runtime scales linearly with the quantum circuit depth. Since their complete quantum state output permits the precise and efficient a posteriori calculation of any property, they are the conventional first choice of simulator for much of quantum computing research.

In contrast, for shallow circuits the memory requirements of TN simulators grow only modestly as the number of qubits increases, but TN simulators are exponentially slowed by deepening circuits and increasing state complexity. This makes them cheaper and faster for studying shallow circuits with a suitable structure, and such simulations can potentially scale to many qubits.

The performance bottleneck of SV simulators is the propagation of a quantum state, while for TN simulators it is the propagation of a particular observable. QuEST is an SV simulator, and in this blog post we employ it to study circuits for which SV simulation is particularly well suited.

In state-vector (SV) simulation, an N-qubit register is represented by a state vector of 2^N complex amplitudes, which can be numerically instantiated as an array of 2×2^N real floating-point numbers. SV simulation of N=40 qubits at double precision (8 bytes per real number) would therefore require 16,384 GiB, well beyond the capacity of a typical HPC compute node. This makes the use of distributed-memory systems essential. To date, large-scale SV simulations have been performed exclusively on purpose-built supercomputers and have required long lead times just to allocate the resources.
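The memory arithmetic above is simple enough to sketch. The helper below is illustrative only (the name `state_vector_bytes` is ours, not part of QuEST); it assumes 8-byte double-precision reals, so each complex amplitude occupies 16 bytes, and it reproduces the 16,384 GiB figure for N=40 as well as the doubling with every added qubit.

```python
BYTES_PER_REAL = 8   # IEEE 754 double precision (assumed)
GIB = 2**30          # bytes per GiB


def state_vector_bytes(n_qubits: int, bytes_per_real: int = BYTES_PER_REAL) -> int:
    """Bytes needed for 2**n complex amplitudes, i.e. 2 * 2**n real numbers."""
    n_amplitudes = 2**n_qubits
    return 2 * n_amplitudes * bytes_per_real


if __name__ == "__main__":
    for n in (30, 36, 40, 44):
        # Each additional qubit doubles the memory footprint.
        print(f"N={n}: {state_vector_bytes(n) / GIB:,.0f} GiB")
```

This counts only the state vector itself; a distributed simulator that keeps an equally sized communication buffer needs twice as much.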

AWS resources

If you are interested in simulating small to moderately sized quantum circuits, Amazon Braket offers a choice of several simulators: the local simulator included in the Braket SDK and three on-demand simulators. The local simulator can run on a laptop or within an Amazon Braket managed notebook and supports simulation of quantum circuits with and without noise.

The on-demand simulators are SV1, a general-purpose state-vector simulator; DM1, a density-matrix simulator that supports noise modeling; and TN1, a tensor-network simulator that specializes in certain larger-scale structured quantum circuits. SV1 is suitable for circuits of up to 34 qubits, and DM1 supports the simulation of circuits of up to 17 qubits. TN1 can simulate up to 50 qubits, but only for suitably structured quantum circuits. This post complements the Braket simulators by exploring the scalability of larger SV simulations, with up to 44 qubits, using the QuEST simulator on Amazon Elastic Compute Cloud (Amazon EC2).

Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Amazon EC2 compute-optimized instances are ideal for compute-bound workloads and intensive numerical modeling. For example, 256 c5.18xlarge instances (144 GiB of memory each) together contain sufficient memory to store the distributed state vector for a 40-qubit circuit, including the doubled memory cost of the auxiliary buffers needed for MPI communication. Of course, each additional qubit doubles the total memory requirement: simulating an N=44 qubit register requires 562,950 GB (0.5 PiB) of memory, or 4,096 c5.18xlarge instances.

To orchestrate your compute resources, AWS developed an open-source cluster management tool, AWS ParallelCluster, which simplifies deploying and managing HPC clusters on AWS. AWS ParallelCluster enables the rapid deployment of virtual clusters with varying architectures to meet the requirements of different applications and workflows. You can also run your computation immediately when needed without waiting in a queue for a shared compute resource. As a result, many scientists and companies worldwide are looking to use cloud computing to find solutions to their problems in an efficient and cost-effective manner.

The remainder of this blog post demonstrates an HPC deployment of QuEST with AWS ParallelCluster to simulate random circuits. Random circuits appear both in the verification of real quantum computers and in the performance benchmarking of quantum computing simulations.

Circuit Details

We use QuEST to simulate a generic quantum circuit on a distributed-memory system. We sample the probability distribution over N-bit strings produced by N-qubit circuits built from one- and two-qubit gates and multi-qubit controlled gates. We implemented a set of random N-qubit quantum circuits using the following algorithm:

  • Set the total number of qubits N and gates Gn in a circuit
  • For each of the Gn gates:
    • toss an unbiased coin
    • if the outcome of the coin toss is heads:
      • choose two distinct indices (q1, q2) at random, each from 1 to N
      • apply two-qubit CZ gate between qubits q1 and q2
    • if the outcome of the coin toss is tails:
      • choose an index q1 randomly from 1 to N
      • choose a single qubit gate G from {RX, RY, RZ, H} uniformly at random
      • if G is H:
        • apply H to qubit q1
      • if G is RX, RY, or RZ
        • choose a random number θ between 0 and pi (3.1415…)
        • apply the corresponding rotation by the angle θ to the qubit q1

The single-qubit gates RX, RY, and RZ are rotations about the x, y, and z axes, respectively, and H is the Hadamard gate. The two-qubit gate CZ is the controlled phase flip (controlled-Z).
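For reference, the standard matrix forms of these gates are given below, using the common convention R_a(θ) = exp(−iθσ_a/2) for a rotation about axis a. We assume, without confirmation from the source, that this matches the convention of the simulator used here.

```latex
\begin{aligned}
R_X(\theta) &= \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \qquad
R_Y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \\
R_Z(\theta) &= \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}, \qquad
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
CZ = \operatorname{diag}(1, 1, 1, -1).
\end{aligned}
```

CZ is diagonal in the computational basis: it flips the sign of exactly those amplitudes in which both of its qubits are |1⟩.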

The randomness of the circuits prevents the simulator from exploiting particular symmetries to optimize the classical simulation.
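The circuit-generation recipe above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code: the function name `random_circuit` and the tuple encoding of gates are hypothetical, qubit indices are 0-based rather than 1-based, and we assume the two CZ qubits are drawn without replacement, because a controlled gate needs distinct control and target qubits.

```python
import math
import random


def random_circuit(n_qubits, n_gates, seed=None):
    """Build a random circuit as a list of gate tuples (hypothetical encoding).

    ("CZ", q1, q2)           two-qubit controlled phase flip
    ("H", q)                 Hadamard
    ("RX"|"RY"|"RZ", q, th)  rotation by angle th in [0, pi]
    Qubit indices run from 0 to n_qubits - 1.
    """
    rng = random.Random(seed)  # private generator so a seed reproduces the circuit
    circuit = []
    for _ in range(n_gates):
        if rng.random() < 0.5:  # heads: CZ between two distinct qubits
            q1, q2 = rng.sample(range(n_qubits), 2)
            circuit.append(("CZ", q1, q2))
        else:  # tails: one single-qubit gate chosen uniformly at random
            q1 = rng.randrange(n_qubits)
            gate = rng.choice(("RX", "RY", "RZ", "H"))
            if gate == "H":
                circuit.append(("H", q1))
            else:
                theta = rng.uniform(0.0, math.pi)
                circuit.append((gate, q1, theta))
    return circuit


if __name__ == "__main__":
    for gate in random_circuit(n_qubits=40, n_gates=5, seed=1):
        print(gate)
```

Passing a fixed `seed` makes a generated circuit reproducible across runs, which is convenient when comparing timings on different cluster sizes.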

We run circuit simulations in QuEST by iterating over the number of qubits from N=40 to N=44, using Gn=(100, 200, 400, 600, 800, 1000) gates for each value of N. We always initialize the circuit in the state |0⟩^⊗N and compute all 2^N complex amplitudes of the final state after the random circuit is applied. Because SV simulation is implemented as a sequence of Gn matrix-vector multiplications, we estimate the floating-point operation (FLOP) cost of simulating a single complex amplitude of the final state vector by counting the elementary multiplications and additions performed and dividing that total by the number of amplitudes (2^N).
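To make the FLOP-per-amplitude metric concrete, here is a toy, pure-Python state-vector simulator for a few qubits that applies each gate as a local matrix-vector update and counts real floating-point operations. The counting convention is our assumption, not the authors' exact accounting: every single-qubit gate is treated as a dense 2×2 complex matrix (two complex multiplies at 6 FLOPs each plus one complex add at 2 FLOPs, i.e. 14 FLOPs per output amplitude), and a CZ is counted as a 2-FLOP sign flip on the quarter of amplitudes it touches. Qubit q is mapped to bit q of the basis-state index. QuEST itself is a C library with optimized, parallel kernels; the function names here are illustrative.

```python
import cmath
import math


def gate_matrix(name, theta=0.0):
    """2x2 matrix of a single-qubit gate, with R_a(theta) = exp(-i*theta*sigma_a/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    if name == "H":
        r = 1 / math.sqrt(2)
        return ((r, r), (r, -r))
    if name == "RX":
        return ((c, -1j * s), (-1j * s, c))
    if name == "RY":
        return ((c, -s), (s, c))
    if name == "RZ":
        return ((cmath.exp(-0.5j * theta), 0), (0, cmath.exp(0.5j * theta)))
    raise ValueError(f"unknown gate {name!r}")


def simulate(n_qubits, circuit):
    """Apply circuit to |0...0> and return (final amplitudes, counted real FLOPs)."""
    state = [0j] * (1 << n_qubits)
    state[0] = 1 + 0j
    flops = 0
    for gate in circuit:
        if gate[0] == "CZ":
            mask = (1 << gate[1]) | (1 << gate[2])
            for i in range(len(state)):
                if (i & mask) == mask:  # both qubits in |1>: flip the sign
                    state[i] = -state[i]
                    flops += 2  # assumed cost: negate real and imaginary parts
        else:
            (a, b), (c, d) = gate_matrix(gate[0], *gate[2:])
            stride = 1 << gate[1]
            for i in range(len(state)):
                if (i & stride) == 0:  # pair i (target bit 0) with j (target bit 1)
                    j = i | stride
                    x, y = state[i], state[j]
                    state[i] = a * x + b * y
                    state[j] = c * x + d * y
                    flops += 2 * (2 * 6 + 2)  # 2 outputs, each 2 complex mul + 1 complex add
    return state, flops


if __name__ == "__main__":
    n = 3
    circuit = [("H", 0), ("H", 1), ("CZ", 0, 1), ("RX", 2, math.pi / 3)]
    state, flops = simulate(n, circuit)
    print("FLOPs per amplitude:", flops / 2**n)
```

For the three single-qubit gates and one CZ in the usage example, this convention gives 3 × 14 + 2/4 = 42.5 FLOPs per amplitude. Because every gate touches every amplitude a constant number of times, the total work is proportional to Gn × 2^N, so the per-amplitude figure depends only on the gate mix.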

Circuit Complexity

The computational complexity of simulating a random N-qubit circuit with an SV simulator such as QuEST grows exponentially with N but scales only linearly with the number Gn of single- and two-qubit gates. In other words, the computational cost does not discriminate between different circuit structures.

Other simulation approaches, such as tensor-network (TN) simulation, are much more sensitive to random circuit structure. TN simulators do not compute the entire N-qubit state vector; instead, they search for an efficient (ideally optimal) contraction path for estimating a single amplitude of the state vector. Many amplitudes in a state vector generated by a random circuit can be 0 and need not be evaluated explicitly, and TN simulators can help identify the amplitudes for which this holds.

However, for TN simulation, random circuits with more than 400 gates incur a large computational cost per amplitude that grows polynomially with circuit depth. Such circuits are better suited to SV simulation, whose cost grows only linearly with depth.

Resources deployment

We demonstrate large-scale simulations of quantum circuits using QuEST, an open-source quantum state-vector simulator. QuEST supports multithreaded and distributed calculations using OpenMP and MPI to accelerate simulations on HPC systems. The HPC infrastructure is deployed using AWS ParallelCluster. The following diagram shows the HPC architecture.

Figure 1: HPC Architecture

The Head Node is used to log in to the cluster, compile the application, submit the job, and set up Compute Nodes, which are dynamically provisioned according to the size of the problem (number of qubits).

We use EC2 c5.18xlarge compute-optimized instances, which have Intel Xeon Scalable processors with a sustained all-core Turbo frequency of 3.4 GHz. Each instance has 36 cores and 144 GiB of memory, which offers the best compromise between the resources the circuit requires and performance. The resulting memory-per-core ratio of 4 GiB allows 2 MPI tasks per instance to be used efficiently. The following table shows the resources required for simulations with 36 to 44 qubits.

Number of qubits | Memory required (GB, incl. MPI buffers) | Number of instances | Total available EC2 memory (GiB) | Total number of cores
36 | 2,199 | 16 | 2,304 | 576
37 | 4,398 | 32 | 4,608 | 1,152
38 | 8,796 | 64 | 9,216 | 2,304
39 | 17,592 | 128 | 18,432 | 4,608
40 | 35,184 | 256 | 36,864 | 9,216
41 | 70,369 | 512 | 73,728 | 18,432
42 | 140,737 | 1,024 | 147,456 | 36,864
43 | 281,475 | 2,048 | 294,912 | 73,728
44 | 562,950 | 4,096 | 589,824 | 147,456

Table 1: Resources required by the state vector simulator to simulate a circuit with the given number of qubits.
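Table 1 can be regenerated with a short calculation. The sketch below assumes 16 bytes per double-precision amplitude, a second buffer of equal size for MPI communication, 144 GiB and 36 cores per c5.18xlarge instance, and a power-of-two instance count; we understand QuEST's distributed mode to require a power-of-two number of MPI processes, and with 2 processes per instance that carries over to the instance count. The helper names are hypothetical. Note that the required-memory figures in the table correspond to decimal gigabytes (10^9 bytes), whereas the available memory is 144 GiB per instance.

```python
GIB = 2**30
INSTANCE_MEMORY_GIB = 144  # c5.18xlarge
CORES_PER_INSTANCE = 36    # hyperthreading disabled


def required_bytes(n_qubits):
    """State vector (16 bytes per complex double) plus an equal-sized MPI buffer."""
    return 2 * 16 * 2**n_qubits


def instances_needed(n_qubits):
    """Smallest power-of-two instance count whose combined memory fits required_bytes."""
    count = 1
    while count * INSTANCE_MEMORY_GIB * GIB < required_bytes(n_qubits):
        count *= 2
    return count


def table_rows(qubit_range=range(36, 45)):
    """Rows of (qubits, required GB, instances, available GiB, total cores)."""
    rows = []
    for n in qubit_range:
        k = instances_needed(n)
        rows.append((n, round(required_bytes(n) / 1e9), k,
                     k * INSTANCE_MEMORY_GIB, k * CORES_PER_INSTANCE))
    return rows


if __name__ == "__main__":
    for n, gb, k, gib, cores in table_rows():
        print(f"{n}  {gb:>9,} GB  {k:>6,}  {gib:>9,} GiB  {cores:>9,}")
```

Under these assumptions every row works out to 2^(N−32) instances, so each extra qubit doubles the size of the cluster.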

We compiled QuEST version 3.5.0 from source with the Intel oneAPI HPC Toolkit, version 2022.2, to take advantage of the AVX-512 and AVX2 vector instructions available on C5 instances. We used Amazon Linux 2 as the operating system and the Intel oneAPI MPI library (2022.2) for communication.

Performance results

We explore the scalability of the simulation with respect to the number of instances. Adding one qubit doubles both the memory requirement and the number of instances needed by the state-vector simulator. In all experiments, we use 2 MPI tasks per instance with 18 OpenMP threads each, and we disable hyperthreading…
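The per-process picture behind this layout can be checked with a little arithmetic. The sketch assumes the state vector is split into equal contiguous chunks, one per MPI process, each paired with an equally sized receive buffer, which is how we understand QuEST's distributed mode to work; under that assumption, gates acting on the highest log2(processes) qubits are the ones that require inter-process communication. The function and key names are illustrative.

```python
GIB = 2**30
BYTES_PER_AMP = 16      # double-precision complex amplitude
INSTANCE_GIB = 144      # c5.18xlarge memory
RANKS_PER_INSTANCE = 2  # MPI tasks per instance
THREADS_PER_RANK = 18   # OpenMP threads per task (2 x 18 = 36 cores)


def rank_layout(n_qubits, n_instances):
    """Memory and qubit split per MPI rank for a contiguous block distribution."""
    ranks = n_instances * RANKS_PER_INSTANCE
    if ranks & (ranks - 1) or ranks > 2**n_qubits:
        raise ValueError("assumed: rank count must be a power of two <= 2**n_qubits")
    amps_per_rank = 2**n_qubits // ranks
    # Local chunk plus an equally sized buffer for receiving a partner's chunk.
    gib_per_rank = 2 * amps_per_rank * BYTES_PER_AMP / GIB
    gib_per_instance = gib_per_rank * RANKS_PER_INSTANCE
    # Qubits below log2(amps_per_rank) live inside each chunk; the rest span ranks.
    local_qubits = amps_per_rank.bit_length() - 1
    return {
        "ranks": ranks,
        "threads_total": ranks * THREADS_PER_RANK,
        "amps_per_rank": amps_per_rank,
        "gib_per_rank": gib_per_rank,
        "gib_per_instance": gib_per_instance,
        "fits": gib_per_instance <= INSTANCE_GIB,
        "distributed_qubits": n_qubits - local_qubits,
    }


if __name__ == "__main__":
    print(rank_layout(44, 4096))
```

For every row of Table 1 this works out to 64 GiB per process and 128 GiB per instance, leaving some headroom within the 144 GiB available for the operating system and MPI runtime.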

Read the full blog post to learn more. Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Shorts YouTube channel and following the AWS HPC Blog channel.

 
