Pure Vectors/Massive SIMD Providing High Bytes/Flop for Efficient and Balanced Supercomputing for All Applications

February 10, 2020

Sponsored Content by NEC

Supercomputing today has turned its vision toward more reliable directions: (1) power efficiency, or eco-friendliness; (2) affordable computing for all; (3) heterogeneous system architectures; (4) free or open-source software; and (5) balanced systems, in other words, a low total cost of ownership (TCO). In a single sentence, the aim is “to achieve high bytes per flop” rather than flops alone.

Achieving a balanced system was the original goal of the best supercomputer designers of the past (“Anyone can build a fast CPU. The trick is to build a fast system.” Seymour Cray). This contrasts with the more recent practice of chasing high TFLOPS with little concern for how power hungry the system becomes.

Presentations from the weather, oil and gas, DoD and DoE communities at conferences and workshops have shown that earth system prediction requires balanced computing, that is, high bytes per flop. Vendors, however, have been focusing on petaflops while neglecting the other important parameters.

The reason for this new direction, which serves both business and our environment, is that the current generation is a generation of scientists from an early age. Huge amounts of information and data are openly available on the internet for anyone to experiment with, to invent something new, and to better the human race and our planet. To perform these experiments, supercomputers in their modern form are available to all, from clouds and data centers to accelerator cards in desktops and laptops. However, it is the responsibility of system designers to provide them at a cost, power, form factor, and speed with accuracy that do not undermine the very reason they are used: to prevent or reduce global warming.

For example, on a typical day, as we pull out our mobile phones every morning to check the weather, the price of oil, market trends and so on, we expect these compute-intensive and data-intensive applications to deliver fast and accurate results. Elsewhere in the news, we also read that burning fossil fuels to generate power, run supercomputers and move cars produces emissions that degrade the climate health of our planet, with headlines such as “Weather supercomputer used to predict climate change is one of Britain’s worst polluters”^[1].

Achieving the best performance, that is, accuracy, speed and efficiency, with availability to all is the need of the hour. The challenge has been taken up by the two major architectures existing today: scalar CPUs and SIMT GPUs. Both have great potential to become massively parallel, fast systems that can overcome these challenges. Ironically, the software and algorithms that run on these architectures do not strictly follow the rules of parallelism. A typical algorithm or program is a mix of sequential and parallel code, which prevents the architectures from working at their full potential. Both scalar processors and SIMT GPUs have evolved for exactly this situation by adopting SIMD, pipelined or vector units on their chips. This is the oldest and best-proven way of achieving parallelism, and a fast system, in an environment where algorithms mix sequential and parallel operations. In other words, it all started with vectors, and vectors will continue to be the way to achieve fast systems.
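The cost of mixing sequential and parallel code can be sketched with Amdahl's law. The article does not name this law; it is used here as the standard textbook model, and the parallel fractions and unit counts below are illustrative assumptions, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    units: number of parallel execution units (cores, lanes, ...).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not accelerated; the parallel part is divided by `units`.
    return 1.0 / (serial_fraction + parallel_fraction / units)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not data from the article).
    for p in (0.50, 0.90, 0.99):
        for n in (8, 256, 4096):
            print(f"parallel fraction {p:.2f}, {n:5d} units: "
                  f"speedup {amdahl_speedup(p, n):8.2f}x")
```

Under this model, a program that is 90 percent parallel can never run more than 10 times faster, however many units are added, which is why the sequential share of the code matters as much as the raw parallel width.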

Vector processors, or SIMD, have ruled the supercomputing (HPC) domain since the earliest thinking about parallelism and fast, accurate supercomputing. Every computing hardware architecture today uses SIMD or vectors in its own way, from ARM, Intel, AMD, Nvidia and IBM to RISC-V. However, none of them is pure SIMD or pure vector. Processor architectures today have to trade off between scalar (SISD) and vector (SIMD) operations in pursuit of the ultimate goal: a system capable of solving all applications with full compute and power efficiency. Lately, everybody has realized one golden truth: no single hardware architecture or chip can run all software applications or algorithms with the same efficiency and performance.

We now see a strong trend toward huge investments in interconnects, power-efficient processor architectures and heterogeneous system architectures such as CPU+GPU. The main reason is that the amount of data required by software and algorithms ranges from small to huge. For highly compute-intensive applications with small to moderate data requirements, CPU+GPU systems have shown promising results. Those results cannot be extrapolated to highly data-intensive applications. No wonder processor designers are now investing heavily in interconnects. But looking at the history of supercomputing, the missing piece of this puzzle is a massive SIMD on a single chip: a pure vector processor, the Vector Engine (VE) or SX-Aurora TSUBASA, in which one instruction controls operations on 256 elements, delivering high bytes per flop. Pure vectors, or massive SIMD, complete the trinity of CPU+GPU+VE and are required to meet today's supercomputing needs.
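To make "one instruction controls operations for 256 elements" concrete, the sketch below counts how many vector operations a simple element-wise loop would need when it is split into chunks of at most the vector length (often called strip-mining). This is a rough illustration under stated assumptions: the function names are hypothetical, the 8-element width is only a comparison point (eight 64-bit values in a 512-bit SIMD register), and loads, stores, masking and the actual VE instruction set are ignored.

```python
from typing import Iterator, Tuple

VE_VECTOR_LENGTH = 256    # elements per vector instruction, as stated in the article
NARROW_SIMD_LENGTH = 8    # assumption for comparison: 8 x 64-bit values in 512 bits


def strip_mine(n_elements: int, vector_length: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) chunks covering n_elements, each at most vector_length long."""
    if n_elements < 0 or vector_length < 1:
        raise ValueError("n_elements must be >= 0 and vector_length >= 1")
    for start in range(0, n_elements, vector_length):
        yield start, min(vector_length, n_elements - start)


def vector_op_count(n_elements: int, vector_length: int) -> int:
    """Number of vector operations needed for one element-wise pass."""
    return sum(1 for _ in strip_mine(n_elements, vector_length))


if __name__ == "__main__":
    n = 1_000_000  # illustrative array size
    print("256-element vectors:", vector_op_count(n, VE_VECTOR_LENGTH), "operations")
    print("  8-element vectors:", vector_op_count(n, NARROW_SIMD_LENGTH), "operations")
```

The longer the vector, the fewer instructions have to be issued for the same amount of data, but each of those instructions then depends on memory being able to supply 256 elements at a time, which is where the bytes-per-flop argument comes in.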

*Figure 1 Mapping software applications to processor architectures*

The graph in the figure above shows that applications requiring massive data processing belong to the top-left area: moderate compute but massive data processing. The overall system achieved with this trinity will cover the entire application space, from traditional HPC and simulation to HPDA and AI/machine learning. Pure vector architectures require large memory close to the processing cores, which was the major reason they were phased out of the market almost two decades ago. The Vector Engine (VE), or SX-Aurora TSUBASA, is a dedicated, modular (PCIe-based add-in card form factor) vector processor, billed as the world's highest memory bandwidth vector processor. NEC introduced it to overcome the roadblocks of earlier vector processors and bring back the era of pure vector processors.

*Figure 2 A deep look inside the NEC SX-Aurora TSUBASA*

The NEC VE has multiple large vector processor cores (8 cores per card at 307 Gflops double precision per core) along with a scalar processor core on the same chip, high-bandwidth memory (HBM2 at 1.35 TB/s) with a large capacity (48 GB), and a large last-level cache of 16 MB (with 3 TB/s bandwidth). The result is a fully equipped processor with reduced memory latency and no bottlenecks, leading to industry-leading sustained, balanced supercomputing performance. For example, 64 VE cards in a 42U rack can deliver more than 157 TF while drawing under 30 kW. The application offload model of the VE allows the entire application to run on the card. Additionally, the models shown in the figure below allow applications to execute alongside the CPU and GPU to meet these challenges.
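The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this article (cores per card, Gflops per core, HBM2 bandwidth, cards per rack, rack power); the function names are illustrative, and because the rack power is given only as an upper bound of 30 kW, the efficiency figure is a lower bound on peak, not sustained, performance per watt.

```python
CORES_PER_CARD = 8           # from the article
GFLOPS_PER_CORE = 307.0      # double precision, from the article
HBM2_BANDWIDTH_TBPS = 1.35   # TB/s, from the article
CARDS_PER_RACK = 64          # from the article
RACK_POWER_KW_MAX = 30.0     # "under 30 kW", treated as an upper bound


def card_peak_tflops() -> float:
    """Peak double-precision TFLOPS of one VE card."""
    return CORES_PER_CARD * GFLOPS_PER_CORE / 1000.0


def bytes_per_flop() -> float:
    """Memory bandwidth divided by peak compute (TB/s over TFLOP/s)."""
    return HBM2_BANDWIDTH_TBPS / card_peak_tflops()


def rack_peak_tflops() -> float:
    """Peak double-precision TFLOPS of a rack of VE cards."""
    return CARDS_PER_RACK * card_peak_tflops()


def rack_peak_gflops_per_watt_min() -> float:
    """Lower bound on peak GFLOPS per watt, using the 30 kW upper bound on power."""
    return rack_peak_tflops() * 1000.0 / (RACK_POWER_KW_MAX * 1000.0)


if __name__ == "__main__":
    print(f"Peak per card:     {card_peak_tflops():.3f} TFLOPS")
    print(f"Bytes per flop:    {bytes_per_flop():.2f}")
    print(f"Peak per rack:     {rack_peak_tflops():.1f} TFLOPS")
    print(f"Peak GFLOPS/W (>=): {rack_peak_gflops_per_watt_min():.2f}")
```

Eight cores at 307 Gflops give about 2.46 TFLOPS per card, so 1.35 TB/s of memory bandwidth corresponds to roughly 0.55 bytes per flop, and 64 cards give about 157.2 TFLOPS, consistent with the "more than 157 TF" quoted for a rack.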

*Figure 3 NEC VE OSS called VEOS allows different execution models for different application requirements [2]*

Additionally, NEC VE software supports hybrid MPI and direct I/O (VE to VE and VE to I/O), so that applications can run across CPUs and file transfers can proceed independently of the application processing on the VE. NEC supercomputers have been known for over 35 years for delivering the highest efficiency, that is, high bytes per flop. Continuing that legacy, the SX-Aurora TSUBASA, or Vector Engine (VE), delivers the highest efficiency in the HPCG results of November 2019^[3], achieving almost 6 percent of peak performance. This showcases the capability required to live up to the challenges of supercomputing today. Verticals such as oil and gas, weather and statistical machine learning require heavy data processing. The VE card, a pure vector processor, is the missing piece in the trinity (CPU+GPU+VE) for achieving a balanced system with high bytes per flop and low TCO. Within two years of the relaunch, or resurrection, of massive SIMD vector processors in the form of the NEC SX-Aurora TSUBASA, more than 100 customers across the globe have chosen more than 12,000 VE cards. Pure vectors are the missing piece in the “Trinity achieving the most efficient and balanced supercomputing systems for all applications and users”.

For more details and to try the system, please visit our website^[4] and join our forum today^[5]. In conclusion, the challenge is to achieve the best performance, that is, accuracy, speed and efficiency, with availability to all users and applications. The current trend is for the latest processors, from CPUs to GPUs, to move toward power-efficient, specialized scalar solutions rather than an all-in-one chip, which was tried without success. Adding vector processing, or the VE, to this class of processors to form the trinity of CPU+GPU+VE, along with fast interconnects, will meet and overcome the challenge. At NEC, we have already tested CPU+VE, and our next step is toward the ultimate trinity of CPU+GPU+VE. The VE, or vector processor (massive SIMD), will thus deliver the missing high bytes per flop, which is the need of the hour.

References:

[1] https://www.dailymail.co.uk/sciencetech/article-1209430/Weather-supercomputer-used-predict-climate-change-Britains-worst-polluters.html#ixzz0PPPxmQF3
[2] https://www.hpc.nec/api/v1/forum/file/download?id=LbGhNY
[3] Entries 158, 149 and 49: https://www.hpcg-benchmark.org/custom/index.html?lid=155&slid=302
[4] Contact: https://www.nec.com/en/global/solutions/hpc/sx/vector_engine.html
[5] Forum: https://www.hpc.nec/

HPCwire is a registered trademark of Tabor Communications, Inc. Use of this site is governed by our Terms of Use and Privacy Policy.

Reproduction in whole or in part in any form or medium without express written permission of Tabor Communications, Inc. is prohibited.
