Many efforts are underway to make RISC-V production-ready for servers and supercomputing, though the architecture is still years away from viability. China and Europe have detailed new high-performance chips, and the EU is building an experimental RISC-V cloud computing environment on open-source software. Separately, researchers are testing new RISC-V chips, including Tenstorrent’s Grayskull.
RISC-V is an alternative to the x86 and ARM architectures, which dominate the server market. Although RISC-V is years away from being a practical choice for servers and high-performance computers, academic and research institutions are bridging the gap to make that a reality.
The momentum behind RISC-V is undeniable. RISC-V is tied to the strategic national interests of the EU, Russia, and China, which want to build sovereign chips around the architecture.
RISC-V helps countries control their own destiny in semiconductor technology. The ISA is free to license, has an open design, and isn’t controlled by any single country. The U.S. is weaponizing its chip and AI technologies to choke off China’s access to CPU and GPU technologies.
A group of Chinese organizations working together plans to release the open-source XiangShan K100 CPU this year, which runs at 3GHz. It is a high-performance chip, and China claims performance advantages over some ARM server processors, though those claims should be taken with a pinch of salt.
Chinese institutions started developing the XiangShan family of chips in 2020.
The K100 chip design is open source, meaning anyone can adopt the design. China is a member of RISC-V International, the standards-setting organization for the ISA, though members of the U.S. Congress want to investigate the country’s participation in the group.
Researchers from Europe and the U.S. also published a paper detailing a 432-core RISC-V chip called Occamy, which has HBM2e memory and a chiplet design and is manufactured on a 12nm process.
Faster RISC-V chips are becoming available, but more work is needed in software and hardware to drive adoption of the architecture in high-performance computing, said Nick Brown, senior research fellow at the University of Edinburgh, in a paper.
“In recent years, we have seen closer integration between GPUs and CPUs in HPC by the provision of a unified memory space, with obvious benefits, and RISC-V provides the potential to push this a step further by unifying the ISA and programming model,” Brown said.
He pointed out that companies such as Esperanto, Sophon, and Tenstorrent have released server chips, and more progress is expected in 2024 and beyond.
EU-backed institutions are picking up the slack in software efforts related to RISC-V. The European Union is funding an effort called Vitamin-V, which aims to port the software necessary for RISC-V to cloud environments.
The researchers want to create an equivalent software toolchain to match ARM and x86 deployments in the cloud.
“Vitamin-V will deliver a complete build toolchain based on LLVM. Apart from more conventional, already supported HLLs (High-Level Languages), we will add support for GO, Python3, and Rust,” researchers said in a paper.
The cloud work revolves around Kubernetes, Docker, and OpenStack. Researchers in the project are already deploying OpenStack on a RISC-V server built from a cluster of Sipeed’s Lichee Pi 4A development boards, each of which includes the quad-core TH1520 RISC-V CPU, 16GB of RAM, and 128GB of storage.
The developers are using a version of Debian Linux that already supports many of the project’s packages. Notably, RISC-V still isn’t a first-class citizen in Linux, with many applications and drivers still being developed and upstreamed.
However, researchers are running into fundamental software issues.
“Updating the operating system packages and configurations on all nodes is also challenging due to the maturity of the software,” the researchers said.
The researchers explored tools such as DevStack and Kolla, which “download specific versions of packages and dependencies, which turned into many compilation issues on RISC-V.”
The RISC-V standards committee is developing a standard server design as a blueprint for makers to create RISC-V servers for web serving, gaming, and databases.
In early August, RISC-V published the latest version of a server standard for hardware companies to build barebones servers based on the ISA.
“The RISC-V server platform is defined as the collection of SoC hardware, platform firmware, boot/runtime services, and security services,” says a PDF document defining the platform.
The platform has a central layer that includes modules for boot, firmware, and security services to protect against intrusions. The server platform supports the CXL and PCIe 6.0 interfaces.
The central layer branches into the operating system and hypervisor layers, which orchestrate the software and virtual machines. Another branch is the baseboard management controller, which manages provisioning, hardware, and interfaces on the server.
The server design initiative resembles an effort by the Open Compute Project to build standard server designs for the x86 and ARM architectures. Those designs are now used by top server makers in data centers to scale AI and web workloads.
Separately, a study conducted by the Technical University of Munich investigated Tenstorrent’s Grayskull AI chip, which includes RISC-V processors and 120 Tensix cores. Researcher Moritz Thüning chose the Grayskull e150 AI developer kit (available from the company for $799) and implemented and optimized specific operations used in attention mechanisms.
The Grayskull chip has a 10×12 grid of Tensix cores. Each core has five RISC-V cores, compute engines, a data movement engine, and 1MB of SRAM. In total, Grayskull has 120MB of SRAM, more than the 80MB of SRAM in Nvidia’s H100 GPU. A network-on-chip with a torus topology handles communications between cores.
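The on-chip memory arithmetic above can be checked in a few lines, using only the figures quoted in the article:

```python
# Grayskull on-chip SRAM, using the figures cited above.
grid_rows, grid_cols = 10, 12        # 10x12 grid of Tensix cores
sram_per_core_mb = 1                 # 1MB of SRAM per Tensix core

tensix_cores = grid_rows * grid_cols           # 120 cores
total_sram_mb = tensix_cores * sram_per_core_mb

h100_sram_mb = 80                    # figure quoted for Nvidia's H100

print(tensix_cores)                  # 120
print(total_sram_mb)                 # 120
print(total_sram_mb > h100_sram_mb)  # True
```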
SRAM allows faster access to data related to the attention mechanism, which allows the model to focus on relevant parts of the input data when producing each part of the output.
The study focused on a fused implementation, which combines and optimizes specific operations such as matrix multiplication, scaling, and softmax. Softmax is a critical function that converts a vector of raw scores into a probability distribution, so the model’s relative preferences become probabilities.
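For readers unfamiliar with these operations, here is a minimal, unfused NumPy sketch of generic scaled dot-product attention: matrix multiplication, scaling, and softmax. This illustrates the math only; it is not Thüning’s Grayskull kernel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability, then normalize
    # so each row becomes a probability distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # matrix multiplication + scaling
    weights = softmax(scores)       # each row sums to 1
    return weights @ V              # weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

A fused kernel like the one in the study computes these steps in one pass, avoiding round trips to memory between the multiply, scale, and softmax stages.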
The researcher observed a speedup of 17 times for the fused implementation compared to a CPU implementation with caching, aided by Grayskull’s abundant SRAM and parallel processing capabilities.
Grayskull isn’t as fast as the H100 in overall computational performance, but it can be more cost-efficient in specific computations. Grayskull delivers 92 and 332 TFLOPs for 16-bit and 8-bit floats, respectively, compared to 1513 and 3026 TFLOPs for Nvidia’s PCIe version of the H100.
But Thüning reminded us that H100 PCIe “is approximately 30x more expensive for the general public.”
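Rough throughput-per-dollar arithmetic from the figures above bears this out. Note the H100 price here is an assumption derived from Thüning’s “approximately 30x” remark, not a quoted list price:

```python
# FP16 figures from the article; the H100 price is an assumption
# based on the "approximately 30x more expensive" remark.
grayskull_fp16_tflops = 92
grayskull_price_usd = 799

h100_fp16_tflops = 1513
h100_price_usd = 30 * grayskull_price_usd   # assumed, ~$23,970

grayskull_tflops_per_dollar = grayskull_fp16_tflops / grayskull_price_usd
h100_tflops_per_dollar = h100_fp16_tflops / h100_price_usd

print(round(grayskull_tflops_per_dollar, 3))  # ~0.115
print(round(h100_tflops_per_dollar, 3))       # ~0.063
```

Under these assumptions, Grayskull delivers nearly twice the FP16 TFLOPs per dollar, even though its absolute throughput is far lower.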
“It would be interesting to port the implementation to newer generations (e.g., Tenstorrent Wormhole) and to scale it on multiple cards,” Thüning said.