Researchers in Japan Develop ‘Fugaku-LLM’

May 10, 2024

KAWASAKI, Japan, May 10, 2024 — A team of researchers in Japan released Fugaku-LLM, a large language model with enhanced Japanese language capability, using the RIKEN supercomputer Fugaku. The team is led by Professor Rio Yokota of Tokyo Institute of Technology, Associate Professor Keisuke Sakaguchi of Tohoku University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota Sasaki of CyberAgent, Inc, and Noriyuki Kojima of Kotoba Technologies Inc.

Fugaku at RIKEN

To train large language models on Fugaku, the researchers developed distributed training methods, including porting the deep learning framework Megatron-DeepSpeed to Fugaku in order to optimize the performance of Transformers on Fugaku. They accelerated the dense matrix multiplication library for Transformers, and optimized communication performance for Fugaku by combining three types of parallelization techniques and accelerated the collective communication library on the Tofu interconnect D.

Fugaku-LLM has 13 billion parameters and is larger than the 7-billion-parameter models that have been developed widely in Japan. Fugaku-LLM has enhanced Japanese capabilities, with an average score of 5.5 on the Japanese MT-Bench, the highest performance among open models that are trained using original data produced in Japan. In particular, the benchmark performance for humanities and social sciences tasks reached a remarkably high score of 9.18.

Fugaku-LLM was trained on proprietary Japanese data collected by CyberAgent, along with English data, and other data. The source code of Fugaku-LLM is available on GitHub and the model is available on Hugging Face. Fugaku-LLM can be used for research and commercial purposes as long as users comply with the license.

In the future, as more researchers and engineers participate in improving the models and their applications, the efficiency of training will be improved, leading to next-generation innovative research and business applications, such as the linkage of scientific simulation and generative AI, and social simulation of virtual communities with thousands of AIs.

In recent years, the development of large language models (LLMs) has been active, especially in the United States. In particular, the rapid spread of ChatGPT, developed by OpenAI, has profoundly impacted research and development, economic systems, and national security. Countries other than the U.S. are also investing enormous human and computational resources to develop LLMs in their own countries.

Japan, too, needs to secure computational resources for AI research so as not to fall behind in this global race. There are high expectations for Fugaku, the flagship supercomputer system in Japan, and it is necessary to improve the computational environment for large-scale distributed training on Fugaku to meet these expectations.

Therefore, Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies have started a joint research project on the development of large language models.

Research Outcome

1. Significantly improved the computational performance of training large language models on the supercomputer Fugaku:
GPUs are the common choice of hardware for training large language models. However, there is a global shortage of GPUs due to the large investment from many countries to train LLMs. Under such circumstances, it is important to show that large language models can be trained using Fugaku, which uses CPUs instead of GPUs. The CPUs used in Fugaku are Japanese CPUs manufactured by Fujitsu, and play an important role in terms of revitalizing Japanese semiconductor technology.

By extracting the full potential of Fugaku, this study succeeded in increasing the computation speed of the matrix multiplication by a factor of 6, and the communication speed by a factor of 3. To maximize the distributed training performance on Fugaku, the deep learning framework Megatron-DeepSpeed was ported to Fugaku, and the dense matrix multiplication library was accelerated for Transformer.

For communication acceleration, the researchers optimized communication performance for Fugaku by combining three types of parallelization techniques and accelerated the collective communication on the Tofu interconnect D. The knowledge gained from these efforts can be utilized in the design of the next-generation computing infrastructure after Fugaku and will greatly enhance Japan’s future advantage in the field of AI.

2. An easy-to-use, open, and secure, large language model with 13 billion parameters:
In 2023, many large language models were developed by Japanese companies, but most of them have less than 7 billion parameters. Since the performance of large-scale language models generally improves as the number of parameters increases, the 13-billion-parameter model the research team developed is likely to be more powerful than other Japanese models. Although larger models have been developed outside of Japan, large language models also require large computational resources, making it difficult to use models with too many parameters. Fugaku-LLM is both high performance and well-balanced.

In addition, most models developed by Japanese companies employ continual learning (8), in which open models developed outside of Japan are continually trained on Japanese data. In contrast, Fugaku-LLM is trained from scratch using the team’s own data, so the entire learning process can be understood, which is superior in terms of transparency and safety.

Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of Fugaku, with about 60% of the training data being Japanese, combined with English, mathematics, and code. Compared to models that continually train on Japanese, Fugaku-LLM learned much of its information in Japanese. Fugaku-LLM is the best model among open models that are produced in Japan and trained with original data. In particular, it was confirmed that the model shows a high benchmark score of 9.18 in the humanities and social sciences tasks. It is expected that the model will be able to perform natural dialogue based on keigo (honorific speech) and other features of the Japanese language.

Future Development

The results from this research are being made public through GitHub and Hugging Face so that other researchers and engineers can use them to further develop large language models. Fugaku-LLM can be used for research and commercial purposes as long as users comply with the license. Fugaku-LLM will be also offered to users via the Fujitsu Research Portal from May 10th, 2024.

In the future, as more researchers and engineers participate in improving the models and their applications, the efficiency of training will be improved, leading to next-generation innovative research and business applications, such as the linkage of scientific simulation and generative AI, and social simulation of virtual communities with thousands of AIs.

This research was supported by the Fugaku policy-supporting proposal “Development of Distributed Parallel Training for Large Language Models Using Fugaku” (proposal number: hp230254).


Source: Fujitsu

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industry updates delivered to you every week!

Core42 Building an 172 Million-core AI Supercomputer in Texas

May 20, 2024

UAE-based Core42 is building an AI supercomputer with 172 million cores which will become operational later this year. The system, Condor Galaxy 3, was announced earlier this year and will have 192 nodes with Cerebras Read more…

Google Announces Sixth-generation AI Chip, a TPU Called Trillium

May 17, 2024

On Tuesday May 14th, Google announced its sixth-generation TPU (tensor processing unit) called Trillium.  The chip, essentially a TPU v6, is the company's latest weapon in the AI battle with GPU maker Nvidia and clou Read more…

ISC 2024 Student Cluster Competition

May 16, 2024

The 2024 ISC 2024 competition welcomed 19 virtual (remote) and eight in-person teams. The in-person teams participated in the conference venue and, while the virtual teams competed using the Bridges-2 supercomputers at t Read more…

Grace Hopper Gets Busy with Science 

May 16, 2024

Nvidia’s new Grace Hopper Superchip (GH200) processor has landed in nine new worldwide systems. The GH200 is a recently announced chip from Nvidia that eliminates the PCI bus from the CPU/GPU communications pathway.  Read more…

Europe’s Race towards Quantum-HPC Integration and Quantum Advantage

May 16, 2024

What an interesting panel, Quantum Advantage — Where are We and What is Needed? While the panelists looked slightly weary — their’s was, after all, one of the last panels at ISC 2024 — the discussion was fascinat Read more…

The Future of AI in Science

May 15, 2024

AI is one of the most transformative and valuable scientific tools ever developed. By harnessing vast amounts of data and computational power, AI systems can uncover patterns, generate insights, and make predictions that Read more…

Google Announces Sixth-generation AI Chip, a TPU Called Trillium

May 17, 2024

On Tuesday May 14th, Google announced its sixth-generation TPU (tensor processing unit) called Trillium.  The chip, essentially a TPU v6, is the company's l Read more…

Europe’s Race towards Quantum-HPC Integration and Quantum Advantage

May 16, 2024

What an interesting panel, Quantum Advantage — Where are We and What is Needed? While the panelists looked slightly weary — their’s was, after all, one of Read more…

The Future of AI in Science

May 15, 2024

AI is one of the most transformative and valuable scientific tools ever developed. By harnessing vast amounts of data and computational power, AI systems can un Read more…

Some Reasons Why Aurora Didn’t Take First Place in the Top500 List

May 15, 2024

The makers of the Aurora supercomputer, which is housed at the Argonne National Laboratory, gave some reasons why the system didn't make the top spot on the Top Read more…

ISC 2024 Keynote: High-precision Computing Will Be a Foundation for AI Models

May 15, 2024

Some scientific computing applications cannot sacrifice accuracy and will always require high-precision computing. Therefore, conventional high-performance c Read more…

Shutterstock 493860193

Linux Foundation Announces the Launch of the High-Performance Software Foundation

May 14, 2024

The Linux Foundation, the nonprofit organization enabling mass innovation through open source, is excited to announce the launch of the High-Performance Softw Read more…

ISC 2024: Hyperion Research Predicts HPC Market Rebound after Flat 2023

May 13, 2024

First, the top line: the overall HPC market was flat in 2023 at roughly $37 billion, bogged down by supply chain issues and slowed acceptance of some larger sys Read more…

Top 500: Aurora Breaks into Exascale, but Can’t Get to the Frontier of HPC

May 13, 2024

The 63rd installment of the TOP500 list is available today in coordination with the kickoff of ISC 2024 in Hamburg, Germany. Once again, the Frontier system at Read more…

Synopsys Eats Ansys: Does HPC Get Indigestion?

February 8, 2024

Recently, it was announced that Synopsys is buying HPC tool developer Ansys. Started in Pittsburgh, Pa., in 1970 as Swanson Analysis Systems, Inc. (SASI) by John Swanson (and eventually renamed), Ansys serves the CAE (Computer Aided Engineering)/multiphysics engineering simulation market. Read more…

Nvidia H100: Are 550,000 GPUs Enough for This Year?

August 17, 2023

The GPU Squeeze continues to place a premium on Nvidia H100 GPUs. In a recent Financial Times article, Nvidia reports that it expects to ship 550,000 of its lat Read more…

Comparing NVIDIA A100 and NVIDIA L40S: Which GPU is Ideal for AI and Graphics-Intensive Workloads?

October 30, 2023

With long lead times for the NVIDIA H100 and A100 GPUs, many organizations are looking at the new NVIDIA L40S GPU, which it’s a new GPU optimized for AI and g Read more…

Choosing the Right GPU for LLM Inference and Training

December 11, 2023

Accelerating the training and inference processes of deep learning models is crucial for unleashing their true potential and NVIDIA GPUs have emerged as a game- Read more…

Shutterstock 1606064203

Meta’s Zuckerberg Puts Its AI Future in the Hands of 600,000 GPUs

January 25, 2024

In under two minutes, Meta's CEO, Mark Zuckerberg, laid out the company's AI plans, which included a plan to build an artificial intelligence system with the eq Read more…

AMD MI3000A

How AMD May Get Across the CUDA Moat

October 5, 2023

When discussing GenAI, the term "GPU" almost always enters the conversation and the topic often moves toward performance and access. Interestingly, the word "GPU" is assumed to mean "Nvidia" products. (As an aside, the popular Nvidia hardware used in GenAI are not technically... Read more…

Nvidia’s New Blackwell GPU Can Train AI Models with Trillions of Parameters

March 18, 2024

Nvidia's latest and fastest GPU, codenamed Blackwell, is here and will underpin the company's AI plans this year. The chip offers performance improvements from Read more…

Some Reasons Why Aurora Didn’t Take First Place in the Top500 List

May 15, 2024

The makers of the Aurora supercomputer, which is housed at the Argonne National Laboratory, gave some reasons why the system didn't make the top spot on the Top Read more…

Leading Solution Providers

Contributors

Shutterstock 1285747942

AMD’s Horsepower-packed MI300X GPU Beats Nvidia’s Upcoming H200

December 7, 2023

AMD and Nvidia are locked in an AI performance battle – much like the gaming GPU performance clash the companies have waged for decades. AMD has claimed it Read more…

Eyes on the Quantum Prize – D-Wave Says its Time is Now

January 30, 2024

Early quantum computing pioneer D-Wave again asserted – that at least for D-Wave – the commercial quantum era has begun. Speaking at its first in-person Ana Read more…

The GenAI Datacenter Squeeze Is Here

February 1, 2024

The immediate effect of the GenAI GPU Squeeze was to reduce availability, either direct purchase or cloud access, increase cost, and push demand through the roof. A secondary issue has been developing over the last several years. Even though your organization secured several racks... Read more…

Intel Plans Falcon Shores 2 GPU Supercomputing Chip for 2026  

August 8, 2023

Intel is planning to onboard a new version of the Falcon Shores chip in 2026, which is code-named Falcon Shores 2. The new product was announced by CEO Pat Gel Read more…

The NASA Black Hole Plunge

May 7, 2024

We have all thought about it. No one has done it, but now, thanks to HPC, we see what it looks like. Hold on to your feet because NASA has released videos of wh Read more…

GenAI Having Major Impact on Data Culture, Survey Says

February 21, 2024

While 2023 was the year of GenAI, the adoption rates for GenAI did not match expectations. Most organizations are continuing to invest in GenAI but are yet to Read more…

How the Chip Industry is Helping a Battery Company

May 8, 2024

Chip companies, once seen as engineering pure plays, are now at the center of geopolitical intrigue. Chip manufacturing firms, especially TSMC and Intel, have b Read more…

Q&A with Nvidia’s Chief of DGX Systems on the DGX-GB200 Rack-scale System

March 27, 2024

Pictures of Nvidia's new flagship mega-server, the DGX GB200, on the GTC show floor got favorable reactions on social media for the sheer amount of computing po Read more…

  • arrow
  • Click Here for More Headlines
  • arrow
HPCwire