One Group’s Answer to Transistors Behaving Badly

By Michael Feldman

May 11, 2010

Over the last 50 years, the semiconductor business has enjoyed what is perhaps the most thrilling ride of any industry ever conceived. Today semiconductors are a $250 billion business that accounts for nearly 10 percent of the world’s GDP. At the foundation of its success is Moore’s Law, the chipmaker’s mantra that promises better, faster and cheaper transistors every 18 to 24 months. But the laws of physics are conspiring to bring this ride to an end.

The problems are well known. CMOS-based transistors are increasingly hard to manufacture at nanometer scale. And even as technologies are perfected to do so, the materials themselves are becoming unsuitable for such small geometries. At 22 nm, Intel’s process node slated for 2011, gate oxide will be only 4 to 5 atoms thick and the gates themselves will be 42 atoms across. Manufacturing these devices in reasonable volumes and within reasonable power envelopes is going to be a challenge.

In fact, the analyst team at iSuppli has predicted that manufacturing sub-20nm devices will not be economically feasible. That is, the cost of the fabs could not be recouped by the volume of chips produced at those process nodes. Thus, they concluded, Moore’s Law would be repealed in about five years.

Most of the efforts to address the problem of shrinking transistor geometries have focused on making the devices behave more precisely, using technologies like X-ray lithography and hafnium insulators, to name just two. But what if, instead of trying to make the transistors better, we purposefully try to make them worse?

Although it sounds counter-intuitive, developing processors that are naturally error-prone is exactly what one team of researchers from the University of Illinois and the University of California, San Diego has set out to do. Called stochastic processors, the idea is to under-design the hardware, such that it is allowed to behave non-deterministically under both stressful and nominal conditions. Error tolerance can be provided by either the hardware or the software.

The rationale is that by relaxing the design and manufacturing constraints, it will be much simpler and much cheaper to produce such processors in volume. And because voltage scaling and clock frequency restrictions are eased, significant power savings and performance increases can be realized.

The stochastic model would represent a significant departure from the way semiconductor devices are designed today. Even though processors have evolved significantly over the decades — scalar to superscalar, single-core to multicore, etc. — the basic assumption has always been that the hardware must behave flawlessly. “It’s the contract that the hardware provides to the software today,” says Rakesh Kumar, a computer scientist at the University of Illinois, Urbana-Champaign, who is part of the Stochastic Processor Research group there. The research is being funded by Intel, DARPA, the NSF, and the GigaScale Systems Research Center (GSRC), a consortium of academic, government and industry organizations devoted to next-generation hardware and software.

The idea behind stochastic processors is relatively simple: Build a chip that computes correctly, say, 99 percent of the time. Such a device is specifically designed to let errors occur under both worst-case and nominal conditions. The advantage of this model is that, compared to a 100 percent error-free processor, a stochastic implementation requires a lot less manufacturing precision and takes a lot less power to run.

Kumar’s stochastic research group has designed a Niagara processor (an open source processor design developed by Sun Microsystems) that allows for a 1 to 4 percent error rate. Based on circuit-level simulation with CAD design tools, the researchers determined they could save between 25 and 40 percent on power compared to the default (deterministic) design. That might seem like a lot, but it points to how much of a traditional processor design is now being devoted to keeping the transistors from throwing off errors.

It also explains why multicore designs introduce another level of challenges for chipmakers. For example, if two of the cores on a quad-core processor can run (flawlessly) at 2.0 GHz, one can run at 1.5 GHz, and the last core can only run error-free at 1.0 GHz, the chip has to be binned at 1.0 GHz. That’s money down the drain as far as the chipmaker is concerned. Ideally, they would like to ship a 2.0 GHz product and use some sort of scheme to compensate for the variability in the other two cores. A stochastic design would make this possible.
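The binning arithmetic in that example can be written out in a few lines. What follows is only an illustrative sketch: the function names are hypothetical, and the core speeds are the ones quoted above rather than measurements from any real part.

```python
def binned_frequency_ghz(core_max_ghz):
    """Conventional binning: the chip ships at the speed of its slowest core."""
    return min(core_max_ghz)


def cores_needing_tolerance(core_max_ghz, ship_ghz):
    """Indices of cores that cannot run error-free at the shipping frequency."""
    return [i for i, f in enumerate(core_max_ghz) if f < ship_ghz]


# Error-free limits of the four cores in the example above, in GHz.
cores = [2.0, 2.0, 1.5, 1.0]

conventional = binned_frequency_ghz(cores)   # slowest core sets the bin
desired = max(cores)                         # what the chipmaker would like to ship
at_risk = cores_needing_tolerance(cores, desired)

print(f"conventional bin: {conventional} GHz")
print(f"desired bin:      {desired} GHz")
print(f"cores that would need error tolerance at {desired} GHz: {at_risk}")
```

Under conventional binning the two fastest cores give up half their usable clock rate. Shipping at 2.0 GHz instead means the third and fourth cores will occasionally miscompute, which is exactly the behavior a stochastic design is meant to absorb.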

Of course, compensating for that variability is the tricky part. Kumar says error tolerance can be accomplished in hardware or in software. Hardware correction would be the most obvious and, from the programmer’s perspective, the most palatable way to ensure correct program execution. But error tolerance in software provides more flexibility.

“Our vision is that all the errors that are produced get tolerated by the software,” says Kumar. Part of the group’s research involves how to write application software in a way that takes a non-deterministic processor into account. Kumar believes this shift in thinking is inevitable. Because the hardware variability problem is going to keep getting worse as process geometries shrink, it will eventually make more sense for the programmer to code for non-determinism rather than write the software for the least common denominator hardware. On balance, Kumar believes the ideal would be to employ hardware correction only when it is too onerous to compensate for the errors in software.

HPC applications might be especially at home on stochastic processors since many of these codes are fundamentally optimization problems. In other words, they are noise tolerant to a great extent, relying on probability distributions rather than a single correct computation. Monte Carlo methods are just one example of a class of algorithms used in HPC that rely on optimization techniques, but almost any simulation or matrix math-based code has some level of optimization built in — think climate modeling, data mining, and object recognition apps. In these cases, says Kumar, “you’re not going after one answer, you’re going after a good answer.”
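To make the noise-tolerance point concrete, here is a small Monte Carlo sketch that estimates pi while a fraction of its per-sample results are deliberately corrupted. It is not the researchers' method: the fault model (each inside-the-circle test is flipped with a fixed probability) and the name `estimate_pi` are assumptions chosen purely for illustration.

```python
import math
import random


def estimate_pi(samples, error_rate=0.0, seed=0):
    """Estimate pi by sampling points in the unit square.

    error_rate is a hypothetical fault model: with that probability the
    result of one inside-the-circle test is flipped, standing in for a
    processor that occasionally computes the wrong answer.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        inside = x * x + y * y <= 1.0
        if rng.random() < error_rate:
            inside = not inside  # simulated hardware error
        hits += inside
    return 4.0 * hits / samples


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.04):
        estimate = estimate_pi(200_000, error_rate=rate)
        print(f"error rate {rate:.0%}: pi ~ {estimate:.4f} "
              f"(off by {abs(estimate - math.pi):.4f})")
```

In this toy model a 1 percent fault rate shifts the estimate only slightly, because each corrupted sample is a single vote among hundreds of thousands. A code that depends on one exact result, such as a pointer calculation, would have no such slack.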
