The Cambrian explosion of roughly 540 million years ago is a popular metaphor for the tail end of Moore’s Law scaling and the emergence of a variety of processing accelerators used primarily to digest data torrents at the network edge. Autonomous vehicle sensors are among the examples, generating on average about 4 Tb of data per day per instrumented vehicle.
The promise and pitfalls of the chip industry’s version of the Cambrian explosion, along with the need to find new ways to reboot and scale computing in the post-Moore’s Law era, are creating requirements for new chip architectures that decouple processing from memory, along with new interconnects that address broader computing applications.
In short, according to experts at this year’s IEEE Rebooting Computing conference, the end of Moore’s Law requires a move away from today’s custom CPUs, GPUs and deep learning accelerators, which are geared toward specific applications, and toward “domain-specific” processors.
The rub, said Paolo Faraboschi, vice president and research fellow at Hewlett Packard Enterprise, is that this kind of “system balance at scale is hard.” Among the reasons are the astronomical costs of chip manufacturing, where line widths for production silicon are down to 7 nanometers.
“We need to reboot computing at the system level,” said Faraboschi, who heads HPE’s Systems Research Lab.
CPU extensions along with GPUs and deep learning accelerators like Google TPUs are all the rage for specific applications, but justifying current fabrication costs has become the real challenge, Faraboschi added. For one thing, extreme ultraviolet lithography techniques used for leading-edge chips operate in a vacuum, thereby reducing chip yields.
One way out of this dilemma, Faraboschi argued, is to shift from component specialization and custom accelerators to “balanced” systems that can address larger data-driven domains like edge computing.
Like the rise, fall and resurgence of the Cambrian explosion, “The computing world has become heterogeneous [and] there is no turning back,” the HPE engineer said. “We have to do something else [because] it’s getting a lot more expensive” to manufacture advanced chips.
Prime examples of the domain-specific approach advocated by Faraboschi and others are applications like autonomous vehicles and Internet of Things deployments that use CPU, GPU and TPU accelerators for local processing and reduction of data movement. “There is clearly demand at the edge [and] we’re moving toward some kind of edge processing and control” that flows back and forth to datacenters, Faraboschi told the IEEE conference on Thursday (Nov. 8).
Chip experts note that current accelerators represent the tail end of Moore’s Law scaling, with a shuffling of digital gates on silicon but not much more in terms of chip innovation. For hardware engineers, the future may lie in “unconventional accelerators” like analog neuromorphic processors, a frequent topic at this week’s computing conference. Domain-specific applications include much faster AI training and inference, proponents said, along with the ability to execute “matrix calculations” in a single step rather than thousands.
Hence, neuromorphic chips are seen as ideal for data analytics applications ranging from deep learning and streaming analytics to signal processing.
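The “single step rather than thousands” claim can be made concrete with a little arithmetic. The sketch below is illustrative only and is not drawn from the conference talks: it counts the multiply-accumulate operations a sequential digital unit performs for one matrix-vector product, which grows as rows times columns. The function name `matvec` and the sample values are hypothetical. An idealized analog array, by assumption, would apply all inputs at once and read all outputs in parallel, which is the sense in which proponents describe a single step.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product that also counts multiply-accumulates (MACs)."""
    macs = 0
    result = []
    for row in matrix:
        acc = 0.0
        for weight, x in zip(row, vector):
            acc += weight * x  # one multiply-accumulate per weight
            macs += 1
        result.append(acc)
    return result, macs


# Hypothetical 2x3 weight matrix and 3-element input, chosen for illustration.
weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
inputs = [1.0, 0.5, 2.0]

outputs, macs = matvec(weights, inputs)
print(outputs, macs)  # a sequential unit needs rows * cols = 6 MACs here

# For a 1,000 x 1,000 layer the same count is one million sequential MACs,
# versus one parallel read-out in the idealized analog case assumed above.
print(1000 * 1000)
```

Real analog hardware also pays for data conversion and noise handling, so the comparison is a rough intuition rather than a performance estimate.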
But Faraboschi and other advocates of rebooting computing note there’s room for other approaches such as optical and quantum computing. “No one size fits all,” he stressed.
Along with processing, optical technologies are also widely seen as a way to break the interconnect logjam, as are data fabrics that are the focus of several open networking protocol efforts.
The “interconnect crisis” is driven by today’s two-socket server-based computing infrastructure, which is being overwhelmed by accelerators and other high-end processors that are often underutilized due to system latency. “The conventional infrastructure is stretched to the limits [and] memory is chained to the CPU,” said Faraboschi.
Among the interconnect options vying to become an industry standard are: OpenCAPI, which Faraboschi argued does not scale; CCIX, championed by Arm, Xilinx and others; and the HPE-backed Gen-Z open interconnect protocol.
Besides openness, HPE’s pitch for Gen-Z focuses on attributes such as “memory semantics,” a communication protocol that Faraboschi said “speaks the same language as memory” as a way of reducing latency. “There is an industry need to decouple processor and memory,” he added.
HPE is currently developing a Gen-Z-based chip set, including an optical bridge and switch, as part of the Energy Department’s Exascale Computing Project.
Regardless of which interconnect scheme wins, Faraboschi stressed, “The [chip] industry desperately needs one of them.”