Silicon Startup Raises ‘Prodigy’ for Hyperscale/AI Workloads

By Tiffany Trader

May 23, 2018

There’s another silicon startup coming onto the HPC/hyperscale scene with some intriguing and bold claims. Silicon Valley-based Tachyum Inc., which has been emerging from stealth over the last year and a half, is unveiling a processor codenamed “Prodigy,” said to combine features of both CPUs and GPUs in a way that offers a purported 10x performance-per-watt advantage over current technologies. The company is primarily focused on the hyperscale datacenter market, but has aspirations to support brainier applications, noting that “Prodigy will enable a super-computational system for real-time full capacity human brain neural network simulation by 2020.”

Tachyum says that its Prodigy universal processing architecture marries the programmability of CPUs with the power efficiency and performance features of the GPGPU.

“Rather than build separate infrastructures for AI, HPC and conventional compute, the Prodigy chip will deliver all within one unified simplified environment, so for example AI or HPC algorithms can run while a machine is otherwise idle or underutilized,” said Tachyum CEO and Cofounder Radoslav ‘Rado’ Danilak. “Instead of supercomputers with a price tag in the hundreds of millions, Tachyum will make it possible to empower hyperscale datacenters to produce more work in a radically more efficient and powerful format, at a lower cost.”

AI was a focus during the press activities that accompanied Tachyum’s participation at the GLOBSEC conference in Bratislava, Slovakia, last week. Danilak indicated the technology is in the running for a prominent brain modeling project, but otherwise downplayed the AI use case when we interviewed him for this story, affirming the hyperscale datacenter as the company’s primary target. He said, “AI is just 3-5 percent of silicon today, and 95 percent is server, so our chip is shooting for that 95 percent of market.”

The CEO further clarified: “We don’t sell into enterprise market – that would not be fruitful. Our market is hyperscalers. Most of the [target] customers have their own application source code and we provide the full compiler toolchain from open source, like GCC and so on, porting Linux and baseline applications. So in our primary market, we provide tools so they can recompile and go, they don’t need to rewrite applications.”

Source: Tachyum

The thrust of Tachyum’s proposition is that hyperscale servers are typically only 30-40 percent utilized and sit largely idle overnight during off-peak hours. Prodigy chips can be reconfigured in software to run AI workloads at night, enabling “10x more AI for free,” said Danilak.

In a presentation at Flash Memory Summit last year, the CEO discussed the coming datacenter power wall, noting “a new computational mechanism is needed to overcome this plateau.” Further, “ARM A72 not an answer; Intel Atom has similar performance & power; FPGA, GPU, TPU apply only to limited applications versus CPU.”

The Prodigy platform has 64 cores with fully coherent memory, barrier, lock and standard synchronization, including transactional memory. Single-threaded performance will be higher than a conventional core, the CEO said. Each chip will have two 400 Gigabit Ethernet ports.

Power efficiencies are gained by moving out-of-order execution capability to software. “All the register rename, checkpointing, seeking, retiring, which is consuming majority of the power, is basically gone, replaced with simple hardware. All the smartness of out-of-order execution was put to compiler,” the CEO told us.

“We are kind of a hybrid,” he continued. “[The industry has] in-order-execution machines like low-power Arm, but they have not demonstrated good performance on single thread, then you have big machines like Intel Xeon which have very good performance per thread but they are very power hungry. We are able to get the performance of Xeon per thread but power comparable to low power Arm, so we attack and reduce that cost of scheduling by moving hardware to a very complicated piece of the software.”

Citing a paper by Google’s Urs Hölzle enumerating the failings of wimpy cores, Danilak asserted that Google and other hyperscalers passed on low-power Arm because of its weak single-thread performance. “So from day one we designed our platform to go into the server,” Danilak said. “We built a machine which is fastest on single-threaded but also on parallel applications because if you don’t do that, Amdahl’s law will get you. You need to have the non-vectorized parts of the application be really fast too to get the good scaling.”

Danilak claims that by enabling a 4x reduction in datacenter TCO through improved power efficiency and reduced footprint, hyperscalers like Google and Facebook could save billions of dollars by moving to Prodigy. In terms of performance, the CEO said that a 256,000-server configuration based on Prodigy chips would deliver 32 exaflops of TensorFlow performance. That’s 125 teraflops per Prodigy chip. As a point of reference, Google’s new TPU (v3) chip promises 90 teraflops of unspecified floating point performance; Volta with NVLink offers 125 mixed-precision Tensor teraflops. The pitch for Prodigy is that it is applicable to a wider range of datacenter applications.
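The quoted figures are internally consistent, as a quick back-of-the-envelope check shows (this sketch assumes one Prodigy chip per server, which the article implies but does not state outright):

```python
# Sanity check on the quoted Prodigy performance figures.
servers = 256_000                # servers in the quoted configuration
total_exaflops = 32              # claimed aggregate TensorFlow performance

total_teraflops = total_exaflops * 1_000_000  # 1 exaflop = 10^6 teraflops
per_chip = total_teraflops / servers          # assumes one chip per server

print(per_chip)  # 125.0 teraflops per chip, matching the article's figure
```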

The Prodigy architecture supports IEEE-standard double-, single-, and half-precision floating point, along with an 8-bit floating point format. The programming model includes C, C++, Java, Fortran, and Ada. “We support full staging, memory system, precise exception, and full coherency system so that allows you to run existing applications and simplifies use and deployment of applications,” the CEO said.
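To illustrate what the narrower formats trade away, here is a small Python sketch that round-trips a value through IEEE 754 half and single precision using the standard library (the 8-bit format has no standard Python representation, so it is omitted):

```python
import struct

x = 0.1

# Round-trip x through IEEE 754 binary16 (half precision, format code 'e').
half = struct.unpack('e', struct.pack('e', x))[0]

# Round-trip through binary32 (single precision) for comparison.
single = struct.unpack('f', struct.pack('f', x))[0]

print(half)    # 0.0999755859375 -- only ~3 decimal digits survive
print(single)  # 0.10000000149011612
```

The point for mixed-precision AI workloads is that half precision halves memory and bandwidth cost per value, at the price of roughly three significant decimal digits of accuracy.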

Tachyum says it has found a way around the “slow wire” limitations that impede today’s semiconductor devices. It is working with a fab on a semi-custom COT-flow (customer-owned tooling) design, using 7-nm technology, and expects to have prototypes out next year with sampling to follow. Ahead of tape-out, Tachyum will provide early adopters and other partners with FPGA-based emulation systems.

The CEO acknowledged the non-recurring engineering costs are significant, but indicated that the chips will be priced below Xeons and will offer a performance-per-dollar advantage over today’s high-end CPUs and GPUs.

Danilak has an accomplished track record as a technologist and entrepreneur. He founded ultra-dense flash storage company Skyera and SandForce, a supplier of SSD controllers. Skyera was acquired by Western Digital in 2014, and SandForce was sold to LSI in 2011 for $377 million (LSI’s SSD business was later acquired by Seagate in 2014). He was also part of the Wave Computing team that built the 10 GHz processing element of its deep learning DPU.

Tachyum’s technology has garnered an endorsement from Christos Kozyrakis, professor of electrical engineering and computer science at Stanford. “Despite efficiency gains from virtualization, cloud computing, and parallelism, there are still critical problems with datacenter resource utilization particularly at a size and scale of hundreds of thousands of servers. Tachyum’s breakthrough processor architecture will deliver unprecedented performance and productivity,” said Kozyrakis, who joined Tachyum as a corporate advisor in January.

Tachyum received venture funding earlier this year from European investment company IPM Growth and says it will do one more round at the end of this year to get the chip to production. In March, Tachyum moved its headquarters to a larger facility in San Jose, Calif., and announced it was looking to expand its team.

