May 03, 2012
There's an interesting use case for GPU computing written up over at EE Times this week. Uri Tal, founder of Rocketick, an Israeli company that provides GPU-accelerated software for electronic design automation (EDA), authored the article and went into some detail about why GPUs are such great computational engines for doing chip simulations.
First, though, he described why EDA applications are not so great when running on CPUs. Most of it centers around the lack of data locality these applications exhibit. CPUs rely heavily on large caches to avoid the considerable latency costs of accessing main memory. But since EDA data sets are too large to fit into cache and the application's access pattern is somewhat random, memory bandwidth becomes a bottleneck. And since chip designs are getting larger and more complex, a cache-based architecture probably won't be able to catch up.
Not so for GPUs, which are built with data parallelism in mind and are hooked to graphics memory (GDDR5) that provides higher bandwidth than CPU-grade memory. Writes Tal:
GPU’s are perfectly suited for data-parallel algorithms with huge datasets. In the most recently developed GPUs there are more than a thousand processing cores, organized in SIMD groups. All that is required is that you launch several million short-lived independent threads that need not communicate with each other. The memory latency can be perfectly hidden by switching between “waiting” threads to “ready” threads very efficiently. Instead of optimizing for the latency of the single thread, optimization is for throughput – the number of threads that can be processed in specific time duration.
But getting EDA software to take advantage of GPUs is not a slam dunk. It rests on being able to parallelize the application such that dependencies between the threads are minimized. Tal said they had to redesign both the EDA software structure and the underlying algorithms to make that happen.
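To make the dependency point concrete, here is a minimal sketch of one common way to expose independent work in a gate-level simulation: group gates into levels so that every gate in a level depends only on earlier levels. This is an illustration only, not Rocketick's algorithm, which the article does not describe. The netlist, the gate names, and the `levelize` and `simulate` helpers are all hypothetical, and the sketch assumes a purely combinational, acyclic circuit.

```python
from collections import defaultdict

# Hypothetical netlist: gate output name -> (operation, input signal names).
# Any signal that is not a key here is treated as a primary input.
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["b", "c"]),
    "n3": ("XOR", ["n1", "n2"]),
    "n4": ("NOT", ["n3"]),
}

# Single-bit gate functions operating on a list of input values (0 or 1).
OPS = {
    "AND": lambda v: v[0] & v[1],
    "OR": lambda v: v[0] | v[1],
    "XOR": lambda v: v[0] ^ v[1],
    "NOT": lambda v: 1 - v[0],
}


def levelize(netlist):
    """Group gates by level, where level = 1 + the deepest input's level."""
    level = {}

    def depth(signal):
        if signal not in netlist:
            return 0  # primary input
        if signal not in level:
            level[signal] = 1 + max(depth(i) for i in netlist[signal][1])
        return level[signal]

    groups = defaultdict(list)
    for gate in netlist:
        groups[depth(gate)].append(gate)
    return [sorted(groups[k]) for k in sorted(groups)]


def simulate(netlist, inputs):
    """Evaluate the circuit one level at a time."""
    values = dict(inputs)
    for gates in levelize(netlist):
        # Gates in the same level read only values from earlier levels,
        # so each evaluation here is independent of the others. On a GPU,
        # each one could in principle be handled by its own thread.
        results = {
            g: OPS[netlist[g][0]]([values[i] for i in netlist[g][1]])
            for g in gates
        }
        values.update(results)
    return values
```

In this toy netlist, `n1` and `n2` land in the same level and could be evaluated concurrently, while `n3` and `n4` must wait for their inputs. Real designs have far wider levels, which is where millions of short-lived threads would come from.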
According to him, the redesign paid off, resulting in chip simulations that ran 10 to 30 times faster. Better yet, the Rocketick software can run on multiple GPUs and will automatically deliver more performance as newer, bigger, and quicker GPUs are rolled out.
Although Tal's writeup doesn't mention it, NVIDIA uses GPU-accelerated tools to design and verify its own hardware. Back in 2010, at least, NVIDIA was using Agilent software as part of its chip design workflow, employing a small in-house GPU cluster. At the time, the GPU maker was evaluating Rocketick's offering and the early results looked "promising."
Full story at EE Times