November 25, 2005
TRIPS (The Tera-op, Reliable, Intelligently adaptive Processing System) is a new microprocessor architecture being designed at the University of Texas at Austin. The work is currently funded by the Defense Advanced Research Projects Agency as part of its Polymorphous Computing Architectures program. The TRIPS team consists of 30 faculty members, research scientists, graduate students, undergraduates, and postdocs, and is led by professors Doug Burger and Stephen Keckler. Professor Kathryn McKinley leads the TRIPS compiler effort. The project's ambitious goal is to produce a scalable architecture that provides a general-purpose microprocessor capable of executing more than a trillion calculations per second.
It is well understood that traditional microprocessor architectures, whether CISC or RISC, are fast approaching their practical limits. Core design complexity and growing power consumption have pushed modern microprocessor architectures up against a performance wall. To execute more than one instruction per cycle, designers have built deep, superscalar pipelines into microprocessors, but the complexity of those pipelines limits further performance gains. "A lot of features have been added to make microprocessors more amenable to high clock rates," said Doug Burger, project co-lead on TRIPS. "But if you look at processor performance today, in terms of instruction concurrency per cycle, it is no better than it was ten years ago - maybe worse."
To compensate, chip designers are placing multiple cores on a chip to gain increased performance. The downside of this approach is that it requires thread-aware software that can take advantage of the multiple cores, and many software applications are, by nature, single-threaded. "Our goal was to come up with a uniprocessor architecture that was scalable to a much greater degree of instruction-level parallelism," explained Stephen Keckler, project co-lead for TRIPS. "The reason this is of interest to the HPC community is that, although they are comfortable with very large numbers of processors, high-performance multiprocessors start with very powerful uniprocessors."
TRIPS is fundamentally a data flow architecture. What this means is that instructions execute as soon as their operands arrive, rather than in some sequence imposed by the compiler or the programmer. In contrast to von Neumann architectures, data flow architectures use the availability of data to fetch instructions rather than the availability of instructions to fetch data.
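The firing rule described above can be sketched in a few lines. This is a toy dataflow interpreter, not the TRIPS ISA: the instruction encoding, the names, and the dictionary-based scheduler are all illustrative assumptions, intended only to show how execution is driven by operand availability rather than by listing order.

```python
# Toy dataflow interpreter: an instruction fires as soon as all of its
# operands are available, regardless of the order it appears in the program.

def dataflow_execute(instructions):
    """instructions: list of (name, op, operand_names) tuples.
    Returns a dict mapping each instruction name to its computed value."""
    values = {}
    pending = list(instructions)
    while pending:
        progressed = False
        for inst in list(pending):
            name, op, operands = inst
            if all(o in values for o in operands):  # all operands have arrived
                values[name] = op(*(values[o] for o in operands))
                pending.remove(inst)
                progressed = True
        if not progressed:
            raise RuntimeError("deadlock: unsatisfiable dependencies")
    return values

# 'prod' is listed first but cannot fire until 'a' and 'b' produce values;
# no sequence is imposed by the compiler or the programmer.
program = [
    ("prod", lambda x, y: x * y, ("a", "b")),
    ("a", lambda: 3, ()),
    ("b", lambda: 4, ()),
    ("sum", lambda x, y: x + y, ("a", "b")),
]
result = dataflow_execute(program)
print(result["sum"], result["prod"])  # 7 12
```

Note that swapping the order of the entries in `program` changes nothing: the availability of data, not program order, determines when each instruction executes.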
A basic characteristic of the TRIPS design is its block-oriented execution. The processor loads a block of up to 128 instructions, as if it were a single instruction, greatly decreasing the overhead associated with instruction handling and scheduling. So rather than operating on only a few computations at a time, the TRIPS processor operates on many instructions, mapped to a grid of execution nodes.
Each relatively simple execution node contains an integer ALU, an FPU and a set of reservation stations - effectively an instruction queue for the node. The simplicity of the execution nodes is one of the architectural features that enables easy scalability.
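As a rough mental model of such a node, the sketch below pairs an instruction queue of reservation stations with an operand-delivery method. The class and field names are invented for illustration; the real hardware encoding and network protocol differ.

```python
# Minimal model of a TRIPS-style execution node: a queue of reservation
# stations, each holding an instruction that waits for its operands.
from dataclasses import dataclass, field

@dataclass
class ReservationStation:
    opcode: str
    operands: dict   # operand slot name -> value (None until it arrives)
    targets: list    # destinations for the result

    def ready(self):
        return all(v is not None for v in self.operands.values())

@dataclass
class ExecutionNode:
    stations: list = field(default_factory=list)  # the instruction queue

    def deliver(self, station_idx, slot, value):
        """An operand arrives over the on-chip network."""
        self.stations[station_idx].operands[slot] = value

    def issue(self):
        """Return the stations whose operands have all arrived."""
        return [s for s in self.stations if s.ready()]

node = ExecutionNode()
node.stations.append(ReservationStation("mul", {"a": None, "b": None}, []))
node.deliver(0, "a", 6)
node.deliver(0, "b", 7)
print([s.opcode for s in node.issue()])  # ['mul']
```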
Integral to the project is the TRIPS compiler, which converts standard C or FORTRAN into execution blocks. "The TRIPS compiler's job is to encode dependencies between the instructions and map the instructions onto the grid of [execution nodes]," explained Kathryn McKinley, TRIPS compiler lead. "Current ISAs have no notion of instruction position - all is determined in the pipeline at runtime." Instruction scheduling is still performed in the hardware in order to handle dynamic latencies, but the TRIPS model shifts the responsibilities of instruction dependence analysis and instruction mapping from run time to compile time, where the known dependencies can be encoded. This frees precious hardware resources that can better be used to accelerate run-time performance.
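The two compile-time jobs McKinley describes can be sketched as follows. The dependence analysis is standard def-use matching; the placement policy shown (simple round-robin over a 4x4 grid) is a stand-in assumption, not the actual TRIPS scheduler, which weighs operand-routing distance across the grid.

```python
# Sketch of the compile-time work described in the article: build the
# dependence graph for a block, then assign each instruction a fixed
# position on the grid of execution nodes.

def build_dependences(block):
    """block: list of (dest, src_names). Returns (producer, consumer) edges."""
    producers = {dest: i for i, (dest, _) in enumerate(block)}
    edges = []
    for i, (_, srcs) in enumerate(block):
        for s in srcs:
            if s in producers:          # value produced inside this block
                edges.append((producers[s], i))
    return edges

def map_to_grid(block, rows=4, cols=4):
    """Assign instruction i to a (row, col) slot, round-robin across the grid."""
    return {i: ((i // cols) % rows, i % cols) for i in range(len(block))}

block = [("t0", ["a"]), ("t1", ["b"]), ("t2", ["t0", "t1"]), ("t3", ["t2"])]
print(build_dependences(block))  # [(0, 2), (1, 2), (2, 3)]
print(map_to_grid(block))
```

Because both the edges and the placements are computed once, before execution, the hardware never has to rediscover them cycle by cycle.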
Also, because the architecture does not depend on explicit code parallelization to achieve high performance, it can be used with conventional programming models - a key advantage for existing software.
A major design goal of the TRIPS architecture is to support "polymorphism," that is, the capability to provide high-performance execution for many different application domains. Polymorphism is one of the main capabilities sought by DARPA, TRIPS' principal sponsor. The objective is to enable a single processor to perform as if it were a heterogeneous set of special-purpose processors. The advantages of this approach, in terms of scalability and simplicity of design, are obvious.
To implement polymorphism, the TRIPS architecture employs three levels of concurrency: instruction-level, thread-level and data-level parallelism (ILP, TLP, and DLP, respectively). At run-time, the grid of execution nodes can be dynamically reconfigured so that the hardware can obtain the best performance based on the type of concurrency inherent to the application. In this way, the TRIPS architecture can adapt to a broad range of application types, including desktop, signal processing, graphics, server, scientific and embedded.
In collaboration with engineers at IBM, the TRIPS team is now developing a prototype system, slated for completion in March 2006. The prototype chip will contain up to four processor cores, each capable of executing 16 instructions - integer or floating point - per clock cycle, along with a partitioned cache structure designed to offer higher performance than traditional approaches. The chip will contain more than 250 million transistors and will operate at 500 megahertz. The objective is to demonstrate the feasibility of full-scale industrial development that could yield a 10-gigahertz chip capable of executing more than a trillion instructions per second.
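A quick back-of-envelope check puts these figures in perspective. The prototype's peak rate follows directly from the numbers above; the article does not say how many cores a full-scale part would carry, so the 16-core figure below is purely an assumption for illustration.

```python
# Peak instruction throughput = cores x issue width x clock rate.

def peak_throughput(cores, issue_width, clock_hz):
    return cores * issue_width * clock_hz

# Prototype: 4 cores x 16 instructions/cycle x 500 MHz.
proto = peak_throughput(4, 16, 500e6)
print(f"prototype peak: {proto:.2e} instr/s")   # 3.20e+10

# Hypothetical full-scale part at 10 GHz; 16 cores is an assumed figure,
# chosen only to show how the trillion-instruction target could be reached.
target = peak_throughput(16, 16, 10e9)
print(f"assumed full-scale peak: {target:.2e} instr/s")  # 2.56e+12
```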
The TRIPS team is eagerly awaiting the completion of the prototype. According to team members, preliminary performance results based on execution simulations have demonstrated "substantially better instruction-level parallelism than anything else out there." The project has established relationships with several semiconductor companies and is actively pursuing commercialization of the TRIPS technology.
For more information on the TRIPS project, visit http://www.cs.utexas.edu/users/cart/trips/.