September 06, 2012
We're only a little more than halfway through 2012, but Intel has already announced the 2013 versions of Parallel Studio XE and Cluster Studio XE, two software suites that support x86-based parallel programming for high performance computing and beyond. Intel refreshes its software development offerings each year at about this time to sync up its tool support with the latest and greatest silicon and to add new features for developers. And since the chipmaker has been busy churning out new microarchitectures, there's lots of new software gadgetry.
The refresh will be especially interesting for HPC developers, since Intel is including full support for its upcoming Xeon Phi coprocessor, the chipmaker's manycore product line that is set to debut before the end of the year. Although Intel had beta versions of some of these Phi-capable tools and libraries prior to this, the 2013 toolset will provide complete support for HPC programmers developing codes for Knights Corner, the company's first commercial manycore offering.
By design, Xeon CPUs and Xeon Phi share the same basic x86 ISA. However, the SIMD instructions and vector width are not shared, so it's up to the compiler and libraries to abstract away that difference by automatically generating the appropriate code for the intended target -- which they do. But as we've reported before, tuning applications for optimal performance on Xeon Phi is more than likely going to involve code changes. Nevertheless, the ability to do a simple recompile and link on existing code to get a working Xeon Phi executable will remove a lot of pain and suffering while porting HPC applications.
The new Parallel Studio will also include compiler and tool support for "Ivy Bridge," the 22nm shrink of the Sandy Bridge microarchitecture. Again, Intel had support for these processors prior to this release, but they've been able to tune performance thanks to early customer feedback and in-house experience with the chips. Ivy Bridge parts for desktop and mobile platforms are already in the field; server versions are set to arrive next year.
Support for "Haswell," Intel's next microarchitecture following Ivy Bridge, has also been added. Haswell will include interesting goodies like transactional memory support, a feature that is designed to make parallel programming much easier since it automates the protection of shared data across threads. IBM's Blue Gene/Q chip implements this feature today and it's no big surprise that Intel has followed suit. The first Haswell CPUs should start shipping in 2013, although the server chips are not likely to arrive until the following year.
Beyond just supporting new silicon, Intel has also added a bunch of enhancements designed to make programming and debugging parallel apps easier. Some of the major new features include:
Java profiling: Although Java is not used much in HPC codes, some financial applications do wrap Java around their performance-sensitive algorithms. This new profiling capability could help those users determine if those code bits are causing bottlenecks.
CPU power analysis: This is used to determine the sleep state of the processor to make sure unused resources are in their proper low-power mode. Obviously, if unused cores are spinning rather than sleeping, that just heats up the datacenter and makes the utility companies richer.
Pointer checking: An option for the C/C++ compiler that determines when a pointer with a specified address range (one attached to a malloc, array or other data structure) starts accessing data outside its specified limits. This can be quite a useful feature since rogue pointers can silently corrupt your data, which as far as programmers are concerned, is the devil's work.
Heap growth analysis: Intel added a variety of new ways to run down memory leaks. Tracking them manually with a debugger or printf statements can be one of the most frustrating and tedious endeavors. Even if this feature only works some of the time, it's still worth it.
Conditional numerical reproducibility: This ensures that floating point calculations produce consistent results every time they are executed (assuming the same machine). Since the order of operations can change across different runs, rounding errors can generate somewhat different results, which while still valid, can be problematic for things like test suites and acceptance testing. The only downside to turning on this feature is a 10 to 20 percent performance penalty.
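To see why the order of operations matters for reproducibility, consider a minimal Python sketch of a parallel-style reduction. It is not Intel's implementation and uses none of Intel's tools; the function name `chunked_sum` and the input values are hypothetical, chosen to exaggerate an effect that is usually confined to the last few digits. The same four numbers are summed as one chunk and as two chunks, the way a runtime might split work across a different number of threads from one run to the next.

```python
import math


def chunked_sum(values, n_chunks):
    """Sum values the way a simple parallel reduction might:
    split into n_chunks contiguous pieces, add each piece left to right,
    then add the partial results left to right."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = []
    for start in range(0, len(values), size):
        subtotal = 0.0
        # Explicit loop: plain left-to-right floating point addition.
        for v in values[start:start + size]:
            subtotal += v
        partials.append(subtotal)
    total = 0.0
    for p in partials:
        total += p
    return total


# Hypothetical data: large values that cancel, plus small ones that
# are lost to rounding whenever they are added to a large value.
values = [1e16, 1.0, -1e16, 1.0]

one_chunk = chunked_sum(values, 1)   # one "thread"
two_chunks = chunked_sum(values, 2)  # two "threads"
exact = math.fsum(values)            # correctly rounded reference sum

print(one_chunk, two_chunks, exact)
```

The two groupings disagree with each other, and neither matches the correctly rounded sum. Pinning the result down means fixing the order in which the additions happen, which limits how freely the work can be rearranged and is one plausible source of the performance penalty mentioned above.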
Fortunately, performance is usually going in the other direction. According to Intel Software director James Reinders, these latest C/C++ and Fortran compilers and runtime libraries are speedier than ever and among the best in the business. For AVX floating point operations in particular, the Intel C++ compiler outruns some of the more popular competition by a wide margin. Using the SPECfp_base2006 floating point benchmark, Intel generates code that executes 97 percent faster and 164 percent faster than that of Microsoft's Visual C++ and GCC, respectively.
Not everyone relies on fast compilers though. Reinders says their most demanding customers will resort to the analysis tools to get the ultimate in performance. "If you just want to do a recompile and link with a library, you can get a good speedup," he explains. "But if you want to start chasing how many TLB misses you have and get the compiler to push pages around so you can get the top score on something, we support that too."
Parallel Studio XE 2013 is available starting this week and retails at $1,599-$2,299 -- depending on whether you want Fortran, C/C++ or both. Cluster Studio XE 2013 is basically a superset of Parallel Studio, adding MPI libraries and analysis tools, as well as a cluster installation utility. It retails for $2,949, and is scheduled to ship in the fourth quarter of 2012.