Collecting my thoughts after a thrilling and enlightening Supercomputing Conference in Denver, I want to discuss some of the key trends and highlights observed. The annual conference is always a hotbed of product announcements and this year, the 25th running, was no exception. There are many articles covering the key announcements, so I instead want to focus on some of the overarching themes.
Big Data & Hadoop
It’s no surprise that big data, the industry buzzword of the year, has filtered down to the world of supercomputing, and it was out in full force at SC13, both on the booths and in the talks. The challenges being addressed by big data technologies such as Hadoop are very similar to those the HPC community has been dealing with for decades. The novel concept with platforms like Hadoop, however, is to move the compute to the data rather than the other way around. That is great in theory, but there are some serious limitations when Hadoop is applied to typical HPC workloads. For starters, Hadoop spins up JVMs up the yin yang, so it is inappropriate for fine-grained, high-throughput workloads. Secondly, HDFS (the Hadoop Distributed File System) has serious limitations, particularly with smaller files. It is, however, easy enough to plug in other high-performance file systems like Gluster or Lustre, as demonstrated by Intel and Cray, and this was certainly a common approach observed at SC13. Some of the more HPC-friendly implementations use the YARN resource manager in Hadoop 2, some non-MapReduce job execution and the Lustre file system, which doesn’t leave much of the original Hadoop…
Hadoop is a very trendy hammer and an excellent one at that. Sure you can hammer screws, but it ain’t very elegant. I get the impression many are trying to fashion their screws into nails so they can get on the bandwagon.
Cloud
At SC11 two years ago in Seattle, cloud was barely mentioned. In Salt Lake City last year, there were a significant number of talks on cloud and HPC but not much on the show floor. This year the number of talks increased again and, in addition, every other exhibitor was talking about cloud in some shape or form. Sure, a lot of it was due to naïve marketing attempts to cloudwash some dying product line, but the point is that cloud and HPC are indisputable bedfellows. We are now seeing more specialized hardware, better suited to HPC workloads, feature in many of the major public clouds, including AWS with its new C3 instances and their “enhanced networking”, which supposedly delivers lower latency, and of course Windows Azure with its InfiniBand-based HPC hardware. Bare-metal clouds, such as IBM SoftLayer, and OpenStack support for bare metal are also becoming more widespread, which addresses many of the performance concerns “the cloud resistance” still holds.
Exascale
Once again there was much talk of exascale and the steep mountain of challenges that lie ahead to achieve this important milestone. When you boil it down, exascale is going to require parallelism over a billion cores, or more, assuming power consumption continues to limit core speeds. Whilst the hardware continues to move forward rapidly, with manycore advances being announced (NVIDIA with its new Tesla K40 card and Intel with its latest Xeon Phi incarnation, named Knights Landing), it’s the software to support the level of parallelism required that remains the real challenge.
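To put a rough number on the billion-core claim, here is a back-of-the-envelope sketch in Python. The clock speeds and operations-per-cycle figures are illustrative assumptions of mine, not numbers from any vendor announcement, and the calculation optimistically assumes every core sustains its peak rate.

```python
# One exaflop: 10^18 floating-point operations per second.
EXAFLOP = 1e18


def cores_needed(target_flops, clock_hz, flops_per_cycle):
    """Cores required if every core sustains its peak rate (optimistic)."""
    return target_flops / (clock_hz * flops_per_cycle)


# (clock in GHz, floating-point operations per cycle per core).
# These scenarios are hypothetical, chosen only to show the scaling.
scenarios = [(1.0, 1), (1.0, 8), (2.5, 4)]

for clock_ghz, flops_per_cycle in scenarios:
    cores = cores_needed(EXAFLOP, clock_ghz * 1e9, flops_per_cycle)
    print(f"{clock_ghz} GHz x {flops_per_cycle} flop/cycle -> {cores:.2e} cores")
```

With a 1 GHz clock and one operation per cycle, the answer is a billion cores. Wider vector units reduce the core count, but at that clock speed the machine still needs roughly a billion operations in flight every cycle, and it is the software that has to expose that much parallelism.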
All in all, another fantastic SC. The organizers do a stunning job with this conference. The technical program is awesome and the exhibitors put on a great show. I can’t wait to see what progress is made over the next 12 months!