July 3, 2014

TACC’s New Director Shares Strategy, System Futures

Nicole Hemsoth

Each of the national labs and supercomputing sites has defining characteristics or “personalities” that are most often driven by the user communities that exploit its computational resources. Certain centers are affiliated with particular missions or needs. Some tend to prefer architectures that maximize overall performance and size in a way that tops the Top 500 charts; others are conceived around specific application needs in energy, astrophysics, life sciences or other areas; and still others are known for making bold, diverse architectural decisions because their user bases are so varied.

Among the centers that fall into the last category is certainly the Texas Advanced Computing Center (TACC), which over the last several years has become a site to watch because of its constant string of innovative choices. With a very broad user base coming in from NSF-funded projects, the team has had to balance the desire for high performance, availability, efficiency and accessibility with its goal of exploring potentially disruptive supercomputing technologies.

For instance, the Stampede system was the first supercomputer to successfully blend GPUs and Xeon Phis into a hybrid that lets users test, optimize, and run their applications across different architectures for variable performance gains. Other machines, including most recently Wrangler, are dedicated to exploring the emerging class of data-intensive or “big data” problems in science that traditional supercomputers aren’t designed to tackle. This has meant the team behind machines like these has had to think beyond the normal course of system design and explore new technologies.

Leading the charge for all of these missions, and for several of TACC’s most interesting systems over the last several years (Stampede and Wrangler in particular), is Dan Stanzione. He was formally named Executive Director of TACC this week, following his long stint as deputy director, which began in 2009. Before that, he was actively involved in architectural and system design choices at other centers, including his role as founding director of the Fulton High Performance Computing Initiative at Arizona State University.

Stanzione has led some bold choices at TACC and is making system architecture diversity a central theme of his tenure going forward. Instead of focusing only on large-scale HPC and simulation resources (represented by Stampede), he is pushing big data and cloud computing as additional initiatives. “We need an ecosystem of different kinds of systems to support the growing diversity of scientific computing workloads,” he told us this week. He said the team will combine the lessons learned on the scalable manycore front from Stampede with those on the data-intensive side, as represented by the Wrangler machine. “Our future systems will fuse these two system types, blending scalable manycore techniques with superfast IO.” Further, using cloud models via the center’s OpenStack-based Rodeo cluster for front-end applications, when appropriate, will open more possibilities for its many users.

To level set on the current supercomputing stable at TACC, Stanzione described the future of the #7 ranked, Dell-built Stampede supercomputer, which is set for an upgrade cycle within the next year and a half with some variant of the upcoming Knights Landing architecture at the core. We weren’t able to determine details about the architectural selection, and since Intel hasn’t officially released dates and full specs for the coming self-hosted parts, we’ll have to wait until more is clear this summer. The team is also looking into possibilities for its next big system, which he says we’ll learn more about sometime this year.

Other important machines at TACC, including the interactive visualization cluster, Maverick, and Lonestar, which is now being used as a throughput-oriented system, are seeing solid utilization, but as it stands, the center’s big clusters are saturated. “In this quarter alone the demand for Stampede was 6x what we had available,” Stanzione said. TACC will push some of those workloads over to Wrangler when it goes live at the beginning of 2015, but that system is dedicated to exploring non-traditional HPC problems, given its unique architecture and purpose-built design for handling large-scale graph problems and massive analytics across large datasets.

The key to these capabilities is one of the more interesting stories out there, even if details are still somewhat light. At the heart of Wrangler is the technology provided by DSSD, the Andy Bechtolsheim startup that had a long life in stealth and then immediately went into the paws of EMC. We talked technical detail about the system with co-PI Chris Jordan back in 2013 if you’d like to pop over and review, but suffice it to say, seeing what’s possible with 100,000 flash chips stacked into an array and attached directly via PCI will be interesting to watch. Stanzione was giddy about this machine in particular, noting that more details about it will emerge in the next couple of months.

The important thing, he says, is that having a dedicated data-intensive machine will let TACC handle a new class of applications that have never fit into their HPC portfolio to date. This includes a broad array of life sciences, economics, humanities and other projects that require analytics operations that aren’t a good fit for traditional supercomputers.

“It’s a great time to be alive in HPC,” Stanzione said. And these seem to be especially good times at TACC, where ground will be broken on new facilities to host coming systems, workspaces for the systems groups, a visualization area, and much-needed office space to accommodate the additional 80 people the center is bringing into the fold.

“I want to continue with the successes we’ve had and push with our NSF, open science and HPC projects. I look forward to diversifying these to help more users across different disciplines and missions. We want to be a leader in HPC but also in data and cloud, pushing manycore and other new technologies.”