The NSF recently issued a solicitation for high-performance systems intended to broaden its range of capabilities and provide a more “inclusive computing environment” for science and engineering. Although the solicitation is now closed to new submissions, it has raised a few questions.
According to the agency, some of the new problem areas it wants to address involve applications “that are extremely data intensive and may not be dominated by floating point operation speed. While a number of the earlier acquisitions have addressed a subset of these issues, the current solicitation emphasizes this even further.”
With NSF-funded systems like Blue Waters and Stampede up and running, the agency says the scientific community has expressed other needs, particularly for tackling data-intensive challenges. This is not to say the agency has turned a blind eye to hyper-performance systems, but the solicitation makes little mention of the kind of machines that similar solicitations produced, such as Stampede.
In other words: we gave you your FLOPs already, folks. It’s time for something new.
Among the elements that the NSF has deemed worthy of funding are:
- Complement existing XD capabilities with new types of computational resources attuned to less traditional computational science communities;
- Incorporate innovative and reliable services within the HPC environment to deal with complex and dynamic workflows that contribute significantly to the advancement of science and are difficult to achieve within XD;
- Facilitate transition from local to national environments via the use of virtual machines;
- Introduce highly usable and cost-efficient cloud computing capabilities into XD to meet national-scale requirements for new modes of computationally intensive scientific research;
- Expand the range of data-intensive and/or computationally challenging science and engineering applications that can be tackled with current XD resources;
- Provide reliable approaches to scientific communities needing a high-throughput capability;
- Provide a useful interactive environment for users needing to develop and debug codes using hundreds of cores or for scientific workflows/gateways requiring highly responsive computation;
- Deal effectively with scientific applications needing a few hundred to a few thousand cores;
- Efficiently provide a high degree of stability and usability by January 2015.
To better understand how these “big data” driven needs intersect with other large-scale computing initiatives, including exascale ambitions, we talked with Barry Schneider and Irene Qualters, both program directors in the Division of Advanced Cyberinfrastructure within the NSF’s computer and information sciences directorate.
The two were directly involved in the acquisitions of Blue Waters, Stampede, Kraken, Gordon, Blacklight, and other research systems. They also work within the XSEDE program to ensure that researchers have access to the computational resources they need. Qualters says that the NSF has focused on large-scale, high-performance systems in the form of Blue Waters and Stampede, “and those are highly usable and fit what people need computationally.” Still, she says, the NSF is not just trying to expand the number of services; it is trying to broaden their scope.
Qualters and Schneider agree that when it comes to directing funding toward exascale systems or data-intensive challenges, there is no either/or choice, since the two areas feed different streams of research. However, the NSF has gathered details from user communities about what they require, and the growing array of new scientific instruments (everything from new telescopes to gene sequencers) has produced a clear call to handle ever-larger, more diverse, and more complex data across many fields.
“We have been interested in data-intensive for quite some time and that focus is there, but we’re also recognizing that new communities are having different computational needs based on the types of research they’re involved with; this could be data-intensive tools or just an expansion of visualization capability, for instance. We want to make sure that they have the cyberinfrastructure to do so and do it at a national level,” said Qualters.
Schneider explained that it would send the wrong message if this solicitation came across as a purely data-intensive call, since his team is looking for a balanced set of resources for XSEDE projects and for researchers who have stretched the capabilities of their university machines. At the same time, he said, research groups need access to other resources, from virtual machines to new hardware and software tools, so they can make use of growing data volumes and an expanding range of data types.
“Not everyone needs 100,000 cores,” Schneider said. Most of the researchers they work with through XSEDE and the systems that form its backbone are simply looking for the most efficient way to get their science done. He noted that for now the focus is on new hardware and software tools to support these needs, but nothing prevents the NSF from changing course in two years and funding another system to top Blue Waters or Stampede. It’s all about what the community says it needs, he stressed.
To set its priorities for data, software, campus bridging, security, and education within the broader landscape of computational and data-driven science and engineering, the NSF gathers input from its own internal experts and from six task force committees, each dedicated to a specific area. Last February, the NSF released its vision for the next generation of advanced computing infrastructure for science and engineering, the goal of which was to ensure that research communities have access to the computational resources they need to move forward.
This set of principles guides the agency’s funding for the current cycle, and while exascale projects are nowhere in sight, some unique technologies are finally getting a chance to shine. As for exascale in general, Qualters says that for the NSF it’s not a matter of if, but of how and when. She emphasized that there is a big difference between what her agency sees as exascale and what the benchmarks show, but reiterated that funding decisions won’t be a question of choosing exascale over “big data” science; they will be based on what the research community needs at the time and what is practical for real-world applications.