This week, Dan Reed, Corporate Vice President of Microsoft’s eXtreme Computing Group and Technology Strategy and Policy, reiterated the growing demand for more effective ways for scientists to manage the vast influx of data streaming in from an ever-growing number of sensors and instruments. As he put it, “In all domains, scientists and researchers are drowning in data. They’ve gone from scarcity to an incredible richness, necessitating a sea change in how they extract insight from all this data.”
Beyond the mounting data, there is also the question of how data can intersect more seamlessly across disciplines. Since scientists’ work usually overlaps with research from other fields, making the integration of different projects and models as intuitive as possible should be a primary concern.
To highlight this point, consider the case of the Gulf oil spill, where scientists were tasked not only with looking for solutions but also with assessing what the spill could mean for oil distribution in the water and the subsequent impact on localized ecosystems. This goes, of course, far beyond myopically examining the problem with computational fluid dynamics; it involves thousands of researchers in many areas all working toward the same problem, but in compartmentalized ways.
When a problem like an oil spill or other major event arises, the need for easy-to-deploy solutions is more critical than ever. However, as Reed notes, “unfortunately, [this requires] many researchers to assume additional systems administrator roles. These researchers often spend inordinate amounts of time maintaining the computing systems they require to do their research rather than devoting their time and talents to the research itself.”
Given the rising costs, both in the time scientists spend maintaining the systems they need and in the general upkeep of massive machines, there is no doubt that a more streamlined approach, one that removes the complexity of use and lets scientists shed the operational expense burden, would be of incredible value.
Reed states that, “fortunately, the emergence of cloud computing, coupled with powerful software on clients, such as local desktop computers, offers a solution to this conundrum.” Since the cloud is maintained off-site and kept current with the latest updates and fixes, scientists are freed from that burden. Furthermore, the cost issue is mitigated because scientific users can adopt a “pay as you go” model, maximizing efficiency (in both power and cost) for peak loads and easily “shutting down” or drastically reducing use when demand lessens.
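The economics behind that “pay as you go” argument are easy to sketch with a back-of-the-envelope comparison. The snippet below is purely illustrative: the hardware, operations, and node-hour figures are made-up assumptions, not vendor pricing, and simply show why renting capacity only during peak demand can undercut keeping a dedicated machine powered and staffed year-round.

```python
# Hypothetical comparison of owning a cluster versus renting equivalent
# capacity on demand. All figures are illustrative assumptions.

HOURS_PER_YEAR = 24 * 365

def dedicated_cluster_cost(capital_cost, annual_ops_cost, years=3):
    """Average yearly cost of an owned cluster: amortized hardware plus
    power, cooling, and administration, paid regardless of utilization."""
    return capital_cost / years + annual_ops_cost

def on_demand_cost(node_hour_rate, nodes, busy_hours_per_year):
    """Yearly cost when capacity is rented only while jobs actually run."""
    return node_hour_rate * nodes * busy_hours_per_year

if __name__ == "__main__":
    # A bursty workload: 64 nodes needed, but only ~15% of the year.
    owned = dedicated_cluster_cost(capital_cost=500_000, annual_ops_cost=120_000)
    rented = on_demand_cost(node_hour_rate=1.50, nodes=64,
                            busy_hours_per_year=int(0.15 * HOURS_PER_YEAR))
    print(f"Owned cluster, per year:   ${owned:,.0f}")
    print(f"On-demand cloud, per year: ${rented:,.0f}")
```

Under these assumed numbers the owned machine costs roughly twice as much per year as the rented capacity; the gap narrows, and eventually reverses, as utilization approaches full-time, which is exactly the trade-off the pay-as-you-go argument hinges on.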
What this means in the context of a major event, one that requires scientists across disciplines to quickly build or deploy models and collaborate on solutions, is that organizations can, in Reed’s words, “buy just-in-time services to process and exploit data, rather than [spend] on infrastructure,” which lowers the cost barrier to entry, speeds time to solution, and allows scientists to focus on science rather than the systems they require.
The devices that scientists use as instruments of one kind or another, from mobile applications used to gather data to iPads and standard laptops, are growing in number, adding to the data deluge while refining the way scientists gather data in near real time. As Reed sees it, “the cloud offers unique opportunities to support a global, multi-party and neutral type of collaboration—allowing a diverse set of experts scattered across multiple continents to bring their expertise to bear…By extending the capabilities of powerful, easy to use PC, Web, and mobile applications through on-demand cloud services, the capabilities of the entire research community will be broadened significantly, accelerating the pace of engineering and scientific discovery.”
As we often try to communicate here by highlighting use cases of cloud for HPC, and as Reed echoes, “the net effect will be the democratization of research capabilities that are now available only to the most elite scientists.”