Failure to incorporate big data computing insights into efforts to achieve exascale computing would be a critical mistake, argue Daniel Reed and Jack Dongarra in their article, “Exascale Computing and Big Data,” published in the July 2015 issue of Communications of the ACM. While scientific and big data computing have historically taken different development paths, the problems they tackle are converging, and lessons from both will be needed.
“Just a few years ago, the very largest data storage systems contained only a few terabytes of secondary disk storage, backed by automated tape libraries. Today, commercial and research cloud-computing systems each contain many petabytes of secondary storage, and individual research laboratories routinely process terabytes of data produced by their own scientific instruments,” they write.
The authors point out the exponential growth in the number of objects stored in Amazon’s Simple Storage Service (S3).
“Atop such low-level services, companies (such as Netflix) implement advanced recommender systems to suggest movies to subscribers and then stream selections. Scientific researchers also increasingly explore these same cloud services and machine-learning techniques for extracting insight from scientific images, graphs, and text data. There are natural technical and economic synergies among the challenges facing data-intensive science and exascale computing, and advances in both are necessary for future scientific breakthroughs. Data-intensive science relies on the collection, analysis, and management of massive volumes of data, whether obtained from scientific simulations or experimental facilities. In each case, national and international investment in “extreme scale” systems will be necessary to analyze the massive volumes of data that are now commonplace in science and engineering,” they write.
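To give a flavor of what such a recommender does, here is a toy item-similarity sketch in Python. The ratings matrix, function names, and parameters are invented for illustration; this has nothing to do with Netflix’s actual (far more sophisticated) systems or the article itself.

```python
# Toy item-based recommender: score a user's unrated movies by their
# cosine similarity to the movies that user has already rated.
import numpy as np

# Hypothetical data: rows = users, columns = movies, entries = ratings (0 = unrated).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine_similarity(matrix):
    """Pairwise cosine similarity between columns (movies)."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero for unrated columns
    normalized = matrix / norms
    return normalized.T @ normalized

def recommend(user_index, ratings, top_n=1):
    """Return the indices of the user's top unrated movies."""
    sim = cosine_similarity(ratings)
    user = ratings[user_index]
    scores = sim @ user                   # weight similarities by the user's own ratings
    candidates = np.flatnonzero(user == 0)  # only consider movies the user hasn't rated
    ranked = candidates[np.argsort(scores[candidates])[::-1]]
    return ranked[:top_n]

print(recommend(0, ratings))  # suggested movie index/indices for user 0
```

Production systems replace this dense matrix with distributed, sparse computation over billions of entries, which is exactly where the cloud-scale storage and compute the authors describe come in.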
It’s a fascinating article in which the authors provide a brief history of HPC, outline the challenges and developing solutions on the path to exascale, and knit them together with insights drawn from traditional technical computing and from more recent, enterprise-centered big data efforts. A figure from the paper showing the differences between the two computing ecosystems is shown here.
Both authors are well established in the supercomputing world. Daniel Reed is Vice President for Research and Economic Development and Professor of Computer Science, Electrical and Computer Engineering, and Medicine at the University of Iowa. Jack Dongarra holds appointments at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He is also, of course, one of the authors of the TOP500 list.
The key insights they cite are:
- The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains.
- The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges (a back-of-the-envelope illustration follows this list).
- The international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process.
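To make the second insight concrete, consider a rough calculation of our own (not from the article), assuming a sustained 10 Gb/s wide-area link. Moving a single petabyte, a data volume the authors note is now routine, would take

\[
t = \frac{8 \times 10^{15}\ \text{bits}}{10^{10}\ \text{bits/s}} = 8 \times 10^{5}\ \text{s} \approx 9.3\ \text{days},
\]

which is one reason analysis increasingly moves to where the data lives rather than the other way around.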
The authors also reprise the list of top ten exascale computing challenges identified by a Department of Energy Office of Advanced Scientific Computing Research subcommittee, a list well worth reviewing:
- Energy-efficient circuit, power, and cooling technologies.
- High-performance interconnect technologies.
- Advanced memory technologies to improve capacity.
- Scalable system software that is power and failure aware (a minimal checkpoint/restart sketch follows this list).
- Data management software that can handle the volume, velocity, and diversity of data.
- Programming models to express massive parallelism, data locality, and resilience.
- Reformulation of science problems and refactoring solution algorithms.
- Ensuring correctness in the face of faults, reproducibility, and algorithm verification.
- Mathematical optimization and uncertainty quantification for discovery, design, and decision.
- Software engineering and supporting structures to enable productivity.
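As promised above, here is a minimal checkpoint/restart sketch in Python illustrating the “failure aware” idea behind the system-software challenge. The file name, state layout, and checkpoint interval are all invented for illustration; real exascale runtimes handle resilience at far larger scale and at much lower levels of the stack.

```python
# Toy checkpoint/restart loop: periodically persist state so that a crash
# costs at most INTERVAL steps of recomputation.
import json
import os

CHECKPOINT = "state.json"   # hypothetical checkpoint file
INTERVAL = 100              # steps between checkpoints (assumed)

def load_state():
    """Resume from the last checkpoint if one exists, else cold-start."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "value": 0.0}

def save_state(state):
    """Write to a temp file, then atomically rename over the checkpoint."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)   # atomic on POSIX: never a torn checkpoint

state = load_state()
while state["step"] < 1000:
    state["value"] += 1.0         # stand-in for one unit of real computation
    state["step"] += 1
    if state["step"] % INTERVAL == 0:
        save_state(state)         # bounded loss: at most INTERVAL steps redone
```

The atomic rename is the key design choice: a crash during the write leaves the previous checkpoint intact, so a restart never sees half-written state.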
If the hurdles to exascale are high, so are the rewards, argue Reed and Dongarra. “Every advance in computing technology has driven industry innovation and economic growth, spanning the entire spectrum of computing, from the emerging Internet of Things to ubiquitous mobile devices to the world’s most powerful computing systems and largest data archives. These advances have also spurred basic and applied research in every domain of science.”
There’s much more to be gleaned from the article. Reed and Dongarra review a wide range of enterprise and scientific applications that depend on HPC and have a high impact on society, and they take a deeper dive into some of the technology issues. The full article is here: http://cacm.acm.org/magazines/2015/7/188732-exascale-computing-and-big-data/fulltext