Here is a collection of highlights from this week’s news stream as reported by HPCwire.
Johns Hopkins Builds Data Mining Super Machine
While most supercomputing designs nowadays are focused on achieving a maximum number of FLOPS (floating point operations per second), researchers at Johns Hopkins University are designing a scientific instrument that will enable a maximum number of IOPS (I/O operations per second). This novel architecture will be better suited to analyzing the enormous amounts of data that today’s science generates.
Dubbed the Data-Scope, the machine is currently being developed by a group led by computer scientist and astrophysicist Alexander Szalay of Johns Hopkins’ Institute for Data Intensive Engineering and Science. The National Science Foundation is providing funding in the form of a $2.1 million grant and Johns Hopkins is contributing nearly $1 million to the project.
According to Szalay:
“Computer science has drastically changed the way we do science and the science that we do, and the Data-Scope is a crucial step in this process. At this moment, the huge data sets are here, but we lack an integrated software and hardware infrastructure to analyze them. Data-Scope will bridge that gap.”
Data-Scope’s design will include a combination of hard disk drives, solid state disks, and GPU computing, enabling it to handle five petabytes of data, with a sequential I/O bandwidth close to 500 gigabytes per second, and a peak performance of 600 teraflops. The machine will be adept at data mining, able to discern relationships and patterns in data leading to discoveries that would otherwise not be possible.
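To put those figures in perspective, the following minimal sketch estimates how long a single sequential pass over the full data store would take. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the quoted bandwidth is sustained for the whole scan, which is optimistic. The function name `scan_time_seconds` is purely illustrative and does not come from the Data-Scope project.

```python
# Back-of-the-envelope estimate only; decimal units are an assumption.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes


def scan_time_seconds(data_bytes, bandwidth_bytes_per_s):
    """Time for one sequential pass over the data at a sustained bandwidth."""
    return data_bytes / bandwidth_bytes_per_s


# Figures quoted in the article: five petabytes at close to 500 GB/s.
full_scan = scan_time_seconds(5 * PETABYTE, 500 * GIGABYTE)
print(f"{full_scan:.0f} seconds, or about {full_scan / 3600:.1f} hours")
```

Under these assumptions, one full pass takes roughly 10,000 seconds, a little under three hours. Real workloads with random access or heavy computation would take longer.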
There is already a backlog of data waiting to be analyzed: three petabytes' worth from about 20 interested research groups within Johns Hopkins. Szalay explains that without Data-Scope, the researchers would have to wait years to analyze the data already in existence, never mind the data that will keep accumulating in the meantime.
Data-Scope is expected to begin operation in May 2011 and will handle a range of subject matter, including genomics, ocean circulation, turbulence, astrophysics, environmental science, and public health.
Szalay underscores the importance of the project:
“There really is nothing like this at any university right now. Such systems usually take many years to build up, but we are doing it much more quickly. It’s similar to what Google is doing — of course on a thousand-times-larger scale than we are. This instrument will be the best in the academic world, bar none.”
Appro Outfits LLNL with Visualization Cluster
This week Appro launched its Appro HyperPower Clusters, providing Lawrence Livermore National Laboratory (LLNL) Computing Center with a new visualization cluster called “Edge.” The cluster is based on the Appro CPU/GPU GreenBlade System and is designed to support I/O bound applications, such as advanced data analysis and visualization tasks. It will also be used for LLNL’s exascale software development computing projects.
The system’s six racks house a total of 216 CPU nodes sporting six-core Intel Xeon processors and 208 NVIDIA Tesla GPU nodes, delivering 29 teraflops of computing power. The system’s 20 terabytes of memory provide the increased level of I/O bandwidth needed for data analysis and complex visualization projects. A QDR InfiniBand fabric connects the compute and graphics nodes.
According to Trent D’Hooge, a cluster integration lead at LLNL, Edge is the first data analysis cluster that has GPUs with ECC support and increased double-precision floating point performance.
Becky Springmeyer, computational systems and software environment lead of the Advanced Simulation and Computing program at LLNL, explains that “post-processing tasks are heavily I/O bound, so specialized visualization servers that optimize I/O rather than CPU speed are better suited for this work, which will be now enabled through the ‘Edge’ cluster.”