NCSA
HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

Language Flags

Visit additional Tabor Communication Publications

Datanami
Digital Manufacturing Report
HPC in the Cloud
Green Computing Report

Tabor Communications
Corporate Video

A Dark Matter for Astrophysics Research


Back in 2008, the Sloan Digital Sky Survey (SDSS) came to an end, leaving behind hundreds of terabytes of publicly-available data that has since been used in a range of research projects. Based on this data, researchers have been able to discover distant quasars powered by supermassive black holes in the early universe, uncover collections of sub-stellar objects, and have mapped extended mass distributions around galaxies with weak gravitational fields.

Among the diverse groups of scientists tackling problems that can now be understood using the SDSS data is a team led by Dr. Risa Wechsler from Stanford University's Department of Physics and the SLAC National Accelerator Laboratory.

Wechsler is interested in the process of galaxy formation, the development of universal structure, and what these can tell us about the fundamental physics of the universe. Naturally, dark energy and dark matter enter the equation when one is considering galactic formation and there are few better keys to probing these concepts than data generated from the SDSS.

Just as the Sloan Digital Sky Survey presented several new data storage and computational challenges, so too do the efforts to extract meaningful discoveries. Teasing apart important information for simulations and analysis generates its own string of terabytes on top of the initial SDSS data. This creates a dark matter of its own for computer scientists as they struggle to keep pace with ever-expanding volumes that are outpacing the capability of the systems designed to handle them.

Wechsler's team used the project's astronomical data to make comparisons in the relative luminosity of millions of galaxies to our own Milky Way. All told, the project took images of nearly one-quarter of the sky, creating its own data challenges. The findings revealed that galaxies with two satellites that are nearby with large and small Magellanic clouds are highly unique -- only about four percent of galaxies have similarities to the Milky Way.

To arrive at their conclusions, the group downloaded all of the publicly available Sloan data and began looking for satellite galaxies around the Milky Way, combing through about a million galaxies with spectroscopy to select a mere 20,000 with luminosity similar to that of our own galaxy. With these select galaxies identified, they undertook the task of mining those images for evidence of nearby fainter galaxies via a random review method. As Wechsler noted, running on the Pleiades supercomputer at NASA Ames, it took roughly 6.5 million CPU hours to run a simulation of a region of the universe done with 8 billion particles, making it one of the largest simulations that has ever been done in terms of particle numbers. She said that when you move to smaller box sizes it takes a lot more CPU time per particle because the universe is more clustered on smaller scales.

Wechsler described the two distinct pipelines required for this type of reserach. First, there's the simulation in which researchers spend time looking for galaxies in a model universe. Wechsler told us that this simulation was done on the Pleiades machine at Ames across 10,000 CPUs. From there, the team performed an analysis of this simulation, which shows the evolution of structure formations on the piece of the universe across its entire history of almost 14 billion years -- a process that involves the examination of dark matter halo histories across history. As she noted, the team was "looking for gravitationally bound clumps in that dark matter distribution; you have a distribution of matter at a given time and you want to find the peaks in that density distribution since that is where we expect galaxies to form. We were looking for those types of peas across the 200 snapshots we tool to summarize that entire 14 billion year period."

The team needed to understand the evolutionary processes that occurred between the many billions of years captured in 200 distinct moments. This meant they had to trace the particles from one snapshot to the next in their clumps, which are called dark matter halos. Once the team found the halos, which again, are associated with galaxy formation, they did a statistical analysis that sought out anything that looked like our own Milky Way. Wechsler told is that "the volume of the simulation was comparable to the volume of the data that we were looking at. Out of the 8 million or so total clumps in our simulation we found our set of 20,000 that looked like possibilities to compare to the Milky Way. By looking for fainter things around them -- and remember there are a lot more faint things than bright ones -- we were looking for many, many possibilities at one time."

The computational challenges are abundant in a project like this Wechsler said. Out of all bottlenecks, storage has been the most persistent, although she noted that as of now there are no real solutions to these problems.

Aside from bottlenecks due to the massive storage requirements, Wechsler said that the other computational challenge was that even though this project represented one of the highest resolution simulations at such a volume, they require more power. She said that although they can do larger simulation in a lower resolution, getting the full dynamic range of the calculation is critical. This simulation breaks new ground in terms of being able to simulate Magellenic cloud size objects over a large volume, but it's still smaller than the volume that the observations are able to probe. This means that scaling this kind calculation up to the next level is a major challenge, especially as Wechsler embarks on new projects.

"Our data challenges are the same as those in many other fields that are tackling multiscale problems. We have a wide dynamic range of statistics to deal with but what did enable us to do this simulation is being able to resolve many small objects in a large volume. For this and other research projects, having a wide dynamic range of scales is crucial so some of our lessons can certainly be carried over to other fields."

As Alex Szalay friom the Johns Hopkins University Department of Physics and Astonomy noted, this is a prime example of the kinds of big data problems that researchers in astrophysics and other fields are facing. They are, as he told us, "forced to make tradoffs when they enter the extreme scale" and need to find ways to manage both storage and CPU resources so that these tradeoffs have the least possible impact on the overall time to solutions. Dr. Szalay addressed some of the specific challenges involved in Wechsler's project in a recent presentation called "Extreme Databases-Centric Scientific Computing." In the presentation he addresses the new scalable architectures required for data-intensive scientific applications, looking at the databases as the root point to begin exploring new solutions.

For the dark energy survey, the team will take images of about one-eighth of the sky going back seven billion years. The large synoptic survey telescope, which is currently being built will take images of the half the sky every three days and will provide even more faintness detection, detecting the brightest stars back to a few billion years after the big bang. One goal with this is to map where everything is in order to figure out what the universe is made of. Galaxy surveys help with this research because they can map the physics to large events via simulations to understand galactic evolution.

Sponsored Links

Webinar: Programming Heterogeneous X64+GPU Systems Using OpenACC
Join Michael Wolfe as he compares the advantages and costs of using both low-level models and the directive-based OpenACC model for programming accelerated heterogeneous systems. Registration is free.

High-Performance Computing in Action
Businesses that want to be on the cutting edge of their industries are increasingly turning to high-performance computing (HPC) solutions to handle complex compute processes and speed up their rate of innovation. Download this Executive Brief to see how businesses in energy, life sciences and entertainment put HPC solutions to work in their operations.

Accelerate your science with Seneca
One of the first HPC providers installing a 4X NVIDIA Kepler K-20 cluster. Invites you to a free evaluation on Seneca’s NVIDIA K20 Kepler cluster, pre-loaded with AMBER, NAMD, LAMMPS

May 21, 2013

May 20, 2013

May 17, 2013

May 16, 2013

May 15, 2013

May 14, 2013

May 13, 2013

May 10, 2013

May 09, 2013

May 08, 2013


Most Read Features

Most Read Around the Web

Most Read This Just In

Supermicro

Short Takes

Running Computational Fluid Dynamics in the Cloud

May 16, 2013 | When it comes to cloud, long distances mean unacceptably high latencies. Researchers from the University of Bonn in Germany examined those latency issues of doing CFD modeling in the cloud by utilizing a common CFD and its utilization in HPC instance types including both CPU and GPU cores of Amazon EC2.
Read more...

Computing the Physics of Bubbles

May 15, 2013 | Supercomputers at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) have worked on important computational problems such as collapse of the atomic state, the optimization of chemical catalysts, and now modeling popping bubbles.
Read more...

Internet2 Awards Program Seeks Innovative Applications

May 10, 2013 | Program provides cash awards up to $10,000 for the best open-source end-user applications deployed on 100G network.
Read more...

Floating Funding to Exascale Island

May 09, 2013 | The Japanese government has revealed its plans to best its previous K Computer efforts with what they hope will be the first exascale system...
Read more...

HPC and the True Cost of Cloud

May 08, 2013 | For engineers looking to leverage high-performance computing, the accessibility of a cloud-based approach is a powerful draw, but there are costs that may not be readily apparent.
Read more...

Sponsored Whitepapers

Best Practices in Big Data Storage

05/10/2013 | Cleversafe, Cray, DDN, NetApp, & Panasas | From Wall Street to Hollywood, drug discovery to homeland security, companies and organizations of all sizes and stripes are coming face to face with the challenges – and opportunities – afforded by Big Data. Before anyone can utilize these extraordinary data repositories, however, they must first harness and manage their data stores, and do so utilizing technologies that underscore affordability, security, and scalability.

Progress in Parallel: the Bull Parallel Programming Center

04/15/2013 | Bull | “50% of HPC users say their largest jobs scale to 120 cores or less.” How about yours? Are your codes ready to take advantage of today’s and tomorrow’s ultra-parallel HPC systems? Download this White Paper by Analysts Intersect360 Research to see what Bull and Intel’s Center for Excellence in Parallel Programming can do for your codes.

Sponsored Multimedia

SGI DMF ZeroWatt Disk Solution

In this demonstration of SGI DMF ZeroWatt disk solution, Dr. Eng Lim Goh, SGI CTO, discusses a function of SGI DMF software to reduce costs and power consumption in an exascale (Big Data) storage datacenter.

Cray CS300-AC Cluster Supercomputer Air Cooling Technology Video

The Cray CS300-AC cluster supercomputer offers energy efficient, air-cooled design based on modular, industry-standard platforms featuring the latest processor and network technologies and a wide range of datacenter cooling requirements.

SC12 Editorial Feature HPCwire Soundbite sponsored by ISC

HPC Job Bank


Featured Events


  • June 16, 2013 - June 20, 2013
    ISC'13
    Leipzig,
    Germany

  • June 17, 2013 - June 18, 2013
    Forecast 2013
    San Francisco, CA
    United States





HPCwire Events