Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

May 30, 2013

TACC’s Hadoop Cluster Breaks New Ground

Nicole Hemsoth

A 256-node Hadoop system at the University of Texas at Austin is breaking down the barriers that have traditionally kept high performance computing restricted to technical experts. Nearly 70 students and researchers at the Texas Advanced Computing Center (TACC) have used the cluster to crunch big datasets and provide potential answers to questions in the fields of biomedicine, linguistics, and astronomy.

There’s been a lot of hype over Apache Hadoop in the last few years, and with good reason. With the emergence of big data, new technologies like Hadoop promise to make it easier to sort through huge datasets and tease out the patterns, without burdening users with low-level plumbing such as I/O, memory structures, and job queuing.

What’s notable about TACC’s Hadoop cluster is that it represents the first Hadoop implementation running on a supercomputer at a U.S. high performance computing center. Until the folks at TACC loaded Hadoop on their 256-node Dell cluster (dubbed Longhorn) in the fall of 2010, you couldn’t find Hadoop running on an academic supercomputer, according to Aaron Dubrow, a science and technology writer at TACC.

In the 3.5 years that the TACC cluster has been online, it’s seen more than one million hours of data-intensive computations across 19 different projects, and has been the basis for dozens of papers and presentations on topics ranging from flow cytometry (FCM) to natural language processing.

Longhorn helped accelerate the identification of cell types using FCM, a technology used by medical researchers. Because the cluster automatically creates and schedules parallel tasks based on the user’s job specification, the FCM processing got an immediate boost, and the researchers did not need to rewrite the open-source software to handle big datasets.
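To make the idea of a job specification concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop popularized. It is a toy, single-machine illustration in Python, not Hadoop's API and not the FCM software described above; the function names (map_words, reduce_counts, run_job) and the sample input are hypothetical. The user writes only the two small functions, and the stand-in "framework" decides how the work is split up and run in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """User-supplied map step: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """User-supplied reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def run_job(lines, mapper, reducer, workers=4):
    """Toy stand-in for the framework (an assumption, not Hadoop itself):
    run map tasks in parallel, group the results by key, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, lines))

    # Shuffle step: collect every value emitted for the same key.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)

    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data intensive computing"]
    print(run_job(sample, map_words, reduce_counts))
```

On a real cluster the same division of labor applies at a much larger scale: the framework, rather than the researcher, handles splitting the input, placing tasks on nodes, and collecting the results.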

Linguistics researchers also used the cluster to show how language is connected across time and space. A UT linguistics professor applied the TextGrounder algorithm to a collection of British and American books from a century ago. The results were then meshed with a geobrowser to display where words have their roots.

Others are using the 96-TB Hadoop cluster to help sort the wheat from the chaff on the Internet for one topic in particular: autism. UT researchers are using visualization techniques to help the parents of autistic children find information and support on the Web more quickly.

TACC is also working with Intel to find out how Hadoop clusters can be tuned to run scientific workloads faster, particularly through speedier interconnects. The two groups described their work in a recently published white paper.

