April 23, 2009
CluE awards promote academic use of cluster computing resources on IBM/Google cloud
ARLINGTON, Va., April 23 -- Today, the National Science Foundation announced it has awarded nearly five million dollars in grants to fourteen universities through its Cluster Exploratory (CluE) program to participate in the IBM/Google Cloud Computing University Initiative. The initiative will provide the computing infrastructure for leading-edge research projects that could help us better understand our planet and our bodies, and push the limits of the World Wide Web.
In 2007, IBM and Google announced a joint university initiative to help computer science students gain the skills they need to build cloud applications. Now, the National Science Foundation is using the same infrastructure and open source methods to award CluE grants to universities around the United States. Through this program, universities will use software and services running on an IBM/Google cloud to explore innovative research ideas in data-intensive computing. These projects cover a range of activities that could lead not only to advances in computing research, but also to significant contributions in science and engineering more broadly.
The National Science Foundation awarded Cluster Exploratory (CluE) program grants to Carnegie Mellon University, Florida International University, the Massachusetts Institute of Technology, Purdue University, University of California-Irvine, University of California-San Diego, University of California-Santa Barbara, University of Maryland, University of Massachusetts, University of Virginia, University of Washington, University of Wisconsin, University of Utah and Yale University.
"Academic researchers have expressed a need for access to massively scaled computing infrastructures that allow them to complete projects and research activities that have been difficult or impossible previously due to the amount of data involved," said Jeannette Wing, the assistant director for Computer & Information Science and Engineering at the National Science Foundation. "We are pleased to provide the awards to these fourteen universities, enabling researchers to engage with and explore this emerging and pervasive model of computing."
"IBM is intensely focused on applying technology and science to make the world work better," said Willy Chiu, vice president of IBM Cloud Labs. "IBM is thrilled to power the groundbreaking studies taking place at these prestigious universities, and to help enable researchers and students around the world tackle some of the biggest problems of our time."
"We're pleased and excited that the CluE program will support a wide range of original research," said Alfred Spector, Google's vice president for Research and Special Initiatives. "We're looking forward to seeing the grantees solve challenging problems across various fields through creative applications of distributed computing."
The universities will run a wide range of advanced projects and explore innovative research ideas in data-intensive computing, including advancements in image processing, comparative studies of large-scale data analysis, studies and improvements to the Internet, and human genome sequencing, among others, using software and services on the IBM/Google cloud infrastructure.
Carnegie Mellon University
Researchers at Carnegie Mellon University are using cloud computing to characterize the topicality of web content in order to process web searches more effectively. Routing searches by topic requires less effort than traditional searches, enabling significant computational and financial savings. The project is using the Google/IBM cluster to "crawl" the web and perform the data cleansing and pre-processing necessary to build a one-billion-document web dataset to support the research. The dataset is also being made available to the larger information retrieval community to multiply the project's impact on that discipline.
A second research project is focused on developing the Integrated Cluster Computing Architecture (INCA) for machine translation (using computers to translate text from one language to another). Open-source toolkits make it easier for new research groups to tackle the problem at lower cost, broadening participation. Unfortunately, existing toolkits have not kept up with the computing infrastructure required for modern "big data" approaches to machine translation; INCA will fill this void.
Florida International University
Florida International University (FIU) researchers are leveraging cloud computing to analyze aerial images and objects in support of disaster mitigation and environmental protection. Specifically, the CluE effort at FIU relates to its TerraFly project, a web service providing 40 terabytes of aerial imagery, geospatial queries and local data. Students and researchers will now be able to precisely code these images in real time.
Massachusetts Institute of Technology, University of Wisconsin-Madison and Yale University
These three universities are using their National Science Foundation CluE grants for a comparative study of approaches to cluster-based, large-scale data analysis. Both MapReduce and parallel database systems provide scalable data processing across hundreds to thousands of nodes, yet researchers need to understand the performance and scalability differences between the two approaches in order to decide which is more suitable when designing new data-intensive computing applications.
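To make the comparison concrete, the MapReduce side of the study can be illustrated with a toy word count, the model's canonical example. This is a single-process sketch for illustration only; a real MapReduce system distributes the map and reduce phases across many cluster nodes.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = ["the cloud scales", "the cluster scales"]
print(reduce_phase(map_phase(docs)))
# {'the': 2, 'cloud': 1, 'scales': 2, 'cluster': 1}
```

A parallel database would express the same computation declaratively (roughly `SELECT word, COUNT(*) ... GROUP BY word`), which is the crux of the trade-off the study examines.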
This project is investigating linguistic extensions to MapReduce abstractions for programming modern, large-scale systems, with special focus on applications that manipulate large, unstructured graphs. This will impact a broad class of scientific applications. Graphs have important utility in the social sciences (social networks), recommender systems, and business and finance (networks of transactions), among others. The specific case study targeted by the research is a comparative analysis of graph-structured biochemical networks and pathways which underlie many important problems in biology.
University of California-Irvine
In many applications, data-quality issues resulting from a variety of errors create inconsistencies in structures, representations or semantics. Simple spelling variations such as "Schwarzenegger" vs. "Schwarseneger," "Brittany Spears" vs. "Britney Spears," or "PO Box" vs. "P.O. Box" are examples of this. Dealing with these issues is becoming increasingly important as the volume of data being processed increases. This project is providing support for efficient fuzzy queries on large text repositories. Supporting fuzzy queries can ultimately help applications mitigate their data-quality issues because entities with different representations can be matched.
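A common building block for this kind of fuzzy matching is edit (Levenshtein) distance: the number of single-character insertions, deletions, and substitutions separating two strings. The sketch below is purely illustrative and is not the project's actual query technique, which must scale to very large text repositories.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free on match)
        prev = curr
    return prev[-1]

# A fuzzy query would accept a candidate whose distance is within a small threshold.
print(edit_distance("Schwarzenegger", "Schwarseneger"))  # 2
```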
University of California-San Diego / San Diego Supercomputer Center
Researchers at the University of California, San Diego are studying how to manage and process massive spatial data sets on large-scale compute clusters. The specific test case is analysis of high-resolution topographic data sets from airborne LiDAR surveys. LiDAR datasets are of interest to many Earth scientists, and providing efficient access and analytic capabilities will have broad impact within the Earth sciences.
University of California-Santa Barbara
Many of today's data-intensive application domains, including search on social networks like Facebook and protein matching in bioinformatics, require answering complex queries on highly connected data. The UCSB Massive Graphs in Clusters (MAGIC) project is focused on developing software infrastructure that can efficiently answer queries on extremely large graph datasets. The MAGIC software will provide an easy-to-use interface for searching and analyzing data, and will manage query processing to take efficient advantage of computing resources such as large datacenters.
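As a toy illustration of the kind of query such a system answers (and not MAGIC's actual implementation), a breadth-first search computes "how many hops away is each user?" on a small social graph; scaling this to billions of edges across a cluster is the hard problem the project targets.

```python
from collections import deque

def bfs_hops(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:          # first visit is the shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical social graph: who is connected to whom.
graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave"], "dave": []}
print(bfs_hops(graph, "alice"))  # {'alice': 0, 'bob': 1, 'carol': 1, 'dave': 2}
```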
University of Maryland-College Park
The CluE initiative is funding another machine translation project that promises to bridge the language divide in today's multi-cultural and multi-faceted society. Systems capable of converting text from one language into another have the potential to transform how diverse individuals and organizations communicate. By coupling network analysis with cross-language information retrieval techniques, the result is a richer, multilingual contextual model that will guide a machine translation system in translating different types of text. The potential broader impact of this project is no less than knowledge dissemination across language boundaries, which will serve to enrich the lives of all the world's citizens.
A second project focuses on developing parallel algorithms for analyzing the next generation of sequencing data. Scientists can now generate the rough equivalent of an entire human genome in just a few days with a single sequencing instrument. The analysis of these data is complicated by their size: a single run of a sequencing instrument yields terabytes of information, often requiring a significant scale-up of the existing computational infrastructure needed for analysis.
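A standard kernel in sequence analysis is k-mer counting: tallying every length-k substring across millions of short reads. The serial sketch below only illustrates the idea; the funded work is about parallelizing analyses like this over terabyte-scale instrument output.

```python
from collections import Counter

def count_kmers(reads, k=4):
    """Count every length-k substring (k-mer) across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):   # slide a window of width k
            counts[read[i:i + k]] += 1
    return counts

# Two tiny synthetic reads; real runs involve billions of bases.
kmers = count_kmers(["ACGTACGT", "CGTACGTA"], k=4)
print(kmers["ACGT"])  # 3
```

Because each read can be counted independently and the partial tallies merged, the computation maps naturally onto the cluster's map-and-reduce style of processing.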
University of Massachusetts-Amherst
Researchers at the Center for Intelligent Information Retrieval (CIIR) are using the CluE infrastructure to learn more about word relationships. These relationships are not labeled explicitly in text and are quite varied; by exploiting them, the project aims to produce more effective ranking of web-retrieval results.
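One simple, unsupervised signal for such unlabeled word relationships is co-occurrence: how often two words appear near each other in a large corpus. The sketch below is a hypothetical illustration of the idea, not the CIIR's actual method.

```python
from collections import defaultdict

def cooccurrences(sentences, window=2):
    """Count how often word pairs appear within `window` tokens of each other."""
    pairs = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                pairs[tuple(sorted((word, other)))] += 1  # unordered pair
    return dict(pairs)

corpus = ["cloud computing scales well", "cluster computing scales"]
counts = cooccurrences(corpus)
print(counts[("computing", "scales")])  # 2
```

At web scale these counts would be accumulated across thousands of nodes, and the resulting association strengths could then feed into a retrieval ranking function.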
University of Virginia
Imagine continuously zooming into an image from your personal photo collection. Unlike with modern image processing software, however, this zoom operation would reveal details missing from the original image. For example, zooming into someone's shirt would eventually show a high-resolution image of the threads that compose it. The research team in the Department of Computer Science at the University of Virginia plans to develop techniques for intelligently enlarging a digital image that use a database of millions of on-line images to find examples of what its components look like at a higher spatial resolution.
University of Washington
Astrophysics is addressing many fundamental questions about the nature of the universe through a series of ambitious wide-field optical and infrared imaging surveys. New methodologies for analyzing and understanding petascale data sets are required to answer these questions. This research project is focused on developing new algorithms for indexing, accessing and analyzing astronomical images. This work is expected to have a broad range of applications to other data-intensive fields.
University of Washington and University of Utah
This project is building a new infrastructure for computational oceanography that uses the CluE platform to allow ad hoc, longitudinal query and visualization of massive ocean simulation results at interactive speeds. This infrastructure leverages and extends two existing systems: GridFields, a library for general and efficient manipulation of simulation results; and VisTrails, a comprehensive platform for scientific workflow, collaboration, visualization, and provenance.
IBM/Google Cloud Computing University Initiative
Through the initiative, IBM and Google are providing the software and services on their cloud infrastructure that these universities will leverage for their respective projects.
Source: IBM Corp.