SC13 Research Highlight: Large Graph Processing Without the Overhead

By Dr. Ling Liu and Kisung Lee

November 16, 2013

Many real world information networks consist of millions or billions of vertices representing heterogeneous entities and billions or trillions of edges representing heterogeneous types of relationships among entities.

Image Source: Max Delbrück Center for Molecular Medicine

For example, the crawled Web graph is estimated to have more than 20 billions of pages (vertices) with 160 billions hyperlinks (edges). Facebook user community exceeds 1 billion users (vertices) with more than 140 billion friendship relationships (edges) in 2012.  The billion triple challenges from the Semantic Web community have put forward large collection of RDF datasets with hundreds of millions of vertices and billions of edges.

As the size and variety of information networks continue to grow in many science and engineering domains, graph computations often exceed the processing capacity of conventional hardware, software systems and tools for a number of reasons. First, graph data often exhibits higher data correlations through both direct and indirect edges and such high correlation tends to generate large size of intermediate results during graph computations. When the size of intermediate results exceeds the available memory, the out of memory problem is unavoidable. Second, the graph datasets are growing in volume, variety and velocity. The bigger the size of the graphs gets, the worse the performance will be for most of the graph computations. One open challenge in this space is how to effectively partition a large graph to enable efficient parallel processing of complex graph operations.

One of the papers to be presented at the ACM/IEEE SC13 conference, titled “Efficient data partitioning model for heterogeneous graphs in the Cloud”, proposes a flexible graph partitioning framework, called VB-partitioner. This work is co-authored by the doctorate student Kisung Lee and Prof. Dr. Ling Liu from the school of Computer Science at Georgia Institute of Technology. To make parallel graph computations highly efficient, an important design goal of VB-Partitioner is to devise graph partitioning techniques that can effectively minimize the inter-partition communication overhead and maximize the intra-partition computation capacity (local processing).

Concretely, the first prototype of the VB-Partitioner focuses on efficient processing of graph queries, namely finding all the subgraphs matching a given subgraph pattern. VB-Partitioner partitions a large graph in three steps.

  • First, it constructs three types of Vertex Blocks (in-VBs, out-VBs and bi-VBs) to capture the general graph processing locality.  Each vertex block has an anchor vertex.
  • Second, it constructs three types of k-hop Extended Vertex Blocks (in-EVBs, out-EVBs and bi-EVBs) to distribute vertex blocks with better query locality. Each EVB has one anchor vertex. It achieves query locality by employing controlled edge replication. The setting of k is determined by the radius of frequent query graphs in order to ensure that most frequently requested queries can be processed in parallel at all partitions with minimized inter-partition communication overhead.
  • Third, it partitions a graph by grouping its vertex blocks and EVBs to maximize parallelism in graph processing while ensuring load balance, controlled edge replication and fast grouping.

Four techniques are considered and compared in the context of grouping and placement of VBs and EVBs to partitions: random grouping, hash-based grouping, min-cut based grouping and high degree vertex-based grouping.  As an integral part of the VB-Partitioner, a data partition-aware query partitioning model is also developed to handle the cases where the radius of a query is larger than k. The experimental results reported in the paper demonstrate the effectiveness of VB-Partitioner in terms of query processing efficiency, data loading time and partition distribution balance.

Graph computations can be broadly classified into two categories, graph queries that find matching subgraphs of a given pattern and iterative graph algorithms that find clusters, orderings, paths or correlation patterns. The former targets at subgraph matching problems over large static graphs and the later targets at graph inference kernels that traverse the graph by iteratively updating the weight of vertices or edges, such as PageRank, shortest path algorithms, spanning tree algorithms, topological sort, and so forth.

Although the first generation of the VB-Partitioner is tailored primarily for efficient parallel processing of graph queries, the ongoing work on VB-Partitioner includes exploring the feasibility and effectiveness of VB-Partitioner in the context of iterative graph algorithms. For example, to minimize inter-partition communications and maximize parallelism in graph computation, it is crucial to optimize the shared memory by minimizing parallel overhead of synchronization barriers. It is equally important to optimize the distributed memory by bounding message buffer sizes, bundling messages, overlapping communication with computation to amortize the overhead of barriers.

image1
Image Source: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003.”

In addition to exploring parallel computation opportunities through graph partitioning using multi-threads, multi-cores, multiple workers, one can also exploit and combine with other performance optimization techniques to scale large graph analytics. Example techniques include

  • Compression to provide compact storage and in-memory processing,
  • Data placements on disk and in memory to balance computation with storage, and to maximize sequential access and minimize random access,
  • Indexing at vertex and/or edge level to utilize sequential access and minimize unnecessary random access,
  • Caching at vertex, edge or query level to gain performance for repeated access.

Please come hear more on Tuesday, November 19, 2013 10:30AM – 11:00AM (Location: Room 205/207)

http://sc13.supercomputing.org/schedule/event_detail.php?evid=pap708

About the Authors

LingLing Liu is a Professor in the School of Computer Science at Georgia Institute of Technology. She directs the research programs in Distributed Data Intensive Systems Lab (DiSL), examining various aspects of large scale data intensive systems. Prof. Ling Liu is an internationally recognized expert in the areas of Cloud computing, Distributed Computing, Big Data technologies, Database systems and Service oriented computing. Prof. Liu is a recipient of IEEE Computer Society Technical Achievement Award in 2012. Currently Prof. Liu is the editor in chief of IEEE Transactions on Service Computing, and serves on the editorial board of half dozen international journals, including Journal of Parallel and Distributed Computing (JPDC), ACM Transactions on Internet Technology (TOIT), ACM Transactions on Web (TWEB). Dr. Liu’s current research is primarily sponsored by NSF, IBM, and Intel.

 

luiKisung Lee is a Ph.D student in the School of Computer Science at Georgia Tech. He received his BS and MS degree in computer science from KAIST in 2005 and 2007 respectively. He had worked for ETRI as a researcher from 2007 to 2010. He is conducting research in distributed and parallel processing of big data in the Cloud, mobile computing and social network analysis.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

University of Chicago Researchers Generate First Computational Model of Entire SARS-CoV-2 Virus

January 15, 2021

Over the course of the last year, many detailed computational models of SARS-CoV-2 have been produced with the help of supercomputers, but those models have largely focused on critical elements of the virus, such as its Read more…

By Oliver Peckham

Pat Gelsinger Returns to Intel as CEO

January 14, 2021

The Intel board of directors has appointed a new CEO. Intel alum Pat Gelsinger is leaving his post as CEO of VMware to rejoin the company that he parted ways with 11 years ago. Gelsinger will succeed Bob Swan, who will remain CEO until Feb. 15. Gelsinger previously spent 30 years... Read more…

By Tiffany Trader

Roar Supercomputer to Support Naval Aircraft Research

January 14, 2021

One might not think “aircraft” when picturing the U.S. Navy, but the military branch actually has thousands of aircraft currently in service – and now, supercomputing will help future naval aircraft operate faster, Read more…

By Staff report

DOE and NOAA Extend Computing Partnership, Plan for New Supercomputer

January 14, 2021

The National Climate-Computing Research Center (NCRC), hosted by Oak Ridge National Laboratory (ORNL), has been supporting the climate research of the National Oceanic and Atmospheric Administration (NOAA) for the last 1 Read more…

By Oliver Peckham

Using Micro-Combs, Researchers Demonstrate World’s Fastest Optical Neuromorphic Processor for AI

January 13, 2021

Neuromorphic computing, which uses chips that mimic the behavior of the human brain using virtual “neurons,” is growing in popularity thanks to high-profile efforts from Intel and others. Now, a team of researchers l Read more…

By Oliver Peckham

AWS Solution Channel

Now Available – Amazon EC2 C6gn Instances with 100 Gbps Networking

Amazon EC2 C6gn instances powered by AWS Graviton2 processors are now available!

Compared to C6g instances, this new instance type provides 4x higher network bandwidth, 4x higher packet processing performance, and 2x higher EBS bandwidth. Read more…

Intel® HPC + AI Pavilion

Intel Keynote Address

Intel is the foundation of HPC – from the workstation to the cloud to the backbone of the Top500. At SC20, Intel’s Trish Damkroger, VP and GM of high performance computing, addresses the audience to show how Intel and its partners are building the future of HPC today, through hardware and software technologies that accelerate the broad deployment of advanced HPC systems. Read more…

Honing In on AI, US Launches National Artificial Intelligence Initiative Office

January 13, 2021

To drive American leadership in the field of AI into the future, the National Artificial Intelligence Initiative Office has been launched by the White House Office of Science and Technology Policy (OSTP). The new agen Read more…

By Todd R. Weiss

Pat Gelsinger Returns to Intel as CEO

January 14, 2021

The Intel board of directors has appointed a new CEO. Intel alum Pat Gelsinger is leaving his post as CEO of VMware to rejoin the company that he parted ways with 11 years ago. Gelsinger will succeed Bob Swan, who will remain CEO until Feb. 15. Gelsinger previously spent 30 years... Read more…

By Tiffany Trader

Julia Update: Adoption Keeps Climbing; Is It a Python Challenger?

January 13, 2021

The rapid adoption of Julia, the open source, high level programing language with roots at MIT, shows no sign of slowing according to data from Julialang.org. I Read more…

By John Russell

Intel ‘Ice Lake’ Server Chips in Production, Set for Volume Ramp This Quarter

January 12, 2021

Intel Corp. used this week’s virtual CES 2021 event to reassert its dominance of the datacenter with the formal roll out of its next-generation server chip, the 10nm Xeon Scalable processor that targets AI and HPC workloads. The third-generation “Ice Lake” family... Read more…

By George Leopold

Researchers Say It Won’t Be Possible to Control Superintelligent AI

January 11, 2021

Worries about out-of-control AI aren’t new. Many prominent figures have suggested caution when unleashing AI. One quote that keeps cropping up is (roughly) th Read more…

By John Russell

AMD Files Patent on New GPU Chiplet Approach

January 5, 2021

Advanced Micro Devices is accelerating the GPU chiplet race with the release of a U.S. patent application for a device that incorporates high-bandwidth intercon Read more…

By George Leopold

Programming the Soon-to-Be World’s Fastest Supercomputer, Frontier

January 5, 2021

What’s it like designing an app for the world’s fastest supercomputer, set to come online in the United States in 2021? The University of Delaware’s Sunita Chandrasekaran is leading an elite international team in just that task. Chandrasekaran, assistant professor of computer and information sciences, recently was named... Read more…

By Tracey Bryant

Intel Touts Optane Performance, Teases Next-gen “Crow Pass”

January 5, 2021

Competition to leverage new memory and storage hardware with new or improved software to create better storage/memory schemes has steadily gathered steam during Read more…

By John Russell

Farewell 2020: Bleak, Yes. But a Lot of Good Happened Too

December 30, 2020

Here on the cusp of the new year, the catchphrase ‘2020 hindsight’ has a distinctly different feel. Good riddance, yes. But also proof of science’s power Read more…

By John Russell

Esperanto Unveils ML Chip with Nearly 1,100 RISC-V Cores

December 8, 2020

At the RISC-V Summit today, Art Swift, CEO of Esperanto Technologies, announced a new, RISC-V based chip aimed at machine learning and containing nearly 1,100 low-power cores based on the open-source RISC-V architecture. Esperanto Technologies, headquartered in... Read more…

By Oliver Peckham

Azure Scaled to Record 86,400 Cores for Molecular Dynamics

November 20, 2020

A new record for HPC scaling on the public cloud has been achieved on Microsoft Azure. Led by Dr. Jer-Ming Chia, the cloud provider partnered with the Beckman I Read more…

By Oliver Peckham

NICS Unleashes ‘Kraken’ Supercomputer

April 4, 2008

A Cray XT4 supercomputer, dubbed Kraken, is scheduled to come online in mid-summer at the National Institute for Computational Sciences (NICS). The soon-to-be petascale system, and the resulting NICS organization, are the result of an NSF Track II award of $65 million to the University of Tennessee and its partners to provide next-generation supercomputing for the nation's science community. Read more…

Is the Nvidia A100 GPU Performance Worth a Hardware Upgrade?

October 16, 2020

Over the last decade, accelerators have seen an increasing rate of adoption in high-performance computing (HPC) platforms, and in the June 2020 Top500 list, eig Read more…

By Hartwig Anzt, Ahmad Abdelfattah and Jack Dongarra

Aurora’s Troubles Move Frontier into Pole Exascale Position

October 1, 2020

Intel’s 7nm node delay has raised questions about the status of the Aurora supercomputer that was scheduled to be stood up at Argonne National Laboratory next year. Aurora was in the running to be the United States’ first exascale supercomputer although it was on a contemporaneous timeline with... Read more…

By Tiffany Trader

Google Hires Longtime Intel Exec Bill Magro to Lead HPC Strategy

September 18, 2020

In a sign of the times, another prominent HPCer has made a move to a hyperscaler. Longtime Intel executive Bill Magro joined Google as chief technologist for hi Read more…

By Tiffany Trader

10nm, 7nm, 5nm…. Should the Chip Nanometer Metric Be Replaced?

June 1, 2020

The biggest cool factor in server chips is the nanometer. AMD beating Intel to a CPU built on a 7nm process node* – with 5nm and 3nm on the way – has been i Read more…

By Doug Black

Julia Update: Adoption Keeps Climbing; Is It a Python Challenger?

January 13, 2021

The rapid adoption of Julia, the open source, high level programing language with roots at MIT, shows no sign of slowing according to data from Julialang.org. I Read more…

By John Russell

Leading Solution Providers

Contributors

Programming the Soon-to-Be World’s Fastest Supercomputer, Frontier

January 5, 2021

What’s it like designing an app for the world’s fastest supercomputer, set to come online in the United States in 2021? The University of Delaware’s Sunita Chandrasekaran is leading an elite international team in just that task. Chandrasekaran, assistant professor of computer and information sciences, recently was named... Read more…

By Tracey Bryant

Top500: Fugaku Keeps Crown, Nvidia’s Selene Climbs to #5

November 16, 2020

With the publication of the 56th Top500 list today from SC20's virtual proceedings, Japan's Fugaku supercomputer – now fully deployed – notches another win, Read more…

By Tiffany Trader

European Commission Declares €8 Billion Investment in Supercomputing

September 18, 2020

Just under two years ago, the European Commission formalized the EuroHPC Joint Undertaking (JU): a concerted HPC effort (comprising 32 participating states at c Read more…

By Oliver Peckham

Texas A&M Announces Flagship ‘Grace’ Supercomputer

November 9, 2020

Texas A&M University has announced its next flagship system: Grace. The new supercomputer, named for legendary programming pioneer Grace Hopper, is replacing the Ada system (itself named for mathematician Ada Lovelace) as the primary workhorse for Texas A&M’s High Performance Research Computing (HPRC). Read more…

By Oliver Peckham

At Oak Ridge, ‘End of Life’ Sometimes Isn’t

October 31, 2020

Sometimes, the old dog actually does go live on a farm. HPC systems are often cursed with short lifespans, as they are continually supplanted by the latest and Read more…

By Oliver Peckham

Nvidia and EuroHPC Team for Four Supercomputers, Including Massive ‘Leonardo’ System

October 15, 2020

The EuroHPC Joint Undertaking (JU) serves as Europe’s concerted supercomputing play, currently comprising 32 member states and billions of euros in funding. I Read more…

By Oliver Peckham

Gordon Bell Special Prize Goes to Massive SARS-CoV-2 Simulations

November 19, 2020

2020 has proven a harrowing year – but it has produced remarkable heroes. To that end, this year, the Association for Computing Machinery (ACM) introduced the Read more…

By Oliver Peckham

Nvidia-Arm Deal a Boon for RISC-V?

October 26, 2020

The $40 billion blockbuster acquisition deal that will bring chipmaker Arm into the Nvidia corporate family could provide a boost for the competing RISC-V architecture. As regulators in the U.S., China and the European Union begin scrutinizing the impact of the blockbuster deal on semiconductor industry competition and innovation, the deal has at the very least... Read more…

By George Leopold

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This