HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

PSC's Sherlock to Solve Big Data Mysteries


PITTSBURGH, PA and PLEASANTON, CA, Nov. 7 — The Pittsburgh Supercomputing Center (PSC) and YarcData, a Cray company, today announced the deployment of “Sherlock,” a uRiKA graph-analytics appliance from YarcData for efficiently discovering unknown relationships or patterns “hidden” in extremely large and complex bodies of information. Funded through the Strategic Technologies for Cyberinfrastructure (STCI) program of the National Science Foundation, Sherlock features innovative hardware and software, as well as PSC-specific enhancements, designed to extend the range of applicability to scales not otherwise feasible.

Graph-analytics techniques have long been used by the government and are coming into wider commercial use. Sherlock will focus on extending their applicability to a wide range of scientific research projects.

“Sherlock,” says Nick Nystrom, PSC director of strategic applications, “provides a unique capability for discovering new patterns and relationships in data. It will help to discover how genes work, probe the dynamics of social networks, and detect the sources of breaches in Internet security.” Those diverse challenges, along with many others, he adds, have two important features in common: Their data are naturally expressed as interconnected webs of information called graphs, and data sizes for problems of real-world interest become extremely large.

“Until now, graph analytics has largely been impractical for big data,” says Nystrom. This is because, he explains, processing graph structures requires irregular and unpredictable access to data. On ordinary computers and clusters, nearly all the time is spent waiting for that data to move from memory to processors. Even more challenging, graphs of interest typically cannot be partitioned; their high connectivity prevents dividing them into subgraphs that can be mapped independently onto distributed-memory computers. These factors have precluded large-scale graph analytics, especially at the interactive response times that analysts need to explore data. “YarcData’s uRiKA,” says Nystrom, “overcomes that barrier through groundbreaking innovations in computer hardware and software.”
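
The partitioning problem Nystrom describes can be illustrated with a small sketch. It is not PSC or YarcData code; the graph sizes, the four-way split, and all function names are assumptions chosen for illustration. It slices vertices into contiguous blocks, as a distributed-memory cluster might, and measures the fraction of edges that cross a partition boundary. A simple ring splits cleanly, while a randomly connected graph of the same size does not under this naive scheme.

```python
import random

def block_partition(num_vertices, num_parts):
    """Assign vertex i to a contiguous block, like slicing an array across nodes."""
    size = -(-num_vertices // num_parts)  # ceiling division
    return [i // size for i in range(num_vertices)]

def cut_fraction(edges, assign):
    """Fraction of edges whose endpoints live in different partitions."""
    cut = sum(1 for u, v in edges if assign[u] != assign[v])
    return cut / len(edges)

def ring_graph(n):
    """Low-connectivity graph: each vertex links only to its successor."""
    return [(i, (i + 1) % n) for i in range(n)]

def random_graph(n, m, seed=0):
    """Hypothetical stand-in for a highly connected graph: m random distinct edges."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

N, PARTS = 1000, 4  # illustrative sizes only
assign = block_partition(N, PARTS)
ring_cut = cut_fraction(ring_graph(N), assign)
random_cut = cut_fraction(random_graph(N, 5000), assign)
print(f"ring: {ring_cut:.3f} of edges cross partitions")
print(f"random: {random_cut:.3f} of edges cross partitions")
```

Every crossing edge stands for a memory access that would have to travel between machines. The block split used here is deliberately naive, so the sketch shows the trend rather than a proof that no better partition exists.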

Sherlock enables large-scale, rapid graph analytics through massive multithreading, a shared address space, sophisticated memory optimizations, a productive user environment, and support for heterogeneous applications – all packaged as an enterprise-ready appliance. “Sherlock provides researchers with a uniquely powerful tool for doing complex analytics on big data, expanding the capability to address problems of societal importance,” says Nystrom.

“Many current approaches to big data have been about ‘search’ – the ability to efficiently find something that you know is there in your data,” said Arvind Parthasarathi, President of YarcData. “uRiKA was purposely built to solve the problem of ‘discovery’ in big data – to discover things, relationships or patterns that you don’t know exist. By giving organizations the ability to do much faster hypothesis validation at scale and in real time, we are enabling the solution of business problems that were previously difficult or impossible – whether it be discovering the ideal patient treatment, investigating fraud, detecting threats, finding new trading algorithms or identifying counter-party risk. Basically, we are systematizing serendipity.”

The project complements ongoing leadership in data-intensive computing at Carnegie Mellon University (CMU). Randal E. Bryant, Dean of the School of Computer Science at CMU, notes, “We’re very pleased that the PSC will have this new capability for analyzing large-scale, unstructured graphs. Such data structures pervade many of the big data applications being investigated by researchers in such diverse areas as biology (e.g., the connectivity between molecules in a protein), networks (e.g., the structure of the world-wide web), and artificial intelligence (e.g., the relationships between different concepts). The uRiKA system will enable scientists to deal with far more complex graphs than would otherwise be possible.”

YarcData’s uRiKA is a Big Data appliance for graph analytics that enables enterprises to discover unknown relationships in Big Data. uRiKA is a highly scalable, real-time platform that supports ad hoc queries, pattern-based searches, inferencing and deduction. uRiKA is a purpose-built appliance for graph analytics featuring graph-optimized hardware that provides up to 512 terabytes of global shared memory, massively multithreaded graph processors supporting 128 threads per processor, and an RDF/SPARQL database optimized for the underlying hardware, enabling applications to interact with the appliance using industry-standard interfaces. Singularly focused on graph analytics, uRiKA augments existing analytical environments by delivering new high-value discoveries and insights that drive competitive advantage.
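
To give a sense of what a pattern-based search over RDF-style data looks like, here is a minimal sketch in plain Python. It does not use uRiKA's interface or any real SPARQL engine. The triples, the predicate names, and the `match` and `query` helpers are all hypothetical. It stores facts as (subject, predicate, object) triples and finds every variable binding that satisfies a chain of patterns, roughly what a SPARQL basic graph pattern expresses.

```python
def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with one triple; terms starting with '?' are variables."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound to something else.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def query(triples, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions

# Hypothetical data, invented purely for illustration.
TRIPLES = [
    ("geneA", "regulates", "geneB"),
    ("geneB", "regulates", "geneC"),
    ("geneC", "associatedWith", "diseaseX"),
    ("geneD", "associatedWith", "diseaseX"),
]

# Roughly: SELECT ?g ?d WHERE { ?g regulates ?mid . ?mid associatedWith ?d }
results = query(
    TRIPLES,
    [("?g", "regulates", "?mid"), ("?mid", "associatedWith", "?d")],
)
print(results)
```

Each added pattern forces the search to follow another edge from wherever the previous one landed. This is the kind of irregular, data-dependent access that the shared-memory design described above is meant to serve. The nested scan over all triples is fine for a toy example but would not scale; a real engine would use indexes.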

PSC customized Sherlock with additional nodes having standard x86 processors to add valuable support for heterogeneous applications that use YarcData’s Threadstorm nodes as graph accelerators. This heterogeneous capability will enable an even broader class of applications, such as genomics, astrophysics, and structural analyses of complex networks. Sherlock runs an enhanced suite of familiar semantic web software for easy access to powerful analytic functionality, together with common programming languages. PSC’s Data Supercell provides complementary, high-performance access to large datasets for ongoing, collaborative analysis.

Prototype projects, led by researchers from across the country, will use Sherlock for research including understanding the natural language of the Web, learning about human social networks involving different types of online and telephone interactions, cluster finding in astrophysics, and genome sequence assembly. For example, Bin Zhang, of the Fox School of Business at Temple University, notes the potential for Sherlock to expand his research into clustering in social networks: “With the help of Sherlock, I can finally observe the true size of social groups in real-world networks of millions to even a billion people. Researchers believe that social group size is larger for online social networks than for traditional groups, but so far it has been impossible to extract groups from large networks and visualize their structures. Sherlock can finally enable us to observe the structure of large social groups and even the whole network.”

About PSC
The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon University and the University of Pittsburgh together with Westinghouse Electric Company. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a partner in the National Science Foundation XSEDE program.

About YarcData
YarcData, a Cray company, delivers business-focused real-time graph analytics for enterprises to gain business insight by discovering unknown relationships in Big Data. Early adopters include the Canadian government, Institute for Systems Biology, Mayo Clinic, Noblis, Sandia National Laboratories, and the United States government.

About Cray Inc.
As a global leader in supercomputing, Cray provides highly advanced supercomputers and world-class services and support to government, industry and academia. Cray technology is designed to enable scientists and engineers to achieve remarkable breakthroughs by accelerating performance, improving efficiency and extending the capabilities of their most demanding applications. Cray’s Adaptive Supercomputing vision is focused on delivering innovative next-generation products that integrate diverse processing technologies into a unified architecture, allowing customers to surpass today’s limitations and meeting the market’s continued demand for realized performance.

-----

Source: The Pittsburgh Supercomputing Center
