This month, Cray will begin delivery of a new big data analytics cluster built on its entry-level CS300 system, which has been optimized to run Intel’s Hadoop distribution. Cray says the new system will provide customers with a “turnkey” Hadoop cluster that can tackle big data problems that would be difficult to solve using commodity hardware.
The open-source Hadoop framework is bringing powerful analytic capabilities to the masses by eliminating some of the technical skills that have traditionally been required to configure and run parallel HPC workloads. By running Hadoop on clusters of cheap commodity hardware, relatively non-technical users can get big data analysis capabilities that they previously only dreamed of.
As Hadoop goes mainstream, companies like Intel and Cray are figuring out how to make a buck by improving on the stack. Intel’s contribution has been the Intel Distribution of Hadoop, which partner Cray says carries advantages over other Hadoop distros in the areas of security, real-time handling of data, and storage performance.
Putting the Intel Distribution of Hadoop on the CS300 gives customers the capability to take Hadoop applications to enterprise levels, Cray says. “More and more organizations are expanding their usage of Hadoop software beyond just basic storage and reporting,” Bill Blake, senior vice president and CTO of Cray, states in a press release. “But while they’re developing increasingly complex algorithms and becoming more dependent on getting value out of Hadoop systems, they are also pushing the limits of their architectures.”
Blake says the turnkey Hadoop solution that Cray is now selling will do best in high-value Hadoop environments. “Organizations can now focus on scaling their use of platform-independent Hadoop software, while gaining the benefits of important underlying architectural advantages from Cray and Intel,” he says.
The Cray CS300 supercomputers are available in air-cooled and liquid-cooled architectures based on industry-standard Xeon and Xeon Phi processors from Intel. The systems feature a complete HPC software stack, including a Linux OS, schedulers, libraries, and Cray’s own Advanced Cluster Engine (ACE) management suite, that is compatible with most compilers. The entire solution, including the Intel Distribution of Hadoop, is integrated, optimized, validated, and supported by Cray.
Steve Conway, IDC research vice president for HPC, says the convergence of data-intensive HPC and high-end commercial analytics is forming a new big data market that the research firm calls High Performance Data Analysis. “Pairing the Cray CS300 systems with Intel’s Hadoop Distribution creates a solution with the potential to tackle Big Data problems that would frustrate most clusters,” he says in a Cray press announcement.