The democratization of high performance computing (HPC) and the converged datacenter have recently been hot topics in the IT community. In a converged datacenter, HPC, high performance data analytics (big data/Hadoop workloads), and enterprise office applications all run on a common clustered compute architecture with a single file system and network. While the IT community is still talking about convergence, Banca d’Italia (Bank of Italy) has already moved in that direction with new compute and storage clusters based on Intel Xeon processors and Intel Enterprise Edition for Lustre software.
Bank of Italy is the national central bank, comparable to the U.S. Federal Reserve. Within the Bank, the Directorate General for Economics, Statistics and Research (DG-ESR), together with the Directorate General for Information Technology, had developed a 12-node cluster to run its scientific computing applications (SAS, Stata, Modeleasy, Matlab and others) for economic analysis and simulations. The results are made available to DG-ESR employees, who can then use them in reports, presentations, and recommendations to the Bank’s managers, the nation’s financial institutions, and other counterparties interested in the data.
Converging HPC and Enterprise Office
“We needed to replace our old computing system, used essentially for our scientific calculations, with a new one designed to support all the users’ needs,” said Giuseppe Bruno of the DG-ESR. “The number of users we had to support in my Directorate General grew from about 400 to 600 people. We needed more performance for our applications because of the growing data. But we also wanted to increase our file sharing capability for Directorate-wide collaboration, so we needed more real memory and storage capacity. Our SAN couldn’t be upgraded beyond 20 terabytes, and we were looking at requiring as much as 100 terabytes of capacity for all our users.”
“This is an example of a truly converged architecture, combining traditional enterprise HPC and extremely fast file sharing across the whole Directorate,” added Gabriele Paciucci, solutions architect in Intel’s HPC Platform Group. “The Lustre storage system accommodates the very large data sets sometimes needed for simulations, but it is mainly a repository for lots of small files, averaging about 4 KB each.”
A gateway to a Samba server links the Linux-based Lustre file system to the Windows network, so users can share files from their desktops. According to Mr. Bruno, about 20 million files are shared so far.
With millions of shared files and hundreds of users, the network and file system must perform fast enough to serve all the users from a single repository without hampering productivity.
“We confirmed our choice of Lustre for its performance,” commented Mr. Bruno. Bank of Italy uses Lustre version 2.5, which can serve tens of thousands of requests per second.
“Lustre originally wasn’t designed for a large number of small files. But the most recent editions are designed for enterprise data environments exactly like the Bank of Italy’s,” commented Mr. Paciucci.

“The system is currently performing quite well,” added Mr. Bruno. “I would say that our users are very satisfied with it.”
Staying Open for Business
Performance is not the only criterion for the DG-ESR. “In our central bank, data protection and reliability are paramount,” said Mr. Bruno. “We cannot risk losing data.”
“Bank of Italy is a unique early adopter of Lustre in circumstances where enterprise storage reliability and data safety are as important as performance,” stated Mr. Paciucci. “Historically, Lustre deployments were for scratch file systems in HPC environments. Persistence was not a concern.” Bank of Italy relies on a well-known and widely trusted reliability design pattern, duplicating its file system across two geographically separate sites. “Aside from scheduled maintenance, our data are available 24 hours a day, 7 days a week,” added Mr. Bruno. “We satisfy both performance and reliability requirements.”
All users within the DG-ESR can now easily access data for their computations; moreover, they can share their information and the documents they create with their Windows applications. The DG-ESR reduced the use of personal storage for sharing and distributing files among users, and made data more easily accessible and better protected.
But the journey is not over.
Converging Big Data
Bank of Italy extracts data from IBM DB2 and Oracle databases. Its analysts use R, Stata, and Matlab to do the extraction, which can take half an hour or more before processing can begin. According to Mr. Paciucci, Intel Enterprise Edition for Lustre software makes it easy for Bank of Italy to take on big data analytics with Hadoop on its existing system.
“What we are exploring is the possibility of porting the queries to Hadoop, copying the data to Lustre, and managing it there. That means users can run their analyses on the HPC cluster using Lustre’s HPC Adaptor for MapReduce and pull the data from Lustre using the Hadoop Adaptor for Lustre. Such a configuration eliminates Hadoop’s Shuffle phase, which saves time.”
“Hadoop might have some promise for us,” commented Mr. Bruno. “This is something we are exploring to take advantage of big data in a high-performance manner, using Lustre and HPC together.”
The feature image (Banca d’Italia) was used under the Creative Commons license (attribution: Dawid Skalec).