Recently, the historic match between Google's AlphaGo and Lee Se-dol put Deep Learning under the spotlight. Caffe, a Deep Learning framework, is today one of the fastest implementations of Convolutional Neural Networks (CNNs). Inspur has released a high-performance MPI cluster version of Caffe and open-sourced the code, giving Deep Learning users a more convenient and efficient platform for their applications.
The original Caffe framework, which runs on a single computing node with one GPU, was developed at UC Berkeley for CNN training. In a typical CNN workflow, a specific data set is fed into the network for layer-by-layer training, through which the machine acquires the desired capability. However, the training data are generally massive, and training can take dozens of days to complete. Caffe significantly alleviates this problem, but as training models grow more complex and training sample sizes increase, a single node with one GPU may no longer meet user needs.
Inspur’s cluster version of the Caffe framework is designed to satisfy this urgent demand of Deep Learning users. It adopts MPI, a mature technology in high-performance computing, to implement optimized data parallelism on top of Berkeley’s Caffe architecture, on which its computing framework is based. Alongside these new capabilities, the cluster version retains all the features of the original Caffe architecture: the pure C++/CUDA implementation, support for command-line, Python, and MATLAB interfaces, and various programming methods. As a result, the cluster version of the Caffe framework is user-friendly, fast, modular, and open, and gives users an optimal application experience.
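The data-parallel scheme described above, in which each node trains on its own slice of the data and gradients are combined through an MPI collective, can be illustrated with a minimal sketch. This is not Inspur's actual code: the MPI_Allreduce step is simulated in-process rather than across nodes, the toy model is a single scalar weight, and all function names here are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared loss over one worker's data shard."""
    return sum(2.0 * (w - x) for x in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an averaging MPI_Allreduce: every worker ends up
    with the mean of all workers' local gradients."""
    return sum(values) / len(values)

def parallel_sgd_step(w, shards, lr=0.1):
    # Each node computes a gradient on its own shard (in parallel in
    # the real cluster; sequentially in this simulation).
    grads = [local_gradient(w, s) for s in shards]
    # One collective communication combines the gradients ...
    g = allreduce_mean(grads)
    # ... so every node applies the identical update and stays in sync.
    return w - lr * g

# Usage: 4 simulated workers, each holding its own shard of the data.
shards = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0], [2.0, 3.0]]
w = 0.0
for _ in range(50):
    w = parallel_sgd_step(w, shards)
# w converges toward the mean of the full data set (2.0)
```

The key property the sketch shows is that the update each worker applies depends only on the averaged gradient, so the cluster behaves like a single machine training on the combined data pool.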
The solution utilizes a CPU+GPU heterogeneous architecture, with network communication over InfiniBand. The software fully adopts MPI, a high-level parallel programming model, to scale from a single node to a cluster. Currently, Inspur uses the GoogLeNet network structure to benchmark the parallel version of Caffe, and the cluster delivers performance 12.5 times that of a single node. The use of cuDNN contributes a further performance improvement of approximately 20%.
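Taken together, the two figures quoted above imply an overall speedup that can be checked with a line of arithmetic, assuming the cuDNN gain composes multiplicatively with the cluster speedup (the article does not state how the figures combine):

```python
cluster_speedup = 12.5  # parallel Caffe vs. a single node (from the text)
cudnn_gain = 1.20       # ~20% further improvement from cuDNN (from the text)

# Assumed multiplicative composition of the two gains.
combined = cluster_speedup * cudnn_gain
print(combined)  # 15.0
```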
As an HPC system vendor with heterogeneous computing expertise spanning GPU, MIC, and FPGA, Inspur provides Deep Learning solutions that are used by many internet companies, including Tencent, Baidu, Alibaba, Qihoo, iFLYTEK, and JD.