Recent tests at Clemson University achieved a 25 percent improvement in Apache Hadoop Terasort run times by replacing the Hadoop Distributed File System (HDFS) with an OrangeFS configuration using dedicated servers. Key components included an extension of the MapReduce “FileSystem” class and a Java Native Interface (JNI) shim to the OrangeFS client. Hadoop itself required no modification, and existing MapReduce jobs can run on OrangeFS unchanged.
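To illustrate why jobs can stay unchanged under this kind of design, here is a minimal, self-contained Java sketch of a pluggable file system selected by URI scheme. It is not the Clemson code and does not use Hadoop's real API: `SimpleFileSystem`, `HdfsLike`, `OrangeFsLike`, the `ofs` scheme, and the registry are all hypothetical stand-ins, and the JNI call is simulated in plain Java.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PluggableFsSketch {

    /** Hypothetical, heavily simplified stand-in for an abstract file system class. */
    abstract static class SimpleFileSystem {
        abstract String open(URI path);
    }

    /** Hypothetical default backend. */
    static class HdfsLike extends SimpleFileSystem {
        @Override
        String open(URI path) {
            return "hdfs:open(" + path.getPath() + ")";
        }
    }

    /** Hypothetical OrangeFS-style backend that delegates to a shim. */
    static class OrangeFsLike extends SimpleFileSystem {
        // Assumption: in a real shim this method would be declared `native`
        // and implemented in C against the file system's client library.
        // It is simulated here so the sketch runs without native code.
        private String shimOpen(String path) {
            return "jni-shim:open(" + path + ")";
        }

        @Override
        String open(URI path) {
            return shimOpen(path.getPath());
        }
    }

    /** Maps a URI scheme to the backend that handles it. */
    static final Map<String, Supplier<SimpleFileSystem>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("hdfs", HdfsLike::new);
        REGISTRY.put("ofs", OrangeFsLike::new); // "ofs" is an assumed scheme name
    }

    static SimpleFileSystem forUri(URI uri) {
        Supplier<SimpleFileSystem> factory = REGISTRY.get(uri.getScheme());
        if (factory == null) {
            throw new IllegalArgumentException("No file system for scheme: " + uri.getScheme());
        }
        return factory.get();
    }

    /** "Job" code: it only sees the abstract type, so it never changes. */
    public static String runJob(String input) {
        URI uri = URI.create(input);
        return forUri(uri).open(uri);
    }

    public static void main(String[] args) {
        System.out.println(runJob("hdfs://namenode/data/in"));
        System.out.println(runJob("ofs://server/data/in"));
    }
}
```

In this sketch, switching backends means changing only the input URI (or, by analogy, a configuration entry); `runJob` itself is untouched, which mirrors the claim that existing jobs need no modification.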
If you thought Lustre and GPFS were your only two choices for a high-performance, scalable parallel file system, then you’ve probably never heard of OrangeFS. We talked with three of the file system’s developers and backers about the unique attributes of OrangeFS and how it’s being used in the field.
Clemson University is finding ways to maximize its high performance computing resources.