Parallel File System OrangeFS Starts to Build a Following

By Nicole Hemsoth

November 18, 2011

If you thought Lustre and GPFS were your only two choices for a high performance, scalable parallel file system, then you’ve probably never heard of OrangeFS. Described as a branch of the open source Parallel Virtual File System (PVFS), OrangeFS has been taken under the wing of Omnibond LLC, which is now providing commercial support for the software.

At SC11, there was a BoF session that discussed recent developments in OrangeFS and its future direction. We caught up with two of the session leaders, Walt Ligon, founding PVFS/OrangeFS Architect and Associate Professor of Electrical and Computer Engineering, and Boyd Wilson, executive director at Omnibond, as well as Jim Bottum, CIO and vice provost for Computing & Information Technology at Clemson University, to talk about the file system’s unique attributes and some of its real-world use cases.

HPCwire: What is OrangeFS and what problem is it trying to solve that is not being addressed by other parallel file systems like Lustre and GPFS?

Walt Ligon: OrangeFS is a next-generation parallel file system based on PVFS for compute and storage clusters of the future. Its original charter — to complement high-performance computing for cutting-edge research in academic and government initiatives — is fast expanding into a versatile array of real-world applications.

The big benefit of OrangeFS over many similar parallel file systems comes down to two issues. First, it is one of the best performing parallel file systems available. It is based on the PVFS architecture, which is powerful and modular. This has allowed the design to evolve to incorporate distributed directories, optimized requests, a wide variety of interfaces and features. It is well designed.

Second, it is an extremely easy file system to build, install, get and keep running. This is hard to quantify, so we encourage anyone to download the tarball and try it. As another point of reference, PVFS has been used in dozens of educational, experimental, and research settings and formed the basis of many graduate theses. It is a very usable file system.

PVFS went through two generations as an experimental-turned-production file system. OrangeFS has been hardened through several years of development, testing, and support by a professional development team. Now it is being deployed for a range of applications with commercial support, though it is still open source.

The features unique to OrangeFS include:

  • Unique object-based file data transfer, allowing clients to work on objects without the need to handle underlying storage details, such as data blocks
  • Unified data/metadata servers
  • Distribution of metadata across storage servers
  • Distribution of directory entries
  • Diverse Client Access Methods: Posix, MPI, Linux VFS, FUSE, Windows, WebDAV, S3, and REST interfaces
  • Ability to configure storage parameters by directory or file, including stripe size, number of servers, replication, security
  • Virtualized storage over any Linux file system as underlying local storage on each connected server
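Per-file striping parameters, such as stripe size and number of servers, decide where each byte of a file lives. The sketch below assumes a simple round-robin layout in which strip i goes to server i mod N. The class and method names are hypothetical, the strip size used in the example is an illustrative value rather than a documented OrangeFS default, and the code is not taken from OrangeFS itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical round-robin layout: a file is cut into fixed-size strips
    and strip i is stored on server i % num_servers."""
    strip_size: int   # bytes per strip (illustrative, not an OrangeFS default)
    num_servers: int  # number of data servers the file is striped over

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a logical file offset to (server index, offset in that server's local object)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        strip_index, within = divmod(offset, self.strip_size)
        server = strip_index % self.num_servers
        # Each server stores every num_servers-th strip back to back.
        local_strip = strip_index // self.num_servers
        return server, local_strip * self.strip_size + within

    def split(self, offset: int, length: int) -> list[tuple[int, int, int]]:
        """Break one contiguous request into per-server pieces (server, local offset, length)."""
        pieces = []
        end = offset + length
        while offset < end:
            server, local = self.locate(offset)
            # Bytes left in the current strip, capped by the end of the request.
            chunk = min(self.strip_size - offset % self.strip_size, end - offset)
            pieces.append((server, local, chunk))
            offset += chunk
        return pieces
```

With a 64 KiB strip over four servers, for example, a 10,000-byte request starting at offset 60,000 splits into 5,536 bytes on server 0 and 4,464 bytes on server 1, which a client could fetch from both servers at once.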

HPCwire: What is the precise relationship between PVFS and OrangeFS?

Ligon: OrangeFS is the next evolution of PVFS, adding commercial-grade services as well as new features and future development. For many years PVFS development focused primarily on a few large scientific workloads. At the same time, members of the community used PVFS as a research tool to experiment with different aspects of parallel file system design and implementation. OrangeFS is broadening that scope to include production-quality service for a wider range of data-intensive application areas. This has led us to re-evaluate a number of assumptions that were valid for PVFS but may or may not be appropriate for these other workloads. Thus a new generation of development is under way to address these new scenarios.

Boyd Wilson: PVFS was supported by a small group of exceptional developers who were closely associated with the scientific applications it was intended for. OrangeFS, in contrast, is looking to attract a wide range of users, so Omnibond has stepped up to provide commercial-grade support and development practices. Just the same, OrangeFS is still 100% open source; there are no commercial versions of the code, and we intend to support the PVFS community as we always have. We continue to support the PVFS mailing lists and interact with people using OrangeFS as a development platform. Customers who do pay for support benefit from priority access to a professional support staff with the experience and resources to support the software, as well as access to the developers to guide improvements and new features. Omnibond sees this latter point as a major opportunity to partner with its customers in developing vertical product lines using OrangeFS as a base. No other parallel file system is offered with this philosophy.

HPCwire: How does OrangeFS differ from other parallel file systems? What do you see as its main advantages?

Wilson: OrangeFS was designed with a unified server that supports both distributed metadata and distributed file data. The PVFS architecture is modular, making it easy to develop and support new networks and new storage devices, to implement new requests that optimize specific operations, and to add new features. OrangeFS is 100 percent open source and was developed by a diverse community of government, academic, and industry contributors. There are no commercial or “pro” versions; every new development is returned to the community, and the community is still encouraged to participate in development.

OrangeFS offers configurable features at the system, directory, and file levels, including striping parameters, distribution methods, replication support, and security. The PVFS protocol provides a rich set of distributed operations and is easily extensible. OrangeFS provides diverse client access methods, including MPI-IO, Posix-like methods, Linux VFS, FUSE, and Windows support; WebDAV, S3, and REST interfaces are coming soon.

OrangeFS supports standard out-of-the-box Linux kernels. Server and client code are implemented at the user level. The Linux kernel module used for VFS support is very simple and does not require kernel patches. OrangeFS is very easy to build, install, and begin operating, and very easy to keep operating.

HPCwire: What types of users would be most interested in the technology? Are there use cases out there in the wild?

Jim Bottum: HPC users on parallel systems of all sizes can make use of OrangeFS. PVFS was initially rolled out to the very high-end computing community, whose workloads generally involve very large I/O. Clemson University, whose faculty and students initially developed PVFS, adopted it as the university was beginning to roll out HPC campus-wide in 2007.

As the Clemson staff tuned PVFS for its user community, both on campus and around the state, they tuned it to work equally well on smaller I/O workloads. Users with rendering and video server farms would be ideal, as would financial and other data analytics firms. We have been working with users in the oil and gas industry and across a broad range of science and engineering.

We have a large corporate client that uses OrangeFS extensively for data mining; it operates more than 700 distinct OrangeFS file systems. Here at Clemson we have a diverse range of users, including bioinformatics, digital production, astrophysics, several humanities areas, and cloud computing.

Other universities and research labs are migrating to OrangeFS from PVFS2 and have commented that it has solved several problems they had in the past. The PVFS2 users list and community has more than 260 members, and file system researchers around the globe use PVFS2, now OrangeFS, as their base.

HPCwire: What types of support and services does Omnibond offer for OrangeFS, and what’s the pricing model?

Wilson: Commercial-grade services provide what customers would expect from commercial software, but better. When you pay for a typical software license you get access to the software, and support may be extra. With OrangeFS and Omnibond you already have the software; when you buy a subscription you get access to support, and it also pays for future development. With your subscription you have a say in which features are worked on in the future.

For commercial customers, a 5-storage-server bundle starts at $25,000, and the price per storage server decreases as quantities increase. We also offer custom pricing for cloud customers who need more scalable options.

HPCwire: What does the roadmap for OrangeFS look like for the next couple of years?

Wilson: We have just released several new client interfaces and a new distributed directories implementation. Distributed directories allow the entries of a single directory to be spread across multiple servers so that multiple client processes can access a very large directory in parallel.
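One way to picture distributed directories is to hash each entry name to a server, so clients creating or looking up entries in the same huge directory land on different servers. This is a minimal sketch under that assumption; the hashing scheme and function names are hypothetical and do not describe the actual OrangeFS implementation.

```python
import hashlib
from collections import defaultdict


def dirent_server(name: str, num_servers: int) -> int:
    """Pick the server that holds one directory entry (hypothetical scheme).

    A stable hash (SHA-1, chosen only for illustration) gives every client the
    same answer, so any client can find an entry without a central lookup."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def partition_directory(names: list[str], num_servers: int) -> dict[int, list[str]]:
    """Group the entries of one large directory by the server that stores them."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for name in names:
        buckets[dirent_server(name, num_servers)].append(name)
    return dict(buckets)
```

Because the mapping depends only on the entry name, thousands of processes creating files in one directory spread their requests over all servers instead of queuing on a single one.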

For the last few years we have been developing a new access control implementation based on signed capabilities. This capability-based security will significantly improve the security of OrangeFS and will be the basis of future federated file system access. We hope to release it sometime in the coming months. We also have a new Posix-like user interface in the works, including a user-level configurable data cache, which should be released soon as well.
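A signed capability is a small token, issued by a trusted party, that names an object, the operations allowed on it, and an expiry time; a server can then check the token without consulting an access-control list. The sketch below assumes HMAC-SHA256 with a shared key as a simple stand-in for whatever signature scheme OrangeFS actually uses, which the interview does not specify. The field names and the key are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared key, for illustration only.
SERVER_KEY = b"example-key-not-for-production"


def _payload(cap: dict) -> bytes:
    # Canonical encoding of the fields covered by the signature.
    fields = {k: cap[k] for k in ("handle", "ops", "expires")}
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def issue_capability(handle: int, ops: list, ttl_s: int,
                     key: bytes = SERVER_KEY, now: Optional[float] = None) -> dict:
    """Grant the operations in `ops` on object `handle` until now + ttl_s."""
    now = time.time() if now is None else now
    cap = {"handle": handle, "ops": sorted(ops), "expires": now + ttl_s}
    cap["sig"] = hmac.new(key, _payload(cap), hashlib.sha256).hexdigest()
    return cap


def check_capability(cap: dict, handle: int, op: str,
                     key: bytes = SERVER_KEY, now: Optional[float] = None) -> bool:
    """Server-side check: valid signature, right object, operation allowed, not expired."""
    now = time.time() if now is None else now
    expected = hmac.new(key, _payload(cap), hashlib.sha256).hexdigest()
    # Constant-time comparison; any tampering with handle, ops, or expiry breaks the signature.
    if not hmac.compare_digest(expected, cap.get("sig", "")):
        return False
    return cap["handle"] == handle and op in cap["ops"] and now < cap["expires"]
```

A client that edits its own token to add an operation invalidates the signature, so the server rejects it without looking anything up.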

Much of our development right now is focused on redundancy, particularly redundant metadata. Today users rely on RAID systems at each server to manage disk failure. In future systems we plan to allow the servers to automatically replicate data and metadata across multiple servers. As part of this we are moving to a more flexible architecture for managing servers in a distributed environment, including not only replication but tiered migration and a much more dynamic object model.
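Automatic replication needs a rule for which servers hold the copies of each object. As a hedged illustration only, the sketch below assumes the primary server comes from a stable hash of the object ID and the remaining copies go to the next servers in ring order; the interview does not describe the planned OrangeFS placement policy, and every name here is hypothetical.

```python
import hashlib


def replica_servers(object_id: int, num_servers: int, replicas: int) -> list[int]:
    """Choose `replicas` distinct servers for an object (hypothetical placement rule).

    The primary comes from a stable hash of the object ID; further copies go to
    the following servers in ring order, so with replicas >= 2 a single failed
    server never takes out every copy."""
    if not 1 <= replicas <= num_servers:
        raise ValueError("need 1 <= replicas <= num_servers")
    digest = hashlib.sha1(str(object_id).encode("ascii")).digest()
    primary = int.from_bytes(digest[:8], "big") % num_servers
    return [(primary + i) % num_servers for i in range(replicas)]


def surviving_copies(placement: list[int], failed: set) -> list[int]:
    """Servers that still hold a copy after the servers in `failed` go down."""
    return [s for s in placement if s not in failed]
```

Compared with RAID inside each server, this kind of placement keeps data available even when a whole server, not just a disk, is lost.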

Similar efforts are under way within many research groups; OrangeFS will hopefully contribute to and benefit from these efforts. Finally, on the long-range radar, there is a project under way involving LSU, Indiana, and Clemson to develop a new object-oriented I/O model called PXFS, using OrangeFS as a platform and targeting exascale systems.

HPCwire: If someone wanted to give OrangeFS a try, how would they go about it?

Ligon: They can download a tarball from the website or download our latest changes from our CVS repository. Instructions on how to install the system are found in the documentation tab of our site.

OrangeFS builds using GNU autoconf, make, and gcc. Most of the code will build and run on any UNIX-based system, except the VFS module, which is Linux-specific. An experimental FUSE module is included. The main dependencies are BerkeleyDB and the proper kernel headers (if the VFS module is to be built). Some operating systems ship an old version of BerkeleyDB; in that case, install a newer BerkeleyDB, version 4.8.30 or higher, before building OrangeFS.

OrangeFS can be built for a regular user in virtually any location and tested on one or more machines. Access to the “root” account is required to install and start the VFS module. The file system can be operated without the VFS module, but most users will want to install it.
