HPCwire

French Atomic Energy Authority Releases NFS Server with HPC Features


BRUYÈRES-LE-CHÂTEL, France -- NFS is a well-known protocol that has been used to export remote file systems since the late 1980s. Over more than 20 years of widespread use, the protocol has evolved considerably, from the very basic NFSv2 in 1989 to the new NFSv4.1, whose specification was published by the IETF in early 2010. This latest version of the protocol includes the pNFS feature, which makes it possible to separate the metadata path from the data paths, as modern parallel file systems do, optimizing both data and metadata management by removing classical bottlenecks.

The Military Applications Department of the French Atomic Energy Authority (CEA/DAM) has been developing a generic NFS server that runs in User Space, named “NFS-Ganesha”, since 2005. It has been in production at the TERA compute center since January 2006 and has been available as open source software on SourceForge since July 2008. This NFS server has evolved considerably during the past five years, gaining many new features (mostly related to NFSv4 and NFSv4.1/pNFS) through successful collaboration with industrial partners.

More information about installing and configuring NFS-Ganesha can be found at http://nfs-ganesha.sourceforge.net.

You can download NFS-Ganesha at http://sourceforge.net/projects/nfs-ganesha/files/nfsganesha/.

A generic massively multithreaded server in User Space

NFS-Ganesha differs in many design respects from the classic NFS servers provided with various Linux distributions. First of all, it runs in User Space. This presents several advantages:

  • Being located in User Space enables it to use huge portions of memory as cache with specific management policies. This provides NFS-Ganesha with advanced metadata caching capabilities.
  • Many ancillary services reside in User Space: authentication (Kerberos) and user name directories (LDAP, NIS). Because these services and NFS-Ganesha run in the same place, communication between them is simple and requires no additional daemon or resource. Daemons such as rpc.svcgssd, which manages security tokens for nfsd inside the kernel, and rpc.idmapd, which performs name resolution for the kernel, are not required.
  • Modern parallel file systems used in HPC (LUSTRE, PanFS, GPFS, CEPH) have userspace libraries that can be used to access storage directly or independently of POSIX namespace, off-loading the kernel. NFS-Ganesha can be linked with these libraries to enhance its efficiency.
  • It is easier to implement fail-over for services in User Space. If the daemon crashes or enters a degraded mode, restarting it resolves the problem.
  • User Space is safer: a bug in a User Space program affects only that program, whereas a bug in the kernel may bring down the whole machine.

NFS-Ganesha is a massively multithreaded program. At startup, it spawns a large number of workers (threads dedicated to processing NFS requests) and dispatchers (threads that receive requests and assign each one to a worker by pushing it onto that worker's queue). This fits well with today's architectures, in which machines tend to have more and more processors and cores (machines with 16 or 32 cores, at 8 cores per socket, are relatively common).

Memory management is then a critical consideration, because a shared resource allocator could end up serializing the threads. NFS-Ganesha embeds its own memory manager, based on the BuddyBlock algorithm used in the kernel. Each thread allocates a large chunk of memory and divides it into various combinations of smaller blocks whose sizes are powers of 2. When released, the blocks are pushed back into the buddy block pools and reassembled into larger blocks. This is done by setting pointers without moving data, which makes it very efficient and fast.

The internal design itself is thread-safe and tries to avoid bottlenecks: normally, a lock should not be requested by more than two or three different threads. There is no “big lock” that would end up serializing the threads. Currently, NFS-Ganesha runs in production with hundreds of workers and thousands of dispatchers at CEA's TERA compute facility.
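
The dispatcher and worker model described above can be sketched in a few lines. This is a minimal Python illustration, not NFS-Ganesha's actual code (which is not shown in this article): the names `worker_loop`, `run_server` and `NUM_WORKERS` are hypothetical, and round-robin assignment is an assumption made for the example. Each worker owns a private queue, and a single dispatcher loop pushes every incoming request onto one of those queues.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical value; the article cites far larger counts in production


def worker_loop(requests, results):
    """Process requests from this worker's own queue until a None sentinel arrives."""
    while True:
        request = requests.get()
        if request is None:
            break
        # Stand-in for real NFS request processing.
        results.append((threading.current_thread().name, request))


def run_server(incoming):
    """Dispatch each incoming request to one worker's private queue (round-robin)."""
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    # One result list per worker, so workers never contend on a shared structure.
    results = [[] for _ in range(NUM_WORKERS)]
    workers = [
        threading.Thread(target=worker_loop, args=(q, r), name=f"worker-{i}")
        for i, (q, r) in enumerate(zip(queues, results))
    ]
    for w in workers:
        w.start()
    # Dispatcher role: receive a request and push it onto the chosen worker's queue.
    for n, request in enumerate(incoming):
        queues[n % NUM_WORKERS].put(request)
    # Tell every worker to stop once its queue has drained.
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
    return results
```

Because each queue has exactly one consumer, a queue's lock is only ever contended by the dispatcher and one worker, which is in the spirit of the "no big lock" design goal stated above.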

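The buddy block scheme mentioned above can also be illustrated with a toy allocator. The sketch below is a simplified Python model under stated assumptions, not NFS-Ganesha's memory manager: the class name `BuddyArena`, the 16-byte minimum block and the use of integer offsets in place of pointers are all hypothetical. It shows the two key operations: splitting a large block into power-of-2 halves on allocation, and merging a freed block with its "buddy" using bookkeeping alone.

```python
class BuddyArena:
    """Toy buddy allocator over one arena; integer offsets stand in for pointers."""

    def __init__(self, total_size, min_block=16):
        assert total_size & (total_size - 1) == 0, "arena size must be a power of 2"
        self.total_size = total_size
        self.min_block = min_block
        self.free_blocks = {total_size: {0}}  # block size -> set of free offsets
        self.allocated = {}  # offset -> block size

    def alloc(self, size):
        # Round the request up to the next power of 2 (at least min_block).
        block = self.min_block
        while block < size:
            block *= 2
        # Find the smallest free block that is large enough.
        candidate = block
        while candidate <= self.total_size and not self.free_blocks.get(candidate):
            candidate *= 2
        if candidate > self.total_size:
            raise MemoryError("arena exhausted")
        offset = self.free_blocks[candidate].pop()
        # Split repeatedly, returning the upper half (the buddy) to the pool.
        while candidate > block:
            candidate //= 2
            self.free_blocks.setdefault(candidate, set()).add(offset + candidate)
        self.allocated[offset] = block
        return offset

    def free(self, offset):
        block = self.allocated.pop(offset)
        # Merge with the buddy while it is free; only bookkeeping changes, no data moves.
        while block < self.total_size:
            buddy = offset ^ block
            if buddy not in self.free_blocks.get(block, set()):
                break
            self.free_blocks[block].remove(buddy)
            offset = min(offset, buddy)
            block *= 2
        self.free_blocks.setdefault(block, set()).add(offset)
```

Giving each thread its own arena, as the article describes, means these structures need no locking at all, since no other thread ever touches them.
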
A third point is the architecture of NFS-Ganesha itself. The daemon has been designed as a strongly layered product. From the highest layers (the transport layers, including IPv6 support, and the protocol layers, which manage NFSv2/NFSv3/NFSv4/NFSv4.1 and 9P2000.L) down to the cache and backend layers, each module is defined as a standalone library with a well-known structure and API. The lowest layer is called the FSAL (File System Abstraction Layer) and acts as a generic interface to the file system used as a backend. Currently, several “FSALs” are supported:

  • FSAL_VFS: uses the open_by_handle_at/name_to_handle_at syscalls to export any filesystem managed by the kernel's VFS.
  • FSAL_XFS: exports an XFS file system (using libhandle from the xfsprogs package).
  • FSAL_LUSTRE: exports a LUSTREv2 file system (using the Lustre API library).
  • FSAL_GPFS: exports a GPFS file system.
  • FSAL_ZFS: exports the content of a ZFS tank. This module is based on a user space library (provided with NFS-Ganesha) that fully implements ZFS functionality with no kernel dependency.
  • FSAL_PROXY: operates as an NFSv4 client to a remote NFSv4 server. This module turns NFS-Ganesha into an NFSv4 proxy.
  • FSAL_HPSS: exports the HPSS namespace. HPSS (High Performance Storage System) is an HSM sold by IBM Government System and widely used in the HPC community.
  • FSAL_FUSELIKE: provides the hooks required to plug any “fuse-ready” product into NFS-Ganesha. Products that use fuse often reside entirely in User Space, so this FSAL is a natural way to build an “all in User Space” solution.

More FSALs are planned for future releases:

  • FSAL_EXOFS: exports an EXOFS file system. EXOFS uses Object Storage Devices (OSD) as storage. NFS-Ganesha will implement a pNFS/OSD2 layout with this FSAL.
  • FSAL_CEPH: support for the CEPH parallel file system, with pNFS/Files layout.
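
The idea behind the FSAL, a single generic interface with interchangeable backends, can be sketched as follows. This is a minimal Python illustration only: the real FSAL API is not described in this article and is richer than this, and the names `Fsal`, `InMemoryFsal`, `lookup`, `read` and `serve_read` are all hypothetical.

```python
from abc import ABC, abstractmethod


class Fsal(ABC):
    """Hypothetical backend interface that every file system module implements."""

    @abstractmethod
    def lookup(self, path):
        """Return an opaque handle for the object at path."""

    @abstractmethod
    def read(self, handle, offset, length):
        """Return up to length bytes starting at offset."""


class InMemoryFsal(Fsal):
    """Trivial backend that serves file contents from a dictionary."""

    def __init__(self, files):
        self._files = dict(files)

    def lookup(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return path  # in this toy backend the handle is simply the path

    def read(self, handle, offset, length):
        return self._files[handle][offset:offset + length]


def serve_read(fsal, path, offset, length):
    """Upper-layer code depends only on the Fsal interface, not on the backend."""
    handle = fsal.lookup(path)
    return fsal.read(handle, offset, length)
```

For example, `serve_read(InMemoryFsal({"/a": b"hello world"}), "/a", 6, 5)` reads five bytes through the interface. Supporting a new file system then means writing another `Fsal` subclass, while the protocol and cache layers above it stay unchanged.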

An NFS server with interesting features for HPC

NFS-Ganesha was born out of the needs of HPC computing centers. From the very beginning, it has been designed to fit the specific workload produced by supercomputers, making it possible to deal with issues specific to HPC. Multiple corporations with a wealth of HPC experience (such as IBM, Panasas and LinuxBox) have joined the NFS-Ganesha project and contributed to its improvement. While most of these corporations' contributions are aimed at providing an interface to their own products, their participation has greatly enhanced the stability, functionality and, most importantly, the reach of NFS-Ganesha. This makes NFS-Ganesha a server well suited to providing NFS to many clients that are themselves part of a large compute cluster.

Another HPC-oriented feature of NFS-Ganesha is pNFS. This feature is defined by the NFSv4.1 RFC (RFC 5661). With pNFS, a client can use different servers, some dedicated to metadata and others to data. This is very similar to what modern parallel file systems (LUSTRE, Panasas, CEPH, ...) do by separating Metadata Servers from Data Servers, which makes pNFS a good fit for the classical model used in HPC.

The specifications for pNFS are very open, but three layouts (the structures and mechanisms used by the client to access the data on the Data Servers) are well defined in the RFCs. NFS-Ganesha currently implements the first of them, the “files layout”, and is collaborating with industry to enhance and debug it. The other layouts (one based on the “block device” mode and one based on the OSD2 protocol) are part of the NFS-Ganesha roadmap, with partnerships already established.

CEA's position is simple: soon (this is already the case in Fedora 15) every Linux-based machine will embed the NFSv4.1 features, including pNFS. If the NFS server is pNFS-ready and the exported namespace itself has parallel capabilities, then pNFS will be a natural and portable way to access data in parallel, strongly enhancing the I/O rate via NFS. NFS-Ganesha has supported pNFS since version 1.1.0. Many improvements are planned in this area for future releases.
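
To illustrate why spreading data across several Data Servers raises the I/O rate, the sketch below splits one byte range into per-server pieces. It is a deliberately simplified Python model, assuming plain round-robin striping with a fixed stripe unit; the actual files layout in RFC 5661 has more parameters, and the function name `split_read` and its arguments are hypothetical.

```python
def split_read(offset, length, stripe_unit, num_data_servers):
    """Split a byte range into (server_index, file_offset, length) pieces.

    Assumes stripe number s is stored on data server s % num_data_servers.
    """
    pieces = []
    end = offset + length
    while offset < end:
        stripe = offset // stripe_unit
        server = stripe % num_data_servers
        # A piece never crosses a stripe boundary.
        chunk_end = min(end, (stripe + 1) * stripe_unit)
        pieces.append((server, offset, chunk_end - offset))
        offset = chunk_end
    return pieces
```

With a 128-byte stripe unit and three data servers, `split_read(100, 200, 128, 3)` yields three pieces on three different servers, so a client could issue them concurrently instead of sending the whole read through a single server.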

About the CEA

The French Alternative Energies and Atomic Energy Commission (CEA) leads research, development and innovation in four main areas: low-carbon energy sources, global defense and security, information technologies and healthcare technologies. The CEA’s leadership position in the world of research is built on a cross-disciplinary culture of engineers and researchers, ideal for creating synergy between fundamental research and technology innovation. With its 15,600 researchers and collaborators, it has internationally recognized expertise in its areas of excellence and has developed many collaborations with national and international, academic and industrial partners.

Information about HPC at CEA can be found at http://www-hpc.cea.fr/index-en.htm

-----

Source: French Alternative Energies and Atomic Energy Commission (CEA)
