December 09, 2011
BRUYÈRES-LE-CHÂTEL, France -- NFS is a well-known protocol that has been used to export remote file systems since the late 1980s. Over more than 20 years of widespread use, the protocol has evolved considerably, from the very basic NFSv2 in 1989 to the new NFSv4.1, whose specification was published by the IETF in early 2010. This latest version of the protocol includes the pNFS feature, which makes it possible to separate the metadata path from the data paths, as modern parallel file systems do, optimizing both data and metadata management by removing classical bottlenecks.
The Military Applications Department of the French Atomic Energy Authority (CEA/DAM) has been developing a generic NFS server that runs in User Space, named “NFS-Ganesha”, since 2005. It has been in production at the TERA compute center since January 2006 and has been available as open source software on SourceForge since July 2008. The server has evolved considerably over the past five years, gaining many new features (mostly related to NFSv4 and NFSv4.1/pNFS) through successful collaboration with industrial partners.
More information about installing/configuring NFS-Ganesha can be found at http://nfsganesha.sourceforge/net.
You can download NFS-Ganesha at http://sourceforge.net/projects/nfs-ganesha/files/nfsganesha/.
A generic massively multithreaded server in User Space
NFS-Ganesha differs in many design respects from the classic NFS servers provided with various Linux distributions. First of all, it runs in User Space. This presents several advantages:
NFS-Ganesha is a massively multithreaded program. At startup, it spawns a large number of workers (threads dedicated to processing NFS requests) and dispatchers (threads that receive requests and assign each one to a worker by pushing it onto that worker's queue). This fits well with today's architectures, in which machines tend to have more and more processors and cores (machines with up to 16 or 32 cores, at 8 cores per socket, are relatively common). Memory management then becomes a critical consideration, because the resource allocator must not serialize the threads. NFS-Ganesha embeds its own memory manager, based on the BuddyBlock algorithm used in the kernel. Each thread allocates a large chunk of memory and divides it into various combinations of smaller blocks (sized in powers of 2). When released, blocks are pushed back to the buddy block pools and reassembled. This operation is done by setting pointers without moving data, which makes it very efficient and fast. The internal design itself is multithread-safe and tries to avoid bottlenecks: normally, a lock should not be requested by more than two or three different threads. There is no “big lock” that would end up serializing the threads. Currently, NFS-Ganesha runs in production with hundreds of workers and thousands of dispatchers at CEA's TERA compute facility.
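The buddy-block idea described above can be illustrated with a minimal sketch. This is not NFS-Ganesha's actual memory manager: the class name `BuddyAllocator`, its methods, and the use of integer offsets instead of real memory are assumptions made purely for illustration. It shows a single arena (in the design described above, each thread would own one) being split into power-of-2 blocks and merged back on release.

```python
class BuddyAllocator:
    """Toy buddy allocator over one arena of 2**max_order units.

    Hypothetical illustration only; blocks are identified by their
    start offset inside the arena rather than by real pointers.
    """

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start offsets of free blocks of size 2**k.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)  # the whole arena starts free
        self.allocated = {}  # start offset -> order of the allocated block

    def alloc(self, size):
        """Return the offset of a block of at least `size` units, or None."""
        if size < 1:
            raise ValueError("size must be positive")
        order = (size - 1).bit_length()  # smallest k with 2**k >= size
        k = order
        # Find the smallest free block that is large enough.
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None  # arena exhausted for this size
        offset = min(self.free_lists[k])
        self.free_lists[k].remove(offset)
        # Split repeatedly, keeping the lower half and freeing the upper half.
        while k > order:
            k -= 1
            self.free_lists[k].add(offset + (1 << k))
        self.allocated[offset] = order
        return offset

    def free(self, offset):
        """Release a block and merge it with its buddy while possible."""
        order = self.allocated.pop(offset)
        while order < self.max_order:
            buddy = offset ^ (1 << order)  # buddy differs in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].add(offset)
```

Allocation and release only update small bookkeeping sets and never copy data, which mirrors the "setting pointers without moving data" property mentioned above. Giving each thread its own arena means these structures need no shared lock.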
A third point is the architecture of NFS-Ganesha itself. The daemon has been designed as a strongly layered product. From the highest layers (the transport layers, including IPv6 support, and the protocol layers, managing NFSv2/NFSv3/NFSv4/NFSv4.1 and 9P2000.L) down to the cache and backend layers, each module is defined as a standalone library with a well-known structure and API. The lowest layer is called FSAL (File System Abstraction Layer) and acts as a generic interface to the file system used as a backend. Currently, several “FSALs” are supported:
More FSALs are coming in future releases:
An NFS server with interesting features for HPC
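As an illustration of the FSAL idea described above, the following sketch shows how an upper layer can be written once against an abstract backend interface. It does not reproduce NFS-Ganesha's real FSAL API (which is a C interface): the names `FSAL`, `InMemoryFSAL`, `lookup`, `read` and `serve_read` are all hypothetical.

```python
from abc import ABC, abstractmethod


class FSAL(ABC):
    """Hypothetical backend interface seen by the upper layers."""

    @abstractmethod
    def lookup(self, path: str) -> int:
        """Resolve a path to an opaque file handle."""

    @abstractmethod
    def read(self, handle: int, offset: int, length: int) -> bytes:
        """Read up to `length` bytes starting at `offset`."""


class InMemoryFSAL(FSAL):
    """Toy backend that keeps file contents in memory (illustration only)."""

    def __init__(self, files):
        self._paths = sorted(files)
        self._data = [files[p] for p in self._paths]

    def lookup(self, path):
        try:
            return self._paths.index(path)
        except ValueError:
            raise FileNotFoundError(path) from None

    def read(self, handle, offset, length):
        return self._data[handle][offset:offset + length]


def serve_read(fsal: FSAL, path: str, offset: int, length: int) -> bytes:
    """Upper-layer request handler: it only knows the FSAL interface."""
    handle = fsal.lookup(path)
    return fsal.read(handle, offset, length)
```

For example, `serve_read(InMemoryFSAL({"/export/a.txt": b"hello world"}), "/export/a.txt", 6, 5)` goes through the same code path that any other backend implementing the two methods would use.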
NFS-Ganesha was born out of the needs of HPC computing centers. From the very beginning, it has been designed to fit the specific workload produced by supercomputers, making it possible to deal with HPC-specific issues. Multiple corporations with a wealth of HPC experience (such as IBM, Panasas and LinuxBox) have joined the NFS-Ganesha project and contributed to its improvement. While most of these corporations' contributions are aimed at providing an interface to their own products, their participation has greatly enhanced the stability, functionality and, most importantly, the reach of NFS-Ganesha. This makes NFS-Ganesha a well-adapted server for using NFS in front of many clients that are themselves part of a large compute cluster.
Another HPC-oriented feature of NFS-Ganesha is pNFS. This feature is defined by the NFSv4.1 RFC (RFC 5661). With pNFS, a client can use different servers, some dedicated to metadata and others to data. This is very similar to what modern parallel file systems (Lustre, Panasas, Ceph, ...) do by separating metadata servers from data servers, which makes pNFS a good fit for the classical model used in HPC. The specifications for pNFS are very open, but three layouts (a layout is the set of structures and mechanisms the client uses to access data on the data servers) are well defined in the RFCs. NFS-Ganesha currently implements the first of them, the “files layout”, and is collaborating with industry to enhance and debug it. The other layouts (one based on the “block device” mode and one based on the OSD2 protocol) are part of the NFS-Ganesha roadmap, with partnerships already established. CEA's position is simple: soon (this is already the case in Fedora 15) every Linux-based machine will include the NFSv4.1 features, including pNFS. If the NFS server is pNFS-ready and the exported namespace itself has parallel capabilities, then pNFS will be a natural and portable way to access data in parallel, strongly increasing the I/O rate over NFS. NFS-Ganesha has supported pNFS since version 1.1.0. Many improvements will be made in this area in future releases.
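To give a feel for why separating data servers helps, the sketch below splits one client I/O request across several data servers using plain round-robin striping. It is a simplified illustration, not the full files layout of RFC 5661 and not NFS-Ganesha code: the function name `split_io` and its parameters `stripe_unit` and `num_data_servers` are assumptions chosen for this example.

```python
def split_io(offset, length, stripe_unit, num_data_servers):
    """Split a byte range into per-data-server chunks.

    Simplified round-robin striping (an assumption for illustration):
    stripe number n of the file is served by data server n % num_data_servers.
    Returns a list of (server_index, file_offset, chunk_length) tuples.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if stripe_unit <= 0 or num_data_servers <= 0:
        raise ValueError("stripe_unit and num_data_servers must be positive")

    chunks = []
    end = offset + length
    while offset < end:
        stripe_number = offset // stripe_unit
        stripe_end = (stripe_number + 1) * stripe_unit
        # A chunk never crosses a stripe boundary.
        chunk_length = min(end, stripe_end) - offset
        chunks.append((stripe_number % num_data_servers, offset, chunk_length))
        offset += chunk_length
    return chunks
```

With a 64-byte stripe unit and three data servers, a 100-byte read starting at offset 60 touches all three servers, so the chunks can be fetched in parallel while the metadata server is only involved in handing out the layout.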
About the CEA
The French Alternative Energies and Atomic Energy Commission (CEA) leads research, development and innovation in four main areas: low-carbon energy sources, global defense and security, information technologies and healthcare technologies. The CEA’s leadership position in the world of research is built on a cross-disciplinary culture of engineers and researchers, ideal for creating synergy between fundamental research and technology innovation. With its 15,600 researchers and collaborators, it has internationally recognized expertise in its areas of excellence and has developed many collaborations with national and international, academic and industrial partners.
Information about HPC at CEA can be found at http://www-hpc.cea.fr/index-en.htm
-----
Source: French Alternative Energies and Atomic Energy Commission (CEA)