December 09, 2011
BRUYÈRES-LE-CHÂTEL, France -- NFS is a well-known protocol for exporting remote file systems that has been in use since the late 1980s. Over more than 20 years of widespread use, the protocol has evolved considerably, from the very basic NFSv2 in 1989 to the brand-new NFSv4.1, whose specification was published by the IETF in early 2010. This latest version of the protocol includes the pNFS feature, which makes it possible to separate the metadata path from the data paths, as modern parallel file systems do, optimizing both data and metadata management by removing classical bottlenecks.
The Military Applications Department of the French Atomic Energy Authority (CEA/DAM) has been developing a generic NFS server running in User Space, named "NFS-Ganesha", since 2005. It has been in production at the TERA compute center since January 2006 and has been available as Open Source Software on SourceForge since July 2008. The server has evolved considerably over the past five years, gaining many new features (mostly related to NFSv4 and NFSv4.1/pNFS) through successful collaborations with industrial partners.
More information about installing and configuring NFS-Ganesha can be found at http://nfsganesha.sourceforge.net.
You can download NFS-Ganesha at http://sourceforge.net/projects/nfs-ganesha/files/nfsganesha/.
A generic massively multithreaded server in User Space
NFS-Ganesha differs in several design respects from the classic NFS server provided with various Linux distributions. First of all, it runs in User Space. This presents several advantages:
NFS-Ganesha is a massively multithreaded program. At startup, it spawns a large number of workers (threads dedicated to processing NFS requests) and dispatchers (threads that receive requests and assign them to a worker by pushing each request into that worker's queue). This fits well with today's architectures, where machines tend to have more and more processors and cores (machines with up to 16 or 32 cores, with 8 cores per socket, are relatively common). Memory management is then a critical consideration, to prevent the resource allocator from serializing the threads. NFS-Ganesha embeds its own memory manager, based on the buddy-block algorithm used in the kernel. Each thread allocates a large chunk of memory and divides it into various combinations of smaller blocks (powers of 2). Once released, the blocks are pushed back into the buddy-block pools and reassembled. This is done by setting pointers, without moving data, making it very efficient and fast. The internal design itself is multithread-safe and tries to avoid bottlenecks: normally, a lock should not be requested by more than two or three different threads, and there is no "big lock" that would ultimately serialize the threads. Currently, NFS-Ganesha runs in production at CEA's TERA compute facility with hundreds of workers and thousands of dispatchers.
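To make the dispatcher/worker pattern described above more concrete, here is a minimal, hypothetical C sketch (it is not NFS-Ganesha source code, and all names are illustrative): a dispatcher pushes incoming requests onto per-worker queues, and each worker consumes only its own queue, so no single lock is contended by many threads.

/*
 * Minimal sketch of a dispatcher/worker model with per-worker queues.
 * Hypothetical code for illustration only; not NFS-Ganesha's internals.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NB_WORKERS  4
#define NB_REQUESTS 16

struct request {
    int id;                     /* stands in for a decoded NFS request      */
    struct request *next;
};

struct worker {
    pthread_t tid;
    pthread_mutex_t lock;       /* protects this worker's private queue     */
    pthread_cond_t cond;
    struct request *head;       /* simple linked queue of pending requests  */
    int done;                   /* set by the dispatcher when it is finished */
};

static struct worker workers[NB_WORKERS];

/* Worker loop: wait for a request on the private queue, then "process" it. */
static void *worker_thread(void *arg)
{
    struct worker *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        while (w->head == NULL && !w->done)
            pthread_cond_wait(&w->cond, &w->lock);
        if (w->head == NULL && w->done) {
            pthread_mutex_unlock(&w->lock);
            return NULL;
        }
        struct request *req = w->head;
        w->head = req->next;
        pthread_mutex_unlock(&w->lock);

        printf("worker %ld handled request %d\n", (long)(w - workers), req->id);
        free(req);
    }
}

/* Dispatcher: assign each incoming request to a worker (round robin here). */
int main(void)
{
    for (int i = 0; i < NB_WORKERS; i++) {
        pthread_mutex_init(&workers[i].lock, NULL);
        pthread_cond_init(&workers[i].cond, NULL);
        pthread_create(&workers[i].tid, NULL, worker_thread, &workers[i]);
    }

    for (int r = 0; r < NB_REQUESTS; r++) {
        struct worker *w = &workers[r % NB_WORKERS];
        struct request *req = malloc(sizeof(*req));
        req->id = r;
        pthread_mutex_lock(&w->lock);
        req->next = w->head;            /* LIFO push keeps the sketch short */
        w->head = req;
        pthread_cond_signal(&w->cond);
        pthread_mutex_unlock(&w->lock);
    }

    for (int i = 0; i < NB_WORKERS; i++) {
        pthread_mutex_lock(&workers[i].lock);
        workers[i].done = 1;
        pthread_cond_signal(&workers[i].cond);
        pthread_mutex_unlock(&workers[i].lock);
        pthread_join(workers[i].tid, NULL);
    }
    return 0;
}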
A third point is the architecture of NFS-Ganesha itself. The daemon has been designed as a highly layered product. From the highest layers (the transport layers, including IPv6 support, and the protocol layers, managing NFSv2, NFSv3, NFSv4, NFSv4.1 and 9P2000.L) down to the cache and backend layers, each module is defined as a standalone library with a well-defined structure and API. The lowest layer is called the FSAL (File System Abstraction Layer) and acts as a generic interface to the file system used as a backend; a minimal sketch of this kind of interface is shown after the lists below. Currently, several FSALs are supported:
More FSALs are coming in future releases:
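As a rough illustration of the abstraction-layer idea, the sketch below shows how a backend can be reduced to a table of function pointers that the upper layers call without knowing which file system sits behind it. The names here (fsal_ops, demo_lookup, and so on) are hypothetical and are not NFS-Ganesha's real FSAL API.

/*
 * Hypothetical sketch of a file system abstraction layer: the protocol and
 * cache layers call only these function pointers, and each backend (POSIX
 * path, proxy, parallel file system, ...) provides its own implementations.
 */
#include <stdio.h>
#include <sys/types.h>

struct fsal_handle {
    void *backend_data;         /* opaque per-backend file handle */
};

struct fsal_ops {
    int     (*lookup)(const char *path, struct fsal_handle *out);
    ssize_t (*read)(struct fsal_handle *h, void *buf, size_t len, off_t off);
    ssize_t (*write)(struct fsal_handle *h, const void *buf, size_t len, off_t off);
    int     (*getattr)(struct fsal_handle *h /* , attributes out ... */);
};

/* A trivial backend that just logs calls, standing in for a real one. */
static int demo_lookup(const char *path, struct fsal_handle *out)
{
    printf("lookup(%s)\n", path);
    out->backend_data = NULL;
    return 0;
}

static const struct fsal_ops demo_fsal = {
    .lookup = demo_lookup,
    /* .read, .write and .getattr would be filled in the same way */
};

int main(void)
{
    struct fsal_handle h;
    /* The upper layers never know which backend is behind the pointer. */
    const struct fsal_ops *fsal = &demo_fsal;
    return fsal->lookup("/export/file.txt", &h);
}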
An NFS server with interesting features for HPC
NFS-Ganesha was born out of needs from HPC computing centers. From the very beginning, it has been designed to fit the specific workload produced by supercomputers, making it possible to deal with HPC-specific issues. Several corporations with a wealth of HPC experience (such as IBM, Panasas and LinuxBox) have joined the NFS-Ganesha project and contributed to its improvement. While most of these contributions are aimed at providing the interface to their own products, their participation has greatly enhanced the stability, functionality and, most importantly, the reach of NFS-Ganesha. This makes NFS-Ganesha a well-adapted server for exposing NFS to many clients that are themselves part of a big compute cluster.
Another HPC-oriented feature of NFS-Ganesha is pNFS. This feature is defined by the NFSv4.1 specification (RFC 5661). By using pNFS, a client can use different servers, some dedicated to metadata and others to data. This is very similar to what modern parallel file systems (Lustre, Panasas, Ceph, ...) do by separating metadata servers from data servers, which makes pNFS a model that fits very well with the classical model used in HPC. The pNFS specification is very open, but three layouts (layouts are the structures and mechanisms used by the client to access the data on the data servers) are well defined in the RFCs. NFS-Ganesha currently implements the first of them, the "files" layout, and collaborates with industry partners to enhance and debug it. The other layouts (one based on block devices and one based on the OSD2 protocol) are on the NFS-Ganesha roadmap, with partnerships already established. CEA's position is simple: soon (this is already the case in Fedora 15) every Linux-based machine will include the NFSv4.1 features, including pNFS. If the NFS server is pNFS-ready and the exported namespace itself has parallel capabilities, then using pNFS will be a natural and portable way to access data in parallel, strongly enhancing the I/O rate achievable via NFS. NFS-Ganesha has supported pNFS since version 1.1.0, and many improvements will come in this area in future releases.
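To give an intuition of what a files-type layout buys the client, the following simplified sketch shows how file data striped across several data servers lets a client compute, from a byte offset alone, which server holds a given stripe. This is a hypothetical illustration under assumed parameters (a 1 MiB stripe unit, four data servers, dense packing); the real RFC 5661 layout structures are considerably richer.

/*
 * Simplified illustration of the striping idea behind a pNFS "files" layout.
 * Hypothetical parameters; not the actual RFC 5661 data structures.
 */
#include <stdio.h>
#include <stdint.h>

#define STRIPE_UNIT     (1024 * 1024)   /* 1 MiB per stripe unit (example)   */
#define NB_DATA_SERVERS 4               /* data servers listed in the layout */

/* Which data server, and which offset on that server, hold a given byte? */
static void locate(uint64_t file_offset, unsigned *ds_index, uint64_t *ds_offset)
{
    uint64_t stripe_no = file_offset / STRIPE_UNIT;
    *ds_index  = (unsigned)(stripe_no % NB_DATA_SERVERS);
    /* Consecutive stripes on the same server are packed back to back. */
    *ds_offset = (stripe_no / NB_DATA_SERVERS) * STRIPE_UNIT
               + file_offset % STRIPE_UNIT;
}

int main(void)
{
    uint64_t offsets[] = { 0, 512 * 1024, 3 * STRIPE_UNIT + 42, 9 * STRIPE_UNIT };
    for (unsigned i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
        unsigned ds;
        uint64_t off;
        locate(offsets[i], &ds, &off);
        printf("file offset %llu -> data server %u, offset %llu\n",
               (unsigned long long)offsets[i], ds, (unsigned long long)off);
    }
    return 0;
}

Because each stripe lives on a different data server, reads and writes to different regions of the same file can proceed in parallel against several servers, which is the source of the I/O rate improvement mentioned above.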
About the CEA
The French Alternative Energies and Atomic Energy Commission (CEA) leads research, development and innovation in four main areas: low-carbon energy sources, global defense and security, information technologies and healthcare technologies. The CEA’s leadership position in the world of research is built on a cross-disciplinary culture of engineers and researchers, ideal for creating synergy between fundamental research and technology innovation. With its 15,600 researchers and collaborators, it has internationally recognized expertise in its areas of excellence and has developed many collaborations with national and international, academic and industrial partners.
Information about HPC at CEA can be found at http://www-hpc.cea.fr/index-en.htm
Source: French Alternative Energies and Atomic Energy Commission (CEA)