Feb. 3 — Cambridge, UK-based start-up Ellexus Ltd has launched its inaugural product to load balance shared storage across high performance computing (HPC) clusters.
This groundbreaking new software, called Mistral, has the potential to save the HPC industry millions in wasted compute resources and improve the quality of service across compute clusters around the world.
Developed in collaboration with the IT department of ARM Holdings, Mistral solves the noisy neighbour problem: when a small number of jobs overload the network or file system in a compute cluster with shared storage. Sometimes this problem is caused by rogue jobs that have been submitted to the cluster by mistake. Other times the cluster may simply be overloaded with a high number of IO-hungry jobs. In either case, the performance of all the jobs on the cluster can be affected and the cluster may even be brought down completely, bringing hundreds of engineers to a standstill and causing critical deadlines to be missed at a cost of tens or hundreds of thousands of pounds.
Mistral works by monitoring application IO and IO performance across a cluster in order to identify rogue jobs and hotspots, and it can automatically throttle the IO of problem jobs and applications. By deploying Mistral, a company can gain an in-depth understanding of how users are accessing the storage as well as prevent disastrous data access patterns.
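To illustrate the general idea of monitoring per-job IO and throttling the heaviest users, here is a minimal Python sketch. It is not Mistral's implementation or interface: the `JobIO` record, the function names and the bandwidth limit are all hypothetical, and a real tool would measure IO continuously rather than from a single sample.

```python
from dataclasses import dataclass


@dataclass
class JobIO:
    """One bandwidth sample for a cluster job (hypothetical record)."""
    job_id: str
    read_mb_s: float
    write_mb_s: float


def find_rogue_jobs(samples, limit_mb_s):
    """Return the ids of jobs whose combined read and write bandwidth exceeds the limit."""
    return [s.job_id for s in samples if s.read_mb_s + s.write_mb_s > limit_mb_s]


def throttle_plan(samples, limit_mb_s):
    """Map each over-limit job to the fraction of its current bandwidth it may keep."""
    plan = {}
    for s in samples:
        total = s.read_mb_s + s.write_mb_s
        if total > limit_mb_s:
            # Scale the job back so its total bandwidth equals the limit.
            plan[s.job_id] = limit_mb_s / total
    return plan


if __name__ == "__main__":
    samples = [JobIO("build-17", 10.0, 5.0), JobIO("sim-42", 300.0, 100.0)]
    print(find_rogue_jobs(samples, limit_mb_s=200.0))
    print(throttle_plan(samples, limit_mb_s=200.0))
```

In this sketch a job is flagged only when its combined bandwidth is above an assumed fixed limit; a production system would more likely use policies per user, per file system or per operation type.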
Dr. Rosemary Francis, CEO of Ellexus, said, “We are really excited to bring Mistral to the market at such a critical time for the HPC industry and the world of scientific computing. The sector is growing rapidly with the advances in microchip design, genome discovery and bioinformatics. Using Mistral will help to ensure that organisations involved in HPC can get the most out of their systems.
“Mistral builds on our existing technology, our tool suite Breeze, which carries out dependency analysis to allow engineers to identify problems with the installation and deployment of Linux applications. International organisations around the world rely on Breeze to solve not only internal problems but to help customers understand how their systems are working.”
Olly Stephens, Engineering Systems Architect at ARM, said of the collaboration with Ellexus, “We wanted to develop a system that will allow the infrastructure to protect itself somewhat against IO behaviour that is considered a risk. In particular, we wanted the ability for aggressive use of the storage infrastructure to be automatically detected early and remedial steps taken quickly.
“Previously this activity was done by the HPC support staff, who were able to monitor and detect issues, but this was a slow and difficult process, primarily due to the lack of available information. The data and system control provided by Mistral will allow the infrastructure to prevent risky IO patterns and give us a lot more information to learn from.”
Mistral was launched in January 2016 to a keen waiting list of international organisations eager to get their hands on the unique software. There is no doubt that it will make a real difference to their cluster performance.
For more information about Mistral and to request an evaluation of the software, contact Rosemary at [email protected].
—
Source: Ellexus