June 29, 2021 — The Workflows and Distributed Computing team at the Barcelona Supercomputing Center is proud to announce a new release, version 2.9 (codename Jasmine), of the programming environment COMPSs.
This version of COMPSs updates the result of the team’s work over recent years on providing a set of tools that help developers program and execute their applications efficiently on distributed computational infrastructures such as clusters, clouds and container-managed platforms. COMPSs is a task-based programming model known for notably improving the performance of large-scale applications by automatically parallelizing their execution.
COMPSs has been available for several years to users of the MareNostrum supercomputer and the Spanish Supercomputing Network, and it has been adopted in several research projects such as EUBra-BIGSEA, MUG, EGI, ASCETIC, TANGO, NEXTGenIO, I-BiDaaS and mF2C. In these projects, COMPSs has been applied to implement use cases provided by different communities across diverse disciplines such as biomedicine, engineering, biodiversity, chemistry, astrophysics, finance, telecommunications, manufacturing and earth sciences. It is currently also being extended and adopted in applications in the projects AI-SPRINT, ExaQUte, LANDSUPPORT, the BioExcel CoE, PerMedCoE, CLASS, ELASTIC, and the EXPERTISE ETN, and in the Edge Twins HPC FET Innovation Launchpad project. It has also been applied in sample use cases of the ChEESE CoE. Of special note is the eFlows4HPC project, coordinated by the group and started in January 2021, which aims to develop a workflow software stack in which the PyCOMPSs/COMPSs environment is one of the main components.
The new release includes support for nested tasks, including recursive nested tasks. This feature enhances the programmability of applications that naturally have a hierarchical structure. It also helps map large tasks to nodes and smaller tasks to resources inside those nodes, which enables better exploitation of locality. This feature is enabled in the agents’ runtime deployment, where each agent runs an instance of the runtime, which turns the runtime into a distributed engine.
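As a rough illustration of the hierarchical structure described above, the following sketch uses only the Python standard library and does not use the COMPSs API. The names outer_task, inner_task and run_blocks are hypothetical: each outer task stands in for a large task mapped to a node, and its private pool stands in for the resources inside that node.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_task(value):
    # Small unit of work, standing in for a task run on a resource inside a node.
    return value * value


def outer_task(block):
    # Large task, standing in for work mapped to a whole node.
    # It spawns its own inner tasks on a private pool (assumed size: 2).
    with ThreadPoolExecutor(max_workers=2) as inner_pool:
        return sum(inner_pool.map(inner_task, block))


def run_blocks(blocks):
    # Top level: one outer task per block of data.
    with ThreadPoolExecutor(max_workers=2) as outer_pool:
        return list(outer_pool.map(outer_task, blocks))


if __name__ == "__main__":
    print(run_blocks([[1, 2, 3], [4, 5]]))
```

Giving each outer task its own pool keeps the inner work close to the task that created it, which is the locality idea in miniature; a real runtime would make the placement decisions itself.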
The serialization and deserialization of the tasks’ parameters is one of the more time-consuming phases when using the Python binding of COMPSs. In this release, the Python binding adds support for a Python workers cache. The cache addresses one of the largest overheads for Python applications, which previously had to serialize the tasks’ parameters into files. With the cache, parameters can be kept in memory and reused by multiple tasks.
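The idea behind such a cache can be sketched in a few lines of standard-library Python. This is not the COMPSs implementation: the WorkerCache class and the demo function are hypothetical, and the sketch only assumes that a parameter reaches a worker as a pickled file that several tasks then read.

```python
import os
import pickle
import tempfile


class WorkerCache:
    """Hypothetical in-memory cache keyed by the parameter's file path."""

    def __init__(self):
        self._objects = {}
        self.file_reads = 0  # counts how often a file was actually deserialized

    def get(self, path):
        # Deserialize from disk only on the first request for this path.
        if path not in self._objects:
            with open(path, "rb") as handle:
                self._objects[path] = pickle.load(handle)
            self.file_reads += 1
        return self._objects[path]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "param.pkl")
        with open(path, "wb") as handle:
            pickle.dump(list(range(1000)), handle)

        cache = WorkerCache()
        # Three "tasks" consume the same parameter on the same worker.
        totals = [sum(cache.get(path)) for _ in range(3)]
        return totals, cache.file_reads


if __name__ == "__main__":
    print(demo())
```

In this sketch the three tasks trigger a single file read; without the cache, each task would deserialize the file again.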
Other enhancements come as initial work towards the support of checkpointing and restart in COMPSs applications. To this end, a new application timeout functionality enables a controlled finalization of applications before a given wall_clock_limit. This helps applications with very long executions that cannot be run in a single job managed by a job scheduler. In this case, when the job approaches the wall_clock_limit, the application is safely stopped and can be restarted in another job. The functionality is not yet fully automatic: it requires an initial check to see whether a previous job was executing the application.
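The stop-and-restart pattern described above can be sketched as follows. This is an illustration in plain Python rather than COMPSs functionality: run_job, the JSON checkpoint file and the injectable clock argument are all assumptions made for the example, and the work itself is a simple running sum.

```python
import json
import os
import time


def run_job(total_steps, checkpoint_path, wall_clock_limit, clock=time.monotonic):
    """Run steps until done or until wall_clock_limit (seconds) is reached.

    Returns (finished, accumulated_value).
    """
    start = clock()

    # Initial check: did a previous job leave state behind?
    state = {"next_step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)

    while state["next_step"] < total_steps:
        # Stop in a controlled way once the limit is reached.
        if clock() - start >= wall_clock_limit:
            with open(checkpoint_path, "w") as handle:
                json.dump(state, handle)
            return False, state["acc"]
        state["acc"] += state["next_step"]  # placeholder for real work
        state["next_step"] += 1

    # Finished: remove any stale checkpoint so a later run starts clean.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return True, state["acc"]
```

A first job that hits its limit returns False and leaves a checkpoint; submitting the same call again in a new job picks up from the saved step. The clock parameter exists only so the behaviour can be exercised without waiting for real time to pass.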
Other improvements concern the profiling and tracing features: we have extended the tracing of the agents’ deployment, added new tracing events, and added functionality to profile the amount of memory used.
COMPSs 2.9 comes with other minor new features, extensions and bug fixes.
COMPSs had around 1000 downloads last year and is used by around 20 groups in real applications. COMPSs has recently attracted interest from areas such as engineering, image recognition, genomics and seismology, for which specific courses and dissemination actions have been carried out.
The packages and the complete list of features are available on the Downloads page. A virtual appliance is also available to test the functionalities of COMPSs through a step-by-step tutorial that guides the user through developing and executing a set of example applications.
Additionally, a user guide and papers published in relevant conferences and journals are available.
For more information on COMPSs please visit: http://www.bsc.es/compss
About the Workflows and Distributed Computing group
The Workflows and Distributed Computing team at the Barcelona Supercomputing Center aims to offer tools and mechanisms that enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources in a transparent way. The team’s research builds on the group’s previous expertise and extends it towards the aspects of distributed computing that can benefit from it. The team at BSC has a strong focus on programming models and on resource management and scheduling in distributed computing environments.