At most supercomputer centers, it’s common practice to allocate 10 percent or less of the machine for application development purposes. Such limited availability especially hampers development projects intended for large-scale deployments. Some organizations do not have any on-premise cycles for their code development and others may be looking to evaluate architectures not easily accessible or not even on the market yet.
A new company aims to address all these scenarios with custom-built HPC development systems that are available on demand in the cloud.
In an ambitious undertaking, Singapore-based startup Archanan emerged from stealth yesterday with the beta launch of its cloud-based developer platform for building and testing at-scale code. Founded in February 2018 by computer engineers and NYU Stony Brook alums Alexander Nodeland and Lukasz Orlowski, Archanan has backing from multiple VCs in the Singapore area, including primary investor SGInnovate.
In February 2019, Archanan (pronounced “are-KAY-nin” – Men in Black fans may recall the reference) raised a SGD$1.2 million (USD $881k) seed round and currently has several partnerships in play with OEMs and well-known supercomputing centers. John Gustafson of Gustafson’s law fame is the company’s lead scientific advisor.
Although there exist a number of development environment toolkits in the market as well as an array of HPC cloud infrastructures, Archanan combines these front and back ends, further baking in hardware-level virtualization to provide HPC developers with a functional replica of their production or target architecture.
Organizations accepted into the beta program will access personalized virtual test environments that are an emulation of their organizations’ production system(s) via the Archanan development platform. Archanan’s web-based IDE allows users to debug large parallel jobs in C and C++ and Python on a few different emulated supercomputers, including NSCC Singapore’s Aspire-1. Users can also construct custom system designs, based on a small, but growing number of hardware options.
“The Archanan IDE provides a purpose-built parallel debugger and visualization tools where you can develop code at scale,” said Nodeland. “In the future, we will be expanding the library of available supercomputer emulators and we’ll also be expanding the availability of tools – both built in house and some community tools.”
The testing environment, provided via cloud infrastructure (Amazon Web Services is a partner), employs a combination of virtualization, emulation and encapsulation technologies enabling users to predict performance metrics without having to run all the production servers. The goal, said Nodeland in an interview with HPCwire, is to enable the offloading of all HPC development effort to the cloud.
“One of the primary reasons why more organizations, especially in the commercial space, aren’t utilizing the power of modern supercomputers, is the considerable challenge of effective coding at these larger, more complex scales,” said Gustafson, esteemed computer scientist and visiting scientist at A*STAR – Agency for Science, Technology and Research. “There is a big gap between a laptop and that of a remote, giant collection of distributed, interconnected processors. By combining hardware-level virtualization and cloud computing, Archanan has figured out how to bridge both the technical, but also economical gaps that have presented adoption challenges for computing at this level. It’s exciting to see that we’re on the precipice of the democratization of high-performance computing across industries, at last.”
With multiple layers of abstraction in the stack, these testing and debugging systems are not intended to replicate the performance of the production environment, rather they address the pain points faced by many HPC developers stemming from limited access to production machine cycles.
“We provide an on-demand environment where users can develop their code at scale, meaning that if they are going to be running their production application on 30,000 cores, they can do their development on a virtual 30,000 cores, specifically to test how the network is going to behave in such a scenario, how MPI is going to behave, etc.,” said Nodeland.
Cited benefits include faster time to results due to shortened development time, developing code at the target production scale, and the subsequent minimization of port-over failures from the development to the production environment.
The company is confident it can ensure a high degree of scalability – outside of, possibly, the top 10 or 20 leadership machines. A white paper is in the works that will document internal performance.
To onboard a new supercomputing center onto its platform, Archanan gets together with the applications specialists, the solution architects and the support team for the supercomputing center to build the model with them. It employs all the same software packages, the same compiler, the same version of Linux, etc. (encapsulated in a Singularity container) and emulates down to the component level — the processors, the accelerators, and the network elements.
This is a two-week to two-month process during which the Archanan team fine tunes the emulator so it can accurately predict performance.
The Archanan development platform currently includes support for x86 provided by Intel and AMD, and also for Arm. Nvidia K80s and P100s GPUs are also supported. The company is working on support for Power 9 and Power 10, as well as NEC Vector Engines. Emulation for other architectures, including FPGAs, are on Archanan’s roadmap.
HPC sites participating in the beta program access the platform via a yearly or monthly subscription with a mechanism for overflow billing based on virtual node hours. Another use case enables OEMs or systems integrators to provide their customers with an evaluation system during the tendering and commissioning process. In that model, Archanan emulates the supercomputer for some fraction of the cost of the machine.
A third, forthcoming, usage model will be individual or group licenses. Archanan plans to offer monthly memberships through the Github marketplace so smaller users can try the system and run tests for their own jobs even if their organizations are not customers.
Archanan says its beta roster includes supercomputing centers and research groups based in Singapore, Australia and China.
While Archanan is going after traditional and enterprise HPC for its initial target market, Nodeland foresees expanding to more general AI, machine learning and big data workloads. The company has recently increased its workforce from two to seven employees, and has several open positions it is hiring for. It expects to expand its staff to 15 by year end.
Nodeland acknowledged there is work to be done building up their libraries. Given the heavy lift, it’s encouraging that the company has garnered the support of Gustafson as well as Wolfgang Gentzsch, co-founder and CEO of The UberCloud.
“Virtually all the software development tools in high-end, complex computing are used on desktop workstations and laptops, drastically limiting the development and debugging capability of these tools — it’s analogous to trying to recreate a masterpiece on an Etch-a-Sketch,” said Wolfgang Gentzsch, co-founder and CEO of The UberCloud. “Archanan’s cloud-based development platform extends these workstations and lets developers construct their code at scale, as if they were doing it directly on these large, complex architectures, thus creating better quality software in shorter time.”