Singapore Startup Hatches At-Scale HPC Dev Cloud

By Tiffany Trader

April 26, 2019

At most supercomputer centers, it’s common practice to allocate 10 percent or less of the machine to application development. Such limited availability especially hampers development projects intended for large-scale deployments. Some organizations have no on-premises cycles for code development, and others may be looking to evaluate architectures that are not easily accessible or not yet on the market.

A new company aims to address all these scenarios with custom-built HPC development systems that are available on demand in the cloud.

In an ambitious undertaking, Singapore-based startup Archanan emerged from stealth yesterday with the beta launch of its cloud-based developer platform for building and testing at-scale code. Founded in February 2018 by computer engineers and SUNY Stony Brook alums Alexander Nodeland and Lukasz Orlowski, Archanan has backing from multiple VCs in the Singapore area, including primary investor SGInnovate.

In February 2019, Archanan (pronounced “are-KAY-nin” – Men in Black fans may recall the reference) raised a seed round of SGD $1.2 million (USD $881k). The company currently has several partnerships in play with OEMs and well-known supercomputing centers. John Gustafson of Gustafson’s law fame is the company’s lead scientific advisor.

Although a number of development environment toolkits and an array of HPC cloud infrastructures are already on the market, Archanan combines these front and back ends and bakes in hardware-level virtualization to provide HPC developers with a functional replica of their production or target architecture.

Organizations accepted into the beta program will use the Archanan development platform to access personalized virtual test environments that emulate their production system(s). Archanan’s web-based IDE allows users to debug large parallel jobs in C, C++ and Python on a few different emulated supercomputers, including NSCC Singapore’s Aspire-1. Users can also construct custom system designs based on a small but growing number of hardware options.

“The Archanan IDE provides a purpose-built parallel debugger and visualization tools where you can develop code at scale,” said Nodeland. “In the future, we will be expanding the library of available supercomputer emulators and we’ll also be expanding the availability of tools – both built in house and some community tools.”

The testing environment, provided via cloud infrastructure (Amazon Web Services is a partner), employs a combination of virtualization, emulation and encapsulation technologies enabling users to predict performance metrics without having to run all the production servers. The goal, said Nodeland in an interview with HPCwire, is to enable the offloading of all HPC development effort to the cloud.

“One of the primary reasons why more organizations, especially in the commercial space, aren’t utilizing the power of modern supercomputers, is the considerable challenge of effective coding at these larger, more complex scales,” said Gustafson, esteemed computer scientist and visiting scientist at A*STAR – Agency for Science, Technology and Research. “There is a big gap between a laptop and that of a remote, giant collection of distributed, interconnected processors. By combining hardware-level virtualization and cloud computing, Archanan has figured out how to bridge both the technical, but also economical gaps that have presented adoption challenges for computing at this level. It’s exciting to see that we’re on the precipice of the democratization of high-performance computing across industries, at last.”

With multiple layers of abstraction in the stack, these testing and debugging systems are not intended to replicate the performance of the production environment; rather, they address the pain points that many HPC developers face because of limited access to production machine cycles.

“We provide an on-demand environment where users can develop their code at scale, meaning that if they are going to be running their production application on 30,000 cores, they can do their development on a virtual 30,000 cores, specifically to test how the network is going to behave in such a scenario, how MPI is going to behave, etc.,” said Nodeland.
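To see why behavior at 30,000 cores can differ from behavior on a small development machine, consider a rough sketch of a collective reduction. It assumes a textbook binomial-tree reduction, which is only one of several algorithms an MPI library might choose, and the function names are illustrative rather than part of any Archanan or MPI API.

```python
# Illustrative model only: counts communication rounds in a binomial-tree
# reduction. Real MPI libraries select among several collective algorithms,
# so treat these numbers as a sketch, not a prediction.


def tree_reduction_rounds(ranks: int) -> int:
    """Rounds needed by a binomial-tree reduction over `ranks` processes."""
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    # ceil(log2(ranks)) computed with integer arithmetic
    return (ranks - 1).bit_length()


def max_partner_distance(ranks: int) -> int:
    """Largest rank-to-rank distance between partners in any round."""
    rounds = tree_reduction_rounds(ranks)
    # Partner distances double each round: 1, 2, 4, ..., 2**(rounds - 1)
    return 0 if rounds == 0 else 1 << (rounds - 1)


if __name__ == "__main__":
    for ranks in (8, 512, 30_000):
        print(ranks, tree_reduction_rounds(ranks), max_partner_distance(ranks))
```

Under this model an 8-rank test needs 3 rounds between near neighbors, while a 30,000-rank job needs 15 rounds, the last of which pairs ranks 16,384 apart. Those long-distance exchanges cross far more of the network than anything a small test run exercises.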

Cited benefits include faster time to results due to shortened development time, developing code at the target production scale, and the subsequent minimization of port-over failures from the development to the production environment.

The company is confident it can ensure a high degree of scalability – outside of, possibly, the top 10 or 20 leadership machines. A white paper is in the works that will document internal performance.

To onboard a new supercomputing center onto its platform, Archanan gets together with the applications specialists, the solution architects and the support team for the supercomputing center to build the model with them. It employs all the same software packages, the same compiler, the same version of Linux, etc. (encapsulated in a Singularity container) and emulates down to the component level — the processors, the accelerators, and the network elements.

This is a two-week to two-month process during which the Archanan team fine-tunes the emulator so it can accurately predict performance.

The Archanan development platform currently supports x86 processors from Intel and AMD as well as Arm. Nvidia K80 and P100 GPUs are also supported. The company is working on support for Power 9 and Power 10, as well as NEC Vector Engines. Emulation for other architectures, including FPGAs, is on Archanan’s roadmap.

HPC sites participating in the beta program access the platform via a yearly or monthly subscription with a mechanism for overflow billing based on virtual node hours. Another use case enables OEMs or systems integrators to provide their customers with an evaluation system during the tendering and commissioning process. In that model, Archanan emulates the supercomputer for some fraction of the cost of the machine.
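As a rough illustration of how a subscription with overflow billing on virtual node hours might be computed, here is a minimal sketch. The article gives no prices or allowances, so the plan structure, field names and every figure below are hypothetical placeholders, not Archanan’s actual terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical subscription plan; all fields are placeholders."""
    base_fee: float             # flat fee per billing period
    included_node_hours: float  # virtual node hours covered by the base fee
    overflow_rate: float        # price per virtual node hour above the allowance


def virtual_node_hours(nodes: int, wall_hours: float) -> float:
    """Node hours consumed by one job: node count times wall-clock hours."""
    return nodes * wall_hours


def period_charge(plan: Plan, used_node_hours: float) -> float:
    """Base fee plus overflow, charged only on usage above the allowance."""
    overflow = max(0.0, used_node_hours - plan.included_node_hours)
    return plan.base_fee + overflow * plan.overflow_rate


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the overflow branch.
    plan = Plan(base_fee=1000.0, included_node_hours=5000.0, overflow_rate=0.5)
    used = virtual_node_hours(1000, 2.0) + virtual_node_hours(500, 8.0)
    print(used, period_charge(plan, used))
```

With these placeholder values, two jobs totaling 6,000 virtual node hours exceed the 5,000-hour allowance by 1,000 hours, so the period charge is the base fee plus 1,000 hours at the overflow rate.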

A third, forthcoming usage model will be individual or group licenses. Archanan plans to offer monthly memberships through the GitHub Marketplace so smaller users can try the system and run tests for their own jobs even if their organizations are not customers.

Archanan says its beta roster includes supercomputing centers and research groups based in Singapore, Australia and China.

While Archanan is going after traditional and enterprise HPC for its initial target market, Nodeland foresees expanding to more general AI, machine learning and big data workloads. The company has recently increased its workforce from two to seven employees and has several open positions. It expects to expand its staff to 15 by year end.

Nodeland acknowledged there is work to be done building up their libraries. Given the heavy lift, it’s encouraging that the company has garnered the support of Gustafson as well as Wolfgang Gentzsch, co-founder and CEO of The UberCloud.

“Virtually all the software development tools in high-end, complex computing are used on desktop workstations and laptops, drastically limiting the development and debugging capability of these tools — it’s analogous to trying to recreate a masterpiece on an Etch-a-Sketch,” said Wolfgang Gentzsch, co-founder and CEO of The UberCloud. “Archanan’s cloud-based development platform extends these workstations and lets developers construct their code at scale, as if they were doing it directly on these large, complex architectures, thus creating better quality software in shorter time.”

 
