Singapore Startup Hatches At-Scale HPC Dev Cloud

By Tiffany Trader

April 26, 2019

At most supercomputer centers, it’s common practice to allocate 10 percent or less of the machine for application development purposes. Such limited availability especially hampers development projects intended for large-scale deployments. Some organizations do not have any on-premises cycles for their code development, and others may be looking to evaluate architectures that are not easily accessible or not yet on the market.

A new company aims to address all these scenarios with custom-built HPC development systems that are available on demand in the cloud.

In an ambitious undertaking, Singapore-based startup Archanan emerged from stealth yesterday with the beta launch of its cloud-based developer platform for building and testing at-scale code. Founded in February 2018 by computer engineers and SUNY Stony Brook alums Alexander Nodeland and Lukasz Orlowski, Archanan has backing from multiple VCs in the Singapore area, including primary investor SGInnovate.

In February 2019, Archanan (pronounced “are-KAY-nin”; Men in Black fans may recall the reference) raised a seed round of SGD $1.2 million (USD $881,000). The company currently has several partnerships in play with OEMs and well-known supercomputing centers. John Gustafson of Gustafson’s law fame is the company’s lead scientific advisor.

Although a number of development environment toolkits and an array of HPC cloud infrastructures are already on the market, Archanan combines these front and back ends and adds hardware-level virtualization to provide HPC developers with a functional replica of their production or target architecture.

Organizations accepted into the beta program will access, via the Archanan development platform, personalized virtual test environments that emulate their own production systems. Archanan’s web-based IDE allows users to debug large parallel jobs in C, C++ and Python on a few different emulated supercomputers, including NSCC Singapore’s Aspire-1. Users can also construct custom system designs, based on a small but growing number of hardware options.

“The Archanan IDE provides a purpose-built parallel debugger and visualization tools where you can develop code at scale,” said Nodeland. “In the future, we will be expanding the library of available supercomputer emulators and we’ll also be expanding the availability of tools – both built in house and some community tools.”

The testing environment, provided via cloud infrastructure (Amazon Web Services is a partner), employs a combination of virtualization, emulation and encapsulation technologies enabling users to predict performance metrics without having to run all the production servers. The goal, said Nodeland in an interview with HPCwire, is to enable the offloading of all HPC development effort to the cloud.

“One of the primary reasons why more organizations, especially in the commercial space, aren’t utilizing the power of modern supercomputers, is the considerable challenge of effective coding at these larger, more complex scales,” said Gustafson, esteemed computer scientist and visiting scientist at A*STAR – Agency for Science, Technology and Research. “There is a big gap between a laptop and that of a remote, giant collection of distributed, interconnected processors. By combining hardware-level virtualization and cloud computing, Archanan has figured out how to bridge both the technical, but also economical gaps that have presented adoption challenges for computing at this level. It’s exciting to see that we’re on the precipice of the democratization of high-performance computing across industries, at last.”

With multiple layers of abstraction in the stack, these testing and debugging systems are not intended to replicate the performance of the production environment; rather, they address the pain points faced by many HPC developers stemming from limited access to production machine cycles.

“We provide an on-demand environment where users can develop their code at scale, meaning that if they are going to be running their production application on 30,000 cores, they can do their development on a virtual 30,000 cores, specifically to test how the network is going to behave in such a scenario, how MPI is going to behave, etc.,” said Nodeland.
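For a rough sense of why the 30,000-core figure matters, the scaled-speedup formula known as Gustafson’s law, S(N) = N - s(N - 1), can be evaluated directly. The sketch below is illustrative only and is not something Archanan describes; the function name and the 5 percent serial fraction are assumptions chosen for the example.

```python
def gustafson_speedup(n_procs: int, serial_fraction: float) -> float:
    """Scaled speedup under Gustafson's law: S(N) = N - s * (N - 1).

    n_procs is the number of processors N; serial_fraction is the share s
    of the parallel run's time spent in non-parallelizable work.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return n_procs - serial_fraction * (n_procs - 1)


if __name__ == "__main__":
    # Hypothetical inputs: 30,000 cores (the figure quoted above) and an
    # assumed 5 percent serial fraction.
    print(round(gustafson_speedup(30_000, 0.05), 2))
```

The serial fraction of a real application is typically something measured on a run at the target scale, which is the kind of testing the platform is pitched at.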

Cited benefits include faster time to results due to shortened development time, developing code at the target production scale, and the subsequent minimization of port-over failures from the development to the production environment.

The company is confident it can ensure a high degree of scalability – outside of, possibly, the top 10 or 20 leadership machines. A white paper is in the works that will document internal performance.

To onboard a new supercomputing center onto its platform, Archanan gets together with the applications specialists, the solution architects and the support team for the supercomputing center to build the model with them. It employs all the same software packages, the same compiler, the same version of Linux, etc. (encapsulated in a Singularity container) and emulates down to the component level — the processors, the accelerators, and the network elements.

This is a two-week to two-month process during which the Archanan team fine-tunes the emulator so it can accurately predict performance.

The Archanan development platform currently supports x86 processors from Intel and AMD, as well as Arm. Nvidia K80 and P100 GPUs are also supported. The company is working on support for Power 9 and Power 10, as well as NEC Vector Engines. Emulation for other architectures, including FPGAs, is on Archanan’s roadmap.

HPC sites participating in the beta program access the platform via a yearly or monthly subscription with a mechanism for overflow billing based on virtual node hours. Another use case enables OEMs or systems integrators to provide their customers with an evaluation system during the tendering and commissioning process. In that model, Archanan emulates the supercomputer for some fraction of the cost of the machine.
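The subscription-plus-overflow model can be sketched in a few lines. Archanan has not published prices or plan sizes, so every figure and name below (the base fee, the included node hours, the overflow rate, and the function itself) is a hypothetical placeholder used only to show how billing by virtual node hour might work.

```python
def monthly_charge(base_fee: float,
                   included_node_hours: float,
                   used_node_hours: float,
                   overflow_rate: float) -> float:
    """Return a subscription fee plus any overflow charge.

    Usage up to included_node_hours is covered by base_fee; each virtual
    node hour beyond that is billed at overflow_rate. All inputs are
    hypothetical and do not reflect Archanan's actual pricing.
    """
    overflow_hours = max(0.0, used_node_hours - included_node_hours)
    return base_fee + overflow_hours * overflow_rate


if __name__ == "__main__":
    # Hypothetical plan: 1,000 per month covering 500 virtual node hours,
    # with overflow billed at 2 per node hour.
    print(monthly_charge(1000.0, 500.0, 650.0, 2.0))  # 150 overflow hours
    print(monthly_charge(1000.0, 500.0, 400.0, 2.0))  # under the allowance
```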

A third, forthcoming usage model will be individual or group licenses. Archanan plans to offer monthly memberships through the GitHub Marketplace so smaller users can try the system and run tests for their own jobs even if their organizations are not customers.

Archanan says its beta roster includes supercomputing centers and research groups based in Singapore, Australia and China.

While Archanan is going after traditional and enterprise HPC as its initial target market, Nodeland foresees expanding to more general AI, machine learning and big data workloads. The company has recently increased its workforce from two to seven employees and has several open positions. It expects to expand its staff to 15 by year end.

Nodeland acknowledged there is work to be done building up their libraries. Given the heavy lift, it’s encouraging that the company has garnered the support of Gustafson as well as Wolfgang Gentzsch, co-founder and CEO of The UberCloud.

“Virtually all the software development tools in high-end, complex computing are used on desktop workstations and laptops, drastically limiting the development and debugging capability of these tools — it’s analogous to trying to recreate a masterpiece on an Etch-a-Sketch,” said Wolfgang Gentzsch, co-founder and CEO of The UberCloud. “Archanan’s cloud-based development platform extends these workstations and lets developers construct their code at scale, as if they were doing it directly on these large, complex architectures, thus creating better quality software in shorter time.”

 
