Digital biology and healthcare have been a long time coming. In fact, they’re hardly here in any complete sense. But they seem much closer and were on impressive virtual display at GTC22 in a blend of product introductions/updates and promising case histories underpinned by – no surprise – Nvidia technology and collaborator expertise. It is Nvidia’s conference, after all.
Nvidia has been toiling in the computational biology vineyard – as have others – for many years. Industry-wide, those efforts started as ad hoc assemblies of often ill-fitting technologies tackling early DNA sequencing and biochemical pathways. Success with DNA sequencing was the accelerant. That was more than 20 years ago. Today advanced computational infrastructure, robotics, and key sensing/imaging tech are tightly knitted together with AI emerging as a kind of enlightening glue.
These new systems increasingly tackle a wide variety of bio-research and healthcare applications. In a GTC briefing earlier this week, Kimberly Powell, Nvidia vice president of healthcare, walked through Nvidia’s expanding healthcare portfolio from introduction of its new medical-grade Clara Holoscan platform to the world’s largest clinical language models (GatorTron) to a new genomics sequencing pipeline (UNAP), and the expanding ranks of digital biology collaborations at Nvidia’s dedicated-to-life-sciences supercomputer Cambridge-1.
Here’s a snapshot of GTC’s life science highlights discussed by Powell:
- Clara Holoscan MGX. Clara is Nvidia’s healthcare domain-specific software suite and includes a variety of frameworks and models. Clara Holoscan is the full-stack platform (hardware and software). The just-announced MGX version follows the IEC 60601 standard for hardware and IEC 62304 for software, making Clara Holoscan MGX a medical-grade platform suitable for use by instrument makers and clinicians in regulated environments.
- SynGatorTron/GatorTron. Nvidia announced SynGatorTron, developed with the University of Florida. SynGatorTron is, in essence, a factory for generating simulated clinical data not associated with real humans. This is a key enabler for training models for use in downstream healthcare. Nvidia is also releasing GatorTron’s pre-trained models free on Nvidia NGC. GatorTron is now the largest clinical language model in the world with 5 billion parameters, says Nvidia.
- UNAP. That’s short for ‘ultra rapid nanopore analysis pipeline,’ which works in conjunction with Oxford Nanopore’s PromethION DNA sequencing platform. Powell highlighted efforts by Dr. Euan Ashley to shorten the sequencing time required for infants in critical care, as well as a collaboration to achieve the Guinness World Record for sequencing time (who knew there was such a thing).
- MegaMolBART. Nvidia announced early access to an updated MegaMolBART, a training framework and pre-trained model for drug discovery jointly developed with AstraZeneca. “It takes advantage of Nvidia’s NeMo Megatron, which is our large language model training framework and made it possible to train very large models. We’ve adapted NeMo Megatron to handle molecular data and created all of the training scripts as well as pre-trained models that can be used to generate new molecules that the models [have] never seen before,” said Powell.
- Cambridge-1 Efforts. Cambridge-1 is Nvidia’s wholly-owned supercomputer dedicated to life sciences and based in the U.K. Powell noted the “next wave” of collaborators, grouped broadly around the topic of digital biology, “who all have a common theme of training large, datacenter-scale transformer-based language models that essentially encapsulate the knowledge of DNA and protein sequences and chemistry.” Among them: ALCHEMAB, InstaDeep, Peptone, Relation Therapeutics.
Powell noted that GTC has become one of the most important healthcare events in the industry. “It’s unique in that it brings together pioneers in academia, startups and industry from so many diverse fields – radiology, pathology, microscopy, surgical robotics, protein engineering, drug discovery and genomics. There’s no other conference like it and this year in particular is a really big year for Big Pharma at GTC. We have speakers from AstraZeneca, GSK, Pfizer, Merck, Bristol Myers Squibb and Eli Lilly, and exciting new companies in digital biology and protein engineering,” she said.
Clara Holoscan MGX
Clara Holoscan is the centerpiece of Nvidia’s healthcare and life sciences portfolio. Launched last fall and now upgraded to MGX, it is positioned by Nvidia as a one-of-a-kind end-to-end platform for both AI development and the production and deployment of medical AI in medical devices.
Clara has all the software elements. One is the framework Monai, which Powell said has been dubbed the PyTorch of healthcare. Started by Nvidia, Monai is open source. Amazon has built it into its SageMaker machine learning platform and the UK’s National Health Service “is [using] Monai as their new operating system for hospitals to deploy AI application[s],” said Powell.
Another key piece is Nvidia FLARE (Federated Learning Application Runtime Environment), which is intended for privacy-preserving model training in which models can be developed and shared without sharing data. “We have over 40 pre-trained models across critical data domains including imaging, drug discovery, natural language processing, and computer vision. We offer enterprise software for genomics analysis across DNA, cancer sequencing, single cell sequencing,” said Powell.
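To make the idea of training without sharing data concrete, here is a minimal Python sketch of federated averaging, one common way of combining models trained at separate sites. It is an illustration only and does not use or represent the Nvidia FLARE API; the function name, the hospital variables, and the sample counts are all hypothetical. Each site trains locally and sends only its model weights, which a coordinator averages in proportion to how much data each site holds.

```python
from typing import Dict, List

# A model is represented here as named lists of weights (a hypothetical, simplified form).
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Combine per-site weights, weighting each site by its local sample count."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need exactly one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals share weights only; patient records never leave the site.
hospital_a = {"layer": [1.0, 2.0]}  # trained on 100 local samples (assumed)
hospital_b = {"layer": [3.0, 4.0]}  # trained on 300 local samples (assumed)
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

In this sketch the larger site pulls the shared model closer to its own weights, which is the usual motivation for weighting by sample count rather than taking a plain mean.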
The full Clara Holoscan MGX includes the necessary hardware infrastructure (see slide above). Powell said, “You’ve seen how we’ve evolved Nvidia DRIVE, the platform for autonomous vehicles. Nvidia Clara Holoscan is essentially the same thing for medical devices. It’s a platform for real-time software-defined medical devices.”
Broadly, Clara Holoscan MGX is a three-chip scalable reference design that can either be embedded directly into a new instrument or serve as what Nvidia calls a “sidecar” for at-the-edge computing.
“It’s based on Nvidia Orin to deliver up to 250 AI TOPs [Tera Operations Per Second] of inferencing and scaling to over 600 AI TOPs with an integrated RTX A6000 discrete GPU,” said Powell. “We also have Nvidia ConnectX-7 to deliver streaming IO [to] take streaming data right into an ultra-low latency processing pipeline that can utilize GPU computing. MGX also has safety, security and manageability built in with our SSM module. It has a baseboard management controller for over-the-air updates and monitoring and it has an external root-of-trust for boot security.” A full SDK is available for the new platform.
The Clara Holoscan reference design will be available through global partners – ADLINK, Advantech, Dedicated Computing, Kontron, Leadtek, MBX Systems, Onyx Healthcare, Portwell, Prodrive, RYOYO Electric, and Yuan High-Tech.
If Clara Holoscan MGX was the big news, there was plenty more of significance.
Development of SynGatorTron is important. Nvidia says it is now the world’s largest clinical language generation model with 5 billion parameters. Nvidia is also releasing two models that were developed on GatorTron – they will be free and on NGC. Comparisons shown by Nvidia suggest that data generated by SynGatorTron is just as effective as real patient data in developing accurate models. This may well turn out to be a major advance given all of the difficulties encountered (regulatory and technical) when using actual patient data to build models. (Recent paper on GatorTron.)
“These models will substantially reduce the barrier for developing clinical applications used across the entire industry. Whether you’re a hospital system, a pharmaceutical company, insurance company, a contract research organization with clinical trials, there are so many untapped opportunities to be able to use clinical language models, whether you’re doing research, clinical trial matching, creating chatbots for, you know, patient interaction, event detection,” said Powell who showed joint work with Janssen Pharmaceutical to use the tools for pharmacovigilance.
Powell also reviewed work by Janssen (part of Johnson & Johnson) using MegaMolBART and Nvidia’s NeMo Megatron for pharmacovigilance – monitoring post-care patients for adverse effects – in which Janssen was able to improve adverse event detection by 12 percent using domain-specific models.
Throughout her briefing, Powell kept returning to the idea of domain-specific platforms, languages, and AI tools, and one has the sense we will see a broader range of such domain-specific offerings from Nvidia over time.
Adoption of UNAP will bear watching, if only because there are so many kinds of DNA sequencing possible and many present different computational challenges. Also, Oxford Nanopore’s system is somewhat [...], though extremely exciting.