When Ifrah Khurram was a senior undergraduate at Fisk University in Nashville, Tenn., her calculus professor, Dr. Sanjukta Hota, encouraged her to apply for the Sustainable Research Pathways Program (SRP). SRP was created by Sustainable Horizons Institute (SHI) and Berkeley Lab to provide students from underrepresented groups, including those at smaller colleges and universities, with exposure to high-performance computing (HPC). Through SRP, Khurram met Dr. Silvia Crivelli, a staff scientist at Berkeley Lab and Executive Director of the Molecular Science and Software Engineering program at the University of California, Berkeley. This opportunity initiated a sequence of events that set Khurram, now a medical student at San Juan Bautista School of Medicine in Puerto Rico, on a path to discovering her passion for research that bridges computing and healthcare.
Khurram was intrigued by Dr. Crivelli’s interdisciplinary approach, which combines computational expertise with medical insight, focusing on Obstructive Sleep Apnea (OSA). Joining Dr. Crivelli’s team was an easy decision for Khurram due to the welcoming atmosphere and the chance to directly impact patient care. Reflecting on her choice, Khurram explained: “Choosing this project within Dr. Crivelli’s team felt like a natural fit. Understanding how patients navigate various conditions associated with OSA, combined with my interest in using advanced computational methods to improve healthcare, led me to this research.”
Khurram’s research is now featured in a paper, “Towards Maps of Disease Progression: Biomedical Large Language Model Latent Spaces For Representing Disease Phenotypes And Pseudotime,” focused on the complexities of disease phenotypes linked to OSA. Using large language models (LLMs) and HPC resources, the study refines cohort phenotype definitions and offers insights for ongoing sleep apnea research. The work demonstrates the potential of LLMs combined with HPC to effectively organize clinical data and help identify sub-phenotypes related to OSA, contributing to the development of more personalized healthcare strategies.
A Pathway to Teamwork
Khurram’s work also represents a team effort led by Dr. Crivelli, with oversight from Khurram’s advisor and from two other SRP alumni from Hood College who currently work in Dr. Crivelli’s lab. Rafael Zamora-Resendiz served as Khurram’s direct mentor and guided her throughout the project. Destinee Morrow compiled the data and worked alongside sleep physicians on materials, such as the compilation of International Classification of Diseases (ICD) codes, to ensure accurate interpretation of the data.
Building on previous work, their research focused on applications of LLMs and HPC to address issues related to OSA and its comorbidities. By analyzing a dataset of discharge reports from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, they aimed to understand disease phenotypes linked to OSA. The significance of this work is underscored by its potential impact on the aging global population, which faces a higher risk of OSA and severe comorbidities. Leveraging the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC), Khurram and the team employed data parallelism to distribute the processing workload efficiently across multiple GPUs. The study utilized advanced techniques such as K-means clustering and UMAP (Uniform Manifold Approximation and Projection) visualization to analyze the data, gaining insights into the subtle nuances of disease subtypes associated with OSA.
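For readers unfamiliar with K-means clustering, the following is a minimal sketch of the idea in Python, not the team's code. It assumes each discharge report has already been turned into a numeric vector; here two synthetic groups of 2D points stand in for those vectors, and the function name `kmeans` and all parameter values are illustrative assumptions. The LLM embedding step, the multi-GPU data parallelism, and UMAP are omitted.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples of floats."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random (an illustrative choice).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Synthetic stand-ins for report vectors: two well-separated groups.
rng = random.Random(42)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
group_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]

labels, centroids = kmeans(group_a + group_b, k=2)
```

In this toy setting the algorithm places each synthetic group in its own cluster. Real clinical embeddings have far more dimensions and much less clear-cut boundaries, which is why a projection such as UMAP is useful for inspecting the result visually.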
Research in ‘Color’
The researchers created a color mapping technique to explore how well GatorTron, a leading clinical LLM developed by the University of Florida, discriminates OSA from various comorbid phenotypes. This method provides a visual representation of how LLMs organize different patient cohorts within the electronic health records (EHR) dataset. Each color represents a specific comorbidity, allowing distinct phenotypic patterns to be identified from the clinical text more effectively than from billing codes and procedures. For instance, blue denotes patients with heart failure, red represents those with OSA, and green indicates individuals with both conditions. Such detailed categorization not only aids in pinpointing unique patterns within clinical narratives but also deepens the understanding of how different conditions may interact and influence patient outcomes.
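The color scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the researchers' implementation: the function names, the gray fallback for patients with neither condition, and the five example patients are all assumptions made for demonstration.

```python
from collections import Counter


def cohort_color(has_osa, has_heart_failure):
    """Map a patient's conditions to the colors named in the article."""
    if has_osa and has_heart_failure:
        return "green"  # both conditions
    if has_osa:
        return "red"    # OSA only
    if has_heart_failure:
        return "blue"   # heart failure only
    return "gray"       # neither condition (fallback color is an assumption)


def cluster_color_counts(cluster_ids, colors):
    """Count how many patients of each color fall in each cluster."""
    counts = {}
    for cluster_id, color in zip(cluster_ids, colors):
        counts.setdefault(cluster_id, Counter())[color] += 1
    return counts


# Hypothetical patients: (cluster id, has OSA, has heart failure).
patients = [
    (0, True, False),
    (0, True, False),
    (1, True, True),
    (1, False, True),
    (1, True, True),
]

colors = [cohort_color(osa, hf) for _, osa, hf in patients]
summary = cluster_color_counts([cid for cid, _, _ in patients], colors)
```

Tallying colors per cluster in this way gives a rough sense of whether a cluster is dominated by one cohort or mixes several, which is the kind of pattern the color maps make visible.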
As the number of clusters expanded, GatorTron demonstrated heightened efficiency in organizing clinical narratives by shared characteristics, notably in grouping OSA patients with heart failure. Through the combination of LLMs, data compression, and color maps, the research illuminates the intricate interplay between OSA and comorbidities. This work offers insights into tailoring personalized healthcare strategies and optimizing treatment approaches for OSA patients.
Identifying sub-cohorts of OSA patients at risk for severe complications, such as heart failure, is key for optimizing resource allocation, particularly for costly treatments like continuous positive airway pressure (CPAP). While acknowledging the current limitations of LLMs in clinical medicine, Khurram’s study highlights the promising performance of models trained on larger datasets in organizing clinical data based on clinically relevant measures.
Reflecting on Khurram’s journey and the challenges of initiating students into the HPC realm, Dr. Crivelli shared her admiration for the medical student’s dedication and intellect.
“She is very passionate about research and is not afraid to try new things,” Dr. Crivelli noted. “In fact, she eagerly jumped into our computational medicine project despite not being familiar with the field. Ifrah is smart, creative, motivated, and independent.”
With her blend of academic excellence and strong dedication, Khurram embodies the promise of a new generation of physicians who will be adept at applying data-driven insights and HPC to advance medical science.