A deep learning framework developed by university researchers aims to convert brain signals recorded by an implant into synthesized speech, aiding those who have lost the ability to speak due to neurological disorders.
Investigators at the University of California at San Francisco reported last month on a speech synthesis technique based on neural decoding of spoken sentences. The technique represents an advance over current methods, which let speech-impaired people spell out thoughts one letter at a time. Converting brain signals into speech via a deep learning model trained on audible input would bring the performance of such a system closer to the 150 words a minute uttered by the average speaker.
The researchers captured “high density” brain signals via intracranial implants in five patients being evaluated for epilepsy surgery. All could speak normally, but the prototype “prosthetic voice” has yet to be tested on patients, such as stroke victims, whose conditions would make brain signal decoding more difficult.
Using Nvidia Tesla GPUs, they trained recurrent neural networks to decode recorded brain activity and generate synthesized speech from it. The deep learning framework captured sentences spoken aloud along with the corresponding cortical signals. The researchers used GPUs to infer articulatory kinematics (that is, the physical mechanisms used to produce speech) from audio recordings.
The resulting algorithm correlated the patterns linking speech and brain signals with subtle movements of patients’ lips, tongue, larynx and jaw.
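The flow described above, from cortical signals to articulatory movements to speech, can be pictured as two recurrent networks chained together. What follows is only a minimal sketch under stated assumptions, not the UCSF team's model: the `SimpleRNN` class, the layer sizes, the random untrained weights and the random "neural" input are all hypothetical, chosen to show how one frame of brain activity at a time might pass through an intermediate kinematic representation before becoming acoustic features.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (untrained, illustrative)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRNN:
    """Hypothetical Elman-style recurrent layer with a linear readout."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w_in = make_matrix(n_hidden, n_in, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        self.n_hidden = n_hidden

    def run(self, sequence):
        """Map a sequence of input frames to a sequence of output frames."""
        hidden = [0.0] * self.n_hidden
        outputs = []
        for frame in sequence:
            from_input = matvec(self.w_in, frame)
            from_past = matvec(self.w_rec, hidden)
            # The hidden state carries context from earlier frames forward.
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_past)]
            outputs.append(matvec(self.w_out, hidden))
        return outputs


def decode(neural_frames, brain_to_kin, kin_to_audio):
    """Stage 1: brain activity -> articulatory kinematics. Stage 2: kinematics -> acoustics."""
    kinematics = brain_to_kin.run(neural_frames)
    acoustics = kin_to_audio.run(kinematics)
    return kinematics, acoustics


# All sizes below are assumptions made for illustration only.
N_ELECTRODES, N_KINEMATIC, N_ACOUSTIC, N_FRAMES = 16, 6, 8, 20

rng = random.Random(0)
brain_to_kin = SimpleRNN(N_ELECTRODES, 12, N_KINEMATIC, rng)
kin_to_audio = SimpleRNN(N_KINEMATIC, 12, N_ACOUSTIC, rng)

# Random numbers stand in for recorded cortical activity.
neural_frames = [[rng.gauss(0.0, 1.0) for _ in range(N_ELECTRODES)]
                 for _ in range(N_FRAMES)]

kinematics, acoustics = decode(neural_frames, brain_to_kin, kin_to_audio)
print(len(kinematics), len(kinematics[0]), len(acoustics), len(acoustics[0]))
```

In a real system both stages would be trained on paired recordings, and the acoustic features would be passed to a synthesizer to produce audio; none of that is shown here.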
“This study demonstrates that we can generate entire spoken sentences based on an individual’s brain activity,” Edward Chang, a professor of neurological surgery and member of the UCSF Weill Institute for Neuroscience, said in an Nvidia blog post. “We should be able to build a device that is clinically viable in patients with speech loss.”
The UCSF team previously reported on their recurrent neural network, which decodes “vocal tract physiological signals from direct cortical recordings” and converts them to synthesized speech. They claimed “robust decoding performance” using as little as 25 minutes of training data.
“Our goal was to demonstrate the feasibility of a neural speech prosthetic by translating brain signals into intelligible synthesized speech at the rate of a fluent speaker,” they added. “Naïve listeners were able to accurately identify these decoded sentences.”
Significantly, the researchers also tested their system on “mimed” speech, as previously reported, with promising results. That still leaves open the question of whether the system would work in the absence of any physical movement.
The speech synthesis research is part of a larger push by Nvidia into medical AI. Last month, the company announced a collaboration with medical groups to use GPU-powered AI tools for clinical research, including radiology and drug discovery.
Feature image source: UCSF Neurosurgery video