Machine Learning Model Using Speech Acoustics Identifies Neurodegenerative Diseases With High Accuracy


A recently published study found that speech markers can distinguish neurodegenerative diseases from healthy speech with high accuracy, underscoring the value of speech analysis in disease assessment.

Adam P. Vogel, PhD, professor and director of the Centre for Neuroscience of Speech at The University of Melbourne, and chief science officer at Redenlab

Newly published in IEEE Transactions on Neural Systems and Rehabilitation Engineering, a study demonstrated that speech markers can separate healthy speech from pathological speech, and recognize specific neurodegenerative diseases, with high accuracy. These findings highlight the importance of examining speech outcomes in the assessment of these diseases and suggest that large-scale initiatives are needed to extend the approach to other neurological diseases.1

Using acoustic properties of speech alone, the model classified patients with Friedreich ataxia (FA, n = 73), patients with multiple sclerosis (MS, n = 122), and healthy controls (HC, n = 229) with an overall accuracy of 82%. Classification accuracy was higher for HC than for FA (P <.001) and MS (P <.001), and higher for FA than for MS (P <.001). Notably, the results pointed to 21 acoustic features that were strong markers of neurodegenerative diseases, falling under the labels of spectral qualia, spectral power, and speech rate.
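To make the distinction between overall accuracy and per-group classification accuracy concrete, the short Python sketch below computes both from a list of true and predicted labels. The function name and the twelve toy labels are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its results.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, per-class accuracy) for paired label lists."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)  # number of recordings in each true group
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    return overall, per_class


# Hypothetical toy labels (NOT study data): four recordings per group.
y_true = ["HC"] * 4 + ["FA"] * 4 + ["MS"] * 4
y_pred = ["HC", "HC", "HC", "HC",
          "FA", "FA", "FA", "MS",
          "MS", "MS", "HC", "FA"]

overall, per_class = accuracy_report(y_true, y_pred)
print(overall, per_class)
```

Per-class accuracy here is the share of each true group that the classifier labels correctly, which is why it can differ between groups even when a single overall figure is reported.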

Clinical Takeaways

  • Speech markers, as analyzed through machine learning, potentially exhibit a high accuracy in identifying neurodegenerative diseases, suggesting promise for diagnostics.
  • The study identified 21 acoustic features, grouped under spectral qualia, spectral power, and speech rate, as robust markers of neurodegenerative diseases.
  • Machine learning and speech analysis present an opportunity for healthcare as a potential tool for initial detection, monitoring disease progression, and refining test selection for differential diagnosis.

“Digital objective measures of speech were able to separate the speech of individuals with different diseases. I personally was surprised the methods were so accurate, as speech can vary within and between people,” senior author Adam P. Vogel, PhD, professor and director of the Centre for Neuroscience of Speech at The University of Melbourne, and chief science officer at Redenlab, told NeurologyLive®. “The approach, using sophisticated signal processing and machine learning, could help triage diagnostic pathways at initial medical consults before specialist services are involved. This means different disease groups, recording in different environments (e.g. on smartphones), move beyond just speech (how we sound) to also include language, and refined AI modeling.”

To broaden the utility of speech markers, investigators examined how multiple acoustic features can distinguish neurodegenerative diseases. The authors used supervised machine learning with gradient boosting, implemented in CatBoost, to separate healthy speech in the HC group from the speech of patients with MS or FA. To generate the speech samples, participants performed a diadochokinetic task in which they repeated alternating syllables. The authors then fed 74 spectral and temporal prosodic features derived from the participants' speech recordings into the machine learning model.
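For readers unfamiliar with gradient boosting, the sketch below shows the core idea in plain Python: each round fits a small tree (here a one-split "stump") to the errors left by the previous rounds and adds a damped version of it to the running score. This is a simplified stand-in, not CatBoost and not the authors' pipeline. It handles two classes rather than three, and the two feature columns, the toy values, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_boosted(X, y, n_rounds=20, learning_rate=0.5):
    """Binary gradient boosting on log-loss with regression stumps."""
    p = sum(y) / len(y)
    if p in (0.0, 1.0):
        raise ValueError("both classes must be present")
    base = math.log(p / (1.0 - p))  # initial score: log-odds of class 1
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss: observed label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, stumps, learning_rate


def predict_proba(model, row):
    base, stumps, learning_rate = model
    score = base + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)


# Hypothetical toy data (NOT study data): two made-up features per recording,
# label 0 = control-like, label 1 = patient-like.
X = [[0.8, 1.0], [0.9, 1.2], [1.0, 1.1], [1.1, 0.9],
     [1.6, 2.0], [1.7, 2.2], [1.8, 1.9], [2.0, 2.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_boosted(X, y)
print(predict_proba(model, [0.85, 1.0]), predict_proba(model, [1.9, 2.0]))
```

Production libraries such as CatBoost add multi-class losses, deeper trees, regularization, and safeguards against overfitting that this sketch leaves out.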

All told, HCs were recognized by a less steep and less variable spectral decrease, a smaller spectral spread and range of energy produced in low frequencies, greater energy in frequency bands, and shorter utterance durations. The FA group was identified by low intensity and energy in low, high, and broadband frequency bands, a higher and more variable spectral spread, and longer utterance durations. As for the MS group, the patients were characterized by a steeper and more variable spectral decrease, as well as utterance durations and spectral spread values that fell between the control and FA groups. Additional useful acoustic features in distinguishing between the groups included metrics of speech timing, spectral features, formants 1-5, and the alpha ratio.
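Two of the descriptors named above, spectral spread and the alpha ratio, can be illustrated with a few lines of Python. The sketch assumes common textbook definitions, which may differ from the ones the authors used: spread is the power-weighted standard deviation of frequency around the spectral centroid, and the alpha ratio is the energy in a low band (assumed here to be 50 to 1000 Hz) relative to a high band (assumed 1 to 5 kHz), expressed in decibels. The four-bin spectrum is a hypothetical toy input, not speech data.

```python
import math


def spectral_centroid(freqs, power):
    """Power-weighted mean frequency of the spectrum (Hz)."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs, power)) / total


def spectral_spread(freqs, power):
    """Power-weighted standard deviation of frequency around the centroid (Hz)."""
    centroid = spectral_centroid(freqs, power)
    total = sum(power)
    variance = sum(((f - centroid) ** 2) * p for f, p in zip(freqs, power)) / total
    return math.sqrt(variance)


def alpha_ratio_db(freqs, power, low=(50.0, 1000.0), high=(1000.0, 5000.0)):
    """Low-band to high-band energy ratio in dB (band edges are assumptions)."""
    low_energy = sum(p for f, p in zip(freqs, power) if low[0] <= f < low[1])
    high_energy = sum(p for f, p in zip(freqs, power) if high[0] <= f <= high[1])
    if low_energy <= 0 or high_energy <= 0:
        raise ValueError("both bands must contain energy")
    return 10.0 * math.log10(low_energy / high_energy)


# Hypothetical toy power spectrum (NOT speech data): bin centre frequencies in Hz.
freqs = [250.0, 750.0, 1500.0, 3000.0]
power = [4.0, 4.0, 1.0, 1.0]

centroid = spectral_centroid(freqs, power)
spread = spectral_spread(freqs, power)
alpha = alpha_ratio_db(freqs, power)
print(centroid, spread, alpha)
```

Under these definitions, a larger spread means energy is distributed across a wider range of frequencies, and a larger alpha ratio means relatively more energy sits in the low band.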

“Practitioners could use this information to refine test selection for differential diagnosis. This would be particularly useful for people living in rural communities with increased travel burdens or during situations where the risk of infection is heightened (e.g., pandemics). Speech markers can be used as a remote tool to initially detect signs of neurodegenerative disease, expand our understanding of the clinical characteristics of these diseases to improve our ability to develop targeted interventions, and to monitor disease progression or treatment response,” Vogel and colleagues wrote.1

The current study did not consider other acoustic features that might increase the accuracy and sensitivity of the machine learning algorithm. The authors also noted that there is little agreement on the best way to extract burst and vowel onset times, or on which acoustic features should be considered. Similarly, to constrain the number of variables and tasks and avoid overfitting, they did not include other voice assessment tasks that can measure certain features more reliably. Including nonspeech performance measurements could also increase discrimination accuracy.2,3 Because the approach did not aim to determine the severity or stage of the disease, the investigators concluded that future studies could predict or estimate severity once a disease has been identified.

“This novel application of machine learning and acoustic analysis paves the way for new pre-diagnostic methods that could leverage big data to discriminate between a range of neurodegenerative diseases and/or other conditions. Through initiatives that obtain and share speech data from various clinical populations, our innovative approach could be applied to any population that is able to produce speech,” Vogel et al noted.1 “The implications of this approach are substantial and provide new opportunities for healthcare, particularly for remote and rural areas where access to health providers might be limited.”

1. Schultz BG, Joukhadar Z, Nattala U, et al. Disease Delineation for Multiple Sclerosis, Friedreich Ataxia, and Healthy Controls Using Supervised Machine Learning on Speech Acoustics. IEEE Trans Neural Syst Rehabil Eng. 2023;31:4278-4285. doi:10.1109/TNSRE.2023.3321874
2. Lee L, Stemple JC, Glaze L, Kelchner LN. Quick screen for voice and supplementary documents for identifying pediatric voice disorders. Lang Speech Hear Serv Sch. 2004;35(4):308-319.
3. Zhang Z. Variable selection with stepwise and best subset approaches. Ann Transl Med. 2016;4(7):136. doi:10.21037/atm.2016.03.35