Jude Savarraj, PhD, discusses the accurate performance of his team’s machine learning models in predicting subarachnoid hemorrhage outcomes.
Recent study data has suggested that machine learning (ML) models used to predict delayed cerebral ischemia (DCI) and functional outcomes significantly outperformed standard models (SMs) when used in subarachnoid hemorrhage (SAH) care.
The ML models accurately predicted DCI outcome (area under the receiver operating characteristic curve [AUC], 0.75 [standard deviation (SD), 0.07]; 95% CI, 0.64–0.84), discharge outcome (AUC, 0.85 [SD, 0.05]; 95% CI, 0.75–0.92), and 3-month outcome (AUC, 0.89 [SD, 0.03]; 95% CI, 0.81–0.94). The ML models outperformed the SMs in AUC by 0.20 (95% CI, –0.02 to 0.4) for DCI, 0.07 (95% CI, 0.0018–0.14) for discharge outcomes, and 0.14 (95% CI, 0.03–0.24) for 3-month outcomes. Additionally, ML models matched physician performance in predicting 3-month outcomes.
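For readers less familiar with the metric, AUC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who had the outcome than to a randomly chosen patient who did not. The short Python sketch below illustrates that definition and the idea of an AUC difference between two models. It is not the study's code: the function name `auc`, the labels, and both sets of scores are invented purely for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT from the study): 1 = outcome occurred, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical scores, model A
model_b = [0.8, 0.3, 0.2, 0.6, 0.5, 0.4, 0.1]  # hypothetical scores, model B

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
print(f"AUC A = {auc_a:.2f}, AUC B = {auc_b:.2f}, difference = {auc_a - auc_b:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. A confidence interval for an AUC difference is typically estimated by resampling patients, which this sketch leaves out.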
Interestingly, these models performed best when they combined the clinician-determined Hunt-Hess (HH) score with standard electronic medical record (EMR) variables (AUC, 0.85 [SD, 0.05]; 95% CI, 0.75–0.92) rather than using EMR data alone (AUC, 0.81 [SD, 0.05]; 95% CI, 0.71–0.89; P < .05).
NeurologyLive reached out to Jude Savarraj, PhD, an author of the study and a research scientist at the University of Texas Medical School, to learn more about ML and the synergistic opportunities of combining human and computer skills.
Jude Savarraj, PhD: In SAH, one of the major clinical issues is DCI, a complication that occurs in about 30% of SAH patients, usually about 7 days after the hemorrhage. Predicting DCI has been a central part of SAH research for decades. Currently, there's one treatment to address DCI, but there have been efforts to predict DCI sooner, so that there could be prophylactic intervention. And ML is really good at predicting stuff before it happens, so that's when the idea struck me: maybe we should be trying to use ML algorithms to see if we can predict this complication before symptoms arise.
I think the initial step is to demonstrate whether ML algorithms can, in fact, be useful clinically. The end goal is to have this algorithm programmed into the back end of computers in the hospital, which would allow real-time assessment of the patient’s status. That would be the ideal realization of this project, that you could go to a hospital EMR system and some sort of a bedside monitor would report that, “This patient needs more attention because he's going to develop this complication in the next day or so.” Now, we are still at the early stages of this project. So far, we've demonstrated that ML algorithms can indeed be useful to predict this complication with reasonable accuracy. I think there are a few more steps that we have to take. First, we have to show that our model works on data sets from other institutions, and perhaps even make this model better by including a more diverse set of patients, so that the model can learn from multiple patient instances from diverse institutions, as opposed to just data from our institution at this point.
The traditional way of predicting outcomes in SAH is the HH score assigned at admission, a 0–5 score. The physician usually administers a number of neurological tests to the patient to see how alert and how responsive they are. The HH score pretty much determines how the patient is going to be doing at 3 months, or sometimes even 6 months after the stroke. The model that we developed does not use that score. It only uses data from the EMR to give a score. We found that model to be pretty effective. The next question was, how could we make a hybrid model using ML and parameters derived by human intuition? We found that this hybrid model performed the best. So, I think what this reflects is that humans make assessments quite differently from machines. Human assessment is very subjective. It comes from years of experience, and decisions are made on a number of factors, not all of which are quantifiable and some of which we can't even explain. People call it intuition. But machines have this ability to look at large amounts of data and take into account hundreds of variables very objectively; it's a completely different approach. But when we combined these 2 independent and diverse approaches, we were able to get better scores. I think that suggests that humans and computer models have to work together, instead of trying to replace one with another.
Transcript edited for clarity.