The associate professor in the Department of Rehabilitation Medicine at NYU Langone discussed a recent study on a new tool aimed at quantifying movements during stroke rehabilitation.
For years, the typical approach to quantifying or dosing rehabilitation interventions has been manual counting and reliance on recorded time stamps. The amount of exercise patients, more specifically those with stroke, need remains a mystery. However, a new sensor-equipped program may alleviate some of those issues. In a recent study published in PLOS Digital Health, the PrimSeq digital tool was shown to effectively aid patients’ recovery from stroke by accurately tracking their movement intensity during rehabilitation therapy.
In the study, led by Heidi Schambra, MD, PrimSeq’s 9 sensors recorded more than 51,000 upper body movements among 41 poststroke adults who were prescribed rehabilitation exercises. The tool was 77% effective in identifying and counting the arm motions prescribed to patients as part of these exercises. Although imperfect, the tool remains above current standards, according to Schambra.1,2
Schambra, an associate professor in the Department of Neurology and Department of Rehabilitation Medicine at NYU Langone, sat down with NeurologyLive® as part of a new iteration of NeuroVoices to discuss the tool in detail, along with the findings of the recent analysis. She provided insight on advantages it might bring to the stroke rehabilitation field, the overall quantitative approach, and the main take-home points from the study.
Heidi Schambra, MD: We call it PrimSeq (pronounced PrimSeek), which is a play on words based on sequence-to-sequence deep learning algorithms. It seeks out what we call primitive motions, the building-block motions that help us count how much people are training their arms in rehabilitation. This tool is a combination of approaches. One, it’s a detailed motion capture of upper extremity or arm motions, rotations, and linear movements, as well as some joint angles. We capture this information with wearable sensors called inertial measurement units. These data are transmitted at a very high rate; they detail what the whole upper body is doing as patients are undergoing rehab.
We’ve paired that with a deep learning algorithm that can hunt through all of the data coming from these patients, and it has been taught, or trained, to recognize particular patterns of motion that correspond to these primitive motions. These primitive motions are reach, transport, reposition, stabilization, and idle. We had previously found that almost all the rehab activities we observed in our patients could be broken down into these five simple types of fundamental motions. The tool has been built or trained so that it can sniff through all the data, see patterns of, let’s say, reach, recognize those patterns, and then count them up. This gives us a nice readout of how much people are moving, and we get a quantitative measure of their training amount.
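The counting step Schambra describes can be illustrated with a minimal sketch: given a sequence model's frame-by-frame predictions over the sensor stream, consecutive runs of the same primitive label are collapsed into individual motion instances and tallied. The class names and the run-collapsing rule are assumptions for illustration, not PrimSeq's actual implementation.

```python
from itertools import groupby

# Illustrative primitive classes based on the interview; not PrimSeq's code.
PRIMITIVES = {"reach", "transport", "reposition", "stabilize", "idle"}

def count_primitives(frame_labels):
    """Collapse a per-frame label sequence into counts per primitive.

    Consecutive identical labels are treated as one motion instance,
    a simple way to turn frame-wise model output into movement counts.
    """
    counts = {p: 0 for p in PRIMITIVES}
    for label, _run in groupby(frame_labels):
        counts[label] += 1
    return counts

# A toy predicted sequence over sensor frames.
seq = ["idle", "idle", "reach", "reach", "transport", "idle", "reach"]
counts = count_primitives(seq)
```

Here the two adjacent "reach" frames count as a single reach, so the toy sequence yields two reaches, one transport, and two idle periods.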
What we did to build this tool was take recordings from a group of 41 stroke survivors with mild to moderate upper extremity impairment and have them perform several rehab activities. We’re training this tool to perform in a rehab environment, so we don’t want our patients to be doing ballet practice, or piano practice, or baseball practice. We want to make sure this tool can perform well in a rehab setting. We had them come to the gym and do a bunch of activities while we recorded them with those wearable sensors. Synchronously, we also recorded them with video cameras. To train these algorithms to detect patterns and motions, we had to present them with lots and lots of examples of reaches, transports, and so forth. To provide those examples, we had a whole host of fantastic, trained coders or labelers who looked through all those video data.
What they would do is mark on the video the beginning of when a reach occurred and the end of when the reach occurred, and that would not only segment the video, but also segment the associated sensor data. We would then pass off that packet of sensor data to the algorithm. Working through the five different classes, it started to recognize those patterns. That’s the training portion of supervised machine learning. Once the algorithm had been trained, we wanted to see how it would do in a real-world type of situation where it had never seen a particular patient before. We took a subset of our stroke patients whose data the algorithm had never learned from, applied it to their motion data from their rehab activities, and saw how well it predicted those motions.
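The labeling workflow described above, where a coder's marks on the synchronized video segment the sensor stream, can be sketched as follows. The frame-based annotation format and the "idle" default are assumptions for illustration only.

```python
def label_sensor_frames(n_frames, annotations, default="idle"):
    """Convert coder annotations into a per-frame label sequence.

    annotations: list of (start_frame, end_frame, label) tuples marked
    on the video; because video and sensors are synchronized, the same
    indices segment the sensor stream. Unmarked frames default to
    'idle'. This format is hypothetical, for illustration.
    """
    labels = [default] * n_frames
    for start, end, label in annotations:
        for i in range(start, end):  # end-exclusive window
            labels[i] = label
    return labels

# A coder marked a reach over frames 2-4 and a transport over frames 5-7.
labels = label_sensor_frames(10, [(2, 5, "reach"), (5, 8, "transport")])
```

Each labeled window, paired with its slice of sensor data, becomes one training example for the supervised model.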
We had our gold standard from human labelers. We could ask, "The algorithm did this, but how well did it do against our gold standard humans?" We can look at the sensitivity and false discovery rate with that. It’s also important how well it counted. It turns out it did really well with counting, and number-wise, it was spitting out numbers that were pretty close to what the patients were actually doing. We took it a step further and wanted to make sure what it was seeing was being appropriately counted. Sometimes algorithms can insert something that’s not there or may delete something that was there. It may predict something that was or wasn’t there and, on average, still be net even.
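The two metrics Schambra names have standard definitions that are easy to state in code: sensitivity is the share of real motions the model detected, and the false discovery rate is the share of predicted motions that were not actually there (the "insertions" she mentions). The sketch below uses made-up confusion counts, not the study's actual numbers.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of real motions the model detected."""
    return tp / (tp + fn)

def false_discovery_rate(tp, fp):
    """Fraction of predicted motions that were insertions (false alarms)."""
    return fp / (tp + fp)

# Illustrative counts only; tp = true positives, fp = false positives,
# fn = missed (deleted) motions.
tp, fp, fn = 77, 10, 23
sens = sensitivity(tp, fn)
fdr = false_discovery_rate(tp, fp)
```

With these toy counts, the model detects 77% of true motions while about 11% of its predictions are spurious, which is the kind of trade-off the team checked against the human gold standard.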
We wanted to make sure that the counts reflected high performance, and it turns out it does pretty well. It’s not perfect. It’s in the high 70s (77%) for sensitivity, meaning it’s mostly seeing things that are there, mostly calling things correctly, and not having high levels of insertion. It’s not misidentifying things or inserting things that aren’t there in terms of counts. This trained model outperforms the best-in-class models that we compared it to. The neat thing is that this model is based on, or borrowed from, the word detection or word recognition programs you have on your phone. If you use voice activation, those algorithms have been trained to detect certain words; in the same way, ours is trained to detect movements based on patterns. Instead of the spatiotemporal frequencies of a word, we’re doing it with motions, joint angles, and arm movements.