Consumer Sleep Tracking Devices Perform Equally or Better Than Actigraphy


No devices performed better than polysomnography, but the researchers believe they warrant further testing.

Rachel R. Markwald, PhD, head, sleep, tactical efficiency and endurance lab, Naval Health Research Center

A recent study evaluated consumer sleep-tracking devices and found that most devices performed well in detecting sleep and performed as well as or better than actigraphy in detecting wake.

The Fatigue Science Readiband, Fitbit Alta HR, EarlySense Live, ResMed S+, and SleepScore Max devices all performed as well as or better than actigraphy on sleep and wake measures, whereas the Garmin devices performed worse than actigraphy. All devices had high epoch-by-epoch (EBE) sensitivity of at least 0.93, with low-to-medium specificity (0.18–0.54). The devices performed worse on nights with poor or disturbed sleep, and accuracy in determining sleep stages was mixed.

Senior author Rachel R. Markwald, PhD, head, sleep, tactical efficiency and endurance lab, Naval Health Research Center, and colleagues wrote that “polysomnography (PSG) provides the most direct assessment and thus has remained the gold-standard technique in research laboratories and sleep medicine clinics for over half a century... However, PSG is not practical outside the laboratory or clinic due to a number of factors... Actigraphy, on the other hand, overcomes many of these barriers. Actigraphy utilizes a research-grade wrist-worn device to collect physical activity data that are later processed with algorithms to estimate sleep and wake.”

Markwald and colleagues investigated 7 devices in total, 4 of which are worn on the wrist: the Fatigue Science Readiband, the Fitbit Alta HR, the Garmin Fenix 5S, and the Garmin Vivosmart 3. The other 3 devices are kept near the bed and plugged into the wall for power: the EarlySense Live, the ResMed S+, and the SleepScore Max. The investigators tested these devices on 34 healthy adults with an average age of 28.1 years (standard deviation [SD], 3.9), 22 (64.7%) of whom were women. The device readings were compared with PSG conducted in the lab.

The Fatigue Science Readiband was tested in 15 participants, the Fitbit Alta HR in 20, the Garmin Fenix 5S in 11, the Garmin Vivosmart 3 in 15, the EarlySense Live in 19, the ResMed S+ in 19, and the SleepScore Max in 15. The research-grade Actiwatch 2 was also used by all participants during the pre-study period.

Markwald and colleagues found that EBE sensitivity for sleep versus wake, measured against PSG, was very high for all devices (all ≥0.93), but specificity ranged from 0.18 to 0.54, with both Garmin models at the low end. Positive predictive values (PPV) for EBE agreement with PSG ranged from 0.88 to 0.93, whereas negative predictive values (NPV) ranged from 0.55 to 0.74. For sleep-stage EBE agreement, the Fitbit Alta HR had the highest sensitivity for light sleep (0.76) and REM sleep (0.69).

“Overall, the consumer sleep-tracking devices we tested had high sensitivity but relatively lower specificity, indicating a tendency for the devices to accurately detect sleep but to less accurately detect wake compared with the gold-standard sleep measurement technique PSG,” the authors wrote.
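For readers who want to see how these epoch-by-epoch figures are derived, the following is a minimal Python sketch, not the study's own analysis code. It assumes each night is already scored as a list of epochs where 1 means sleep and 0 means wake, that PSG is the reference, and that "positive" means sleep. The function name `ebe_agreement` and the sample data are hypothetical.

```python
from typing import Dict, Sequence


def ebe_agreement(psg: Sequence[int], device: Sequence[int]) -> Dict[str, float]:
    """Epoch-by-epoch agreement of a device against PSG (1 = sleep, 0 = wake)."""
    if len(psg) != len(device):
        raise ValueError("PSG and device series must cover the same epochs")

    pairs = list(zip(psg, device))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)  # both say sleep
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)  # both say wake
    fp = sum(1 for p, d in pairs if p == 0 and d == 1)  # device says sleep, PSG says wake
    fn = sum(1 for p, d in pairs if p == 1 and d == 0)  # device says wake, PSG says sleep

    def ratio(num: int, den: int) -> float:
        # Undefined when the denominator is empty (e.g. a night with no PSG wake)
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of PSG sleep epochs scored as sleep
        "specificity": ratio(tn, tn + fp),  # share of PSG wake epochs scored as wake
        "ppv": ratio(tp, tp + fp),          # share of device sleep epochs that are PSG sleep
        "npv": ratio(tn, tn + fn),          # share of device wake epochs that are PSG wake
    }


# Made-up example: the device catches every sleep epoch but misses 2 of 3 wake epochs
psg_epochs = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
device_epochs = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
print(ebe_agreement(psg_epochs, device_epochs))
```

In this made-up night the device shows the pattern the authors describe: every sleep epoch is detected, so sensitivity is high, but most wake epochs are scored as sleep, so specificity is low.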

The Actiwatch significantly differed from PSG on all sleep/wake measures. It overestimated total sleep time (TST) and sleep efficiency (SE) and underestimated sleep onset latency (SOL), latency to persistent sleep (LPS), and wake after sleep onset (WASO; all P <.001). The Garmin Fenix 5S, Garmin Vivosmart 3, and EarlySense Live also significantly overestimated TST and SE (all P ≤.02).

The Fatigue Science Readiband, Fitbit Alta HR, Garmin Fenix 5S, EarlySense Live, ResMed S+, and SleepScore Max each differed significantly from PSG on one of the sleep latency measures, SOL or LPS (all P <.01), although the differences were smaller than with the Actiwatch, and no device differed significantly from PSG on both measures. WASO was significantly underestimated by the Garmin Fenix 5S, Garmin Vivosmart 3, EarlySense Live, and SleepScore Max (all P ≤.025).
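The link between missed wake epochs and these summary measures can be illustrated with a short Python sketch. It uses common textbook definitions rather than the study's exact scoring rules, and it rests on several assumptions: epochs are 30 seconds long, the series runs from lights-off to lights-on, 1 means sleep and 0 means wake, and WASO counts all wake after the first sleep epoch. LPS is omitted because its persistence criterion is not given in this article. The function name `sleep_summary` and the sample nights are hypothetical.

```python
from typing import Dict, Sequence


def sleep_summary(epochs: Sequence[int], epoch_sec: int = 30) -> Dict[str, float]:
    """Summarize one night of sleep/wake epochs (1 = sleep, 0 = wake)."""
    epochs = list(epochs)
    if not epochs:
        raise ValueError("need at least one epoch")

    minutes_per_epoch = epoch_sec / 60
    time_in_bed = len(epochs) * minutes_per_epoch

    if 1 not in epochs:
        # No sleep at all: latency spans the whole recording (assumed convention)
        return {"TST": 0.0, "SE": 0.0, "SOL": time_in_bed, "WASO": 0.0}

    onset = epochs.index(1)                    # first epoch scored as sleep
    tst = sum(epochs) * minutes_per_epoch      # total sleep time, minutes
    sol = onset * minutes_per_epoch            # sleep onset latency, minutes
    waso = epochs[onset:].count(0) * minutes_per_epoch  # wake after sleep onset
    se = 100 * tst / time_in_bed               # sleep efficiency, percent

    return {"TST": tst, "SE": se, "SOL": sol, "WASO": waso}


# Made-up night: the device scores several PSG wake epochs as sleep
psg_epochs = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
device_epochs = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print("PSG:   ", sleep_summary(psg_epochs))
print("Device:", sleep_summary(device_epochs))
```

Because the hypothetical device scores wake epochs as sleep, its TST and SE come out higher than the PSG values and its SOL and WASO come out lower, which is the same direction of bias reported for the Actiwatch and several consumer devices.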

All 6 consumer devices significantly overestimated light sleep as compared to PSG (all P ≤.006). The EarlySense Live, ResMed S+, and SleepScore Max also significantly overestimated deep sleep (all P ≤.017). The Fitbit Alta HR, ResMed S+, and SleepScore Max significantly underestimated rapid eye movement (REM) sleep (all P ≤.018). 

“The wide use, rapid technological advancement, and promising initial research findings demonstrating the improved sleep-tracking performance of many recent-generation consumer devices warrant further testing versus gold-standards in different conditions, populations, and settings in order to evaluate their wider validity and utility—towards their consideration as possible valid alternatives to actigraphy,” Markwald and colleagues concluded.

Chinoy ED, Cuellar JA, Huwa KE, et al. Performance of seven consumer sleep-tracking devices compared with polysomnography. Sleep. December 2020; 291.
© 2024 MJH Life Sciences

All rights reserved.