Prediction of the efficacy of group cognitive behavioral therapy using heart rate variability based smart wearable devices: a randomized controlled study.

Authors: Lin Z, Zheng J, Wang Y, Su Z, Zhu R, Liu R, Wei Y, Zhang X, Wang F
Journal: BMC Psychiatry
Year: 2024
DOI: 10.1186/s12888-024-05638-x
Citations: 10

TL;DR

Heart rate variability (HRV) measured by a consumer smartwatch before starting group cognitive behavioral therapy (CBT) was associated with which patients went on to respond, and HRV changes during therapy tracked clinical improvement. This tentatively suggests that a wearable could help indicate, within the first few weeks, whether a given therapy is working for you.

What they tested

The study tested whether resting heart rate variability (HRV), measured by a Samsung Gear Fit2 Pro smartwatch, could predict who would respond to group cognitive behavioral therapy (CBT) for anxiety and depression, and whether changes in HRV over the course of therapy tracked symptom improvement.

**Intervention:** 8 weeks of group cognitive behavioral therapy (CBT), delivered in weekly 2-hour sessions by licensed therapists. CBT is a structured, time-limited psychotherapy that focuses on identifying and changing unhelpful thought patterns and behaviors.

**Comparator:** No control group receiving a different treatment or placebo. All participants received group CBT. The comparison was between "responders" and "non-responders" to therapy, defined by a ≥50% reduction in Hamilton Anxiety Rating Scale (HAM-A) scores.

**Primary outcome:** Change in anxiety symptoms measured by the Hamilton Anxiety Rating Scale (HAM-A) from baseline to week 8.

**Secondary outcomes:** Change in depressive symptoms (Hamilton Depression Rating Scale, HAM-D), and change in HRV metrics (SDNN, RMSSD, HF power) measured during a 5-minute resting period each week before the therapy session.

Who was studied

**Sample size:** 64 participants enrolled; 56 completed the full 8-week protocol (28 responders, 28 non-responders).

**Population:** Adults aged 18–60 years (mean age 34.2 years, SD 9.8) diagnosed with generalized anxiety disorder (GAD) according to DSM-5 criteria. All were outpatients at a psychiatric hospital in China.

**Inclusion criteria:** HAM-A score ≥14 at baseline (moderate to severe anxiety), no comorbid major depressive disorder as primary diagnosis (though depressive symptoms were allowed), no current substance abuse, no psychotic disorders, no suicidal ideation requiring immediate intervention.

**Exclusion criteria:** Cardiovascular disease (arrhythmias, heart failure, pacemaker), diabetes, thyroid disorders, use of beta-blockers or other medications affecting heart rate, pregnancy, shift work, or regular vigorous exercise (>3 sessions/week of >30 minutes).

**Setting:** Single psychiatric hospital in China. All participants were Han Chinese.

How they measured it

**Anxiety:** Hamilton Anxiety Rating Scale (HAM-A) — 14 items, each scored 0–4, total range 0–56. Higher scores = worse anxiety. Clinician-administered interview, not self-report.

**Depression:** Hamilton Depression Rating Scale (HAM-D) — 17 items, total range 0–52. Higher scores = worse depression. Also clinician-administered.

**Heart rate variability (HRV):** Measured using a Samsung Gear Fit2 Pro smartwatch worn on the non-dominant wrist. Participants sat quietly for 5 minutes before each weekly therapy session. The watch recorded RR intervals (time between heartbeats) at a 100 Hz sampling rate. Three HRV metrics were extracted:

**SDNN (ms):** Standard deviation of normal-to-normal RR intervals — reflects overall HRV.

**RMSSD (ms):** Root mean square of successive differences between RR intervals — reflects parasympathetic (vagal) activity.

**HF power (ms²):** High-frequency power (0.15–0.4 Hz) — also reflects parasympathetic activity.
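As a minimal sketch of how the two time-domain metrics follow from their definitions, assuming you already have a clean list of RR intervals in milliseconds: the function names and the sample values are hypothetical, and this is not the study's processing pipeline. HF power needs spectral analysis and is left out here.

```python
import math
import statistics

def sdnn(rr_ms):
    """Standard deviation of the RR intervals in ms (sample SD, n - 1)."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences in ms."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals (ms), for illustration only
example_rr = [800, 810, 790, 820, 805, 795]
print(f"SDNN = {sdnn(example_rr):.1f} ms, RMSSD = {rmssd(example_rr):.1f} ms")
```

A real 5-minute recording contains a few hundred intervals and usually needs artifact removal (missed or extra beats) before these formulas give meaningful values.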

**Response definition:** ≥50% reduction in HAM-A score from baseline to week 8. This is a standard definition in anxiety treatment trials.

Methodology

**Study design:** Randomized controlled trial (RCT) — but with an unusual structure. Participants were randomized 1:1 to either "early prediction" or "late prediction" groups. However, both groups received identical group CBT. The randomization was about when the HRV prediction algorithm was applied, not about receiving different treatments. This is essentially a single-group trial with all participants receiving CBT, and the "randomization" was for a secondary analysis purpose.

**Randomization:** Computer-generated random numbers, allocation concealed in sealed opaque envelopes. The randomization was performed by a researcher not involved in outcome assessment.

**Blinding:** The clinicians who administered the HAM-A and HAM-D interviews were blinded to participants' HRV data. However, participants and therapists were not blinded to the fact that HRV was being measured (they wore the watch visibly). There was no sham or placebo control.

**Duration:** 8 weeks of weekly group CBT sessions (2 hours each). HRV was measured at baseline (week 0) and before each weekly session (weeks 1–8). Clinical outcomes were measured at baseline and week 8.

**Statistical approach:**

Participants were divided post-hoc into responders (≥50% HAM-A reduction) and non-responders.

Baseline HRV differences between groups were compared using independent t-tests or Mann-Whitney U tests.

Logistic regression was used to test whether baseline HRV predicted response status.

Repeated measures ANOVA tested whether HRV changed differently over time between responders and non-responders.

Receiver operating characteristic (ROC) analysis determined optimal HRV cutoffs for predicting response.
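To make the ROC step concrete, here is a small sketch under stated assumptions: the AUC is computed as the probability that a randomly chosen responder has a higher value than a randomly chosen non-responder, and the "optimal" cutoff is taken to be the one that maximizes Youden's J. The summary does not say which criterion the authors used, and the data below are hypothetical, not the study's.

```python
def auc(pos, neg):
    """Share of (responder, non-responder) pairs where the responder is higher; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_cutoff(pos, neg):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    A value >= cutoff is treated as a predicted responder.
    Returns (cutoff, J, sensitivity, specificity).
    """
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best

# Hypothetical baseline SDNN values (ms), not the study's data
responders = [45.0, 38.2, 50.1, 33.0, 41.7]
non_responders = [30.5, 36.0, 28.4, 34.0, 25.9]

print("AUC:", auc(responders, non_responders))
print("cutoff, J, sensitivity, specificity:", best_cutoff(responders, non_responders))
```

Because the cutoff is chosen on the same values it is scored on, the sensitivity and specificity it reports are optimistic; an independent sample is needed to check them.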

**What this design can and cannot show:**

**Can show:** That baseline HRV is associated with subsequent response to group CBT, and that HRV changes over time differ between people who improve and those who don't. This is correlational evidence that HRV may be a useful biomarker.

**Cannot show:** That HRV causes better therapy outcomes (confounding is possible: people who would improve anyway might also have higher HRV). That HRV-guided treatment decisions improve outcomes (no group received treatment chosen on the basis of HRV). That the findings generalize beyond Han Chinese adults with GAD in a single hospital. That the smartwatch HRV measurements are equivalent to clinical-grade ECG (the study did not validate the watch against a gold standard).

**Major methodological weaknesses:**

1. No control group receiving no treatment, placebo, or alternative therapy — so we cannot separate CBT-specific effects from natural recovery or placebo effects.

2. No blinding of participants or therapists: awareness of the HRV monitoring could have influenced behavior or expectations.

3. Post-hoc responder analysis (not pre-registered as primary analysis) — increases risk of false positive findings.

4. Single-center, single-ethnicity sample limits generalizability.

5. Smartwatch HRV accuracy at rest is generally reasonable, but the watch was not validated against ECG in this study.

Key findings

**Primary outcome — anxiety reduction:**

28 of 56 completers (50%) were classified as responders (≥50% reduction in HAM-A).

Mean HAM-A score decreased from 24.6 (SD 5.1) at baseline to 11.2 (SD 6.3) at week 8 in the whole sample — a 54.5% reduction.

Responders: HAM-A dropped from 25.1 (SD 4.8) to 7.3 (SD 3.2) — a 70.9% reduction.

Non-responders: HAM-A dropped from 24.1 (SD 5.4) to 15.1 (SD 5.8) — a 37.3% reduction.
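The percentages above can be rechecked from the reported group means. This is a small arithmetic sketch; the function name is hypothetical, and the inputs are the mean HAM-A scores quoted in this summary.

```python
def percent_reduction(baseline, followup):
    """Percentage drop from a baseline score to a follow-up score."""
    return 100 * (baseline - followup) / baseline

# Mean HAM-A scores from the summary above: (baseline, week 8)
reported = {
    "whole sample": (24.6, 11.2),
    "responders": (25.1, 7.3),
    "non-responders": (24.1, 15.1),
}
for group, (base, week8) in reported.items():
    print(f"{group}: {percent_reduction(base, week8):.1f}% reduction")
```

Note that these are reductions in group means. The responder definition (≥50% reduction) is applied to each participant's own scores, so the two are related but not identical.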

**Baseline HRV predicted response:**

Baseline SDNN was significantly higher in responders (mean 42.3 ms, SD 12.1) vs non-responders (mean 31.8 ms, SD 9.4), p < 0.001.

Baseline RMSSD was higher in responders (mean 36.7 ms, SD 11.5) vs non-responders (mean 26.4 ms, SD 8.7), p < 0.001.

Baseline HF power was higher in responders (mean 456 ms², SD 198) vs non-responders (mean 312 ms², SD 145), p = 0.003.
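To put these group differences on a common scale, one option is a standardized effect size. The sketch below computes Cohen's d from the means and SDs quoted above, assuming equal group sizes (28 and 28). These d values are derived here for illustration and are not figures reported in the summary.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD; valid as written only for equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# (responder mean, responder SD, non-responder mean, non-responder SD)
baseline_hrv = {
    "SDNN (ms)": (42.3, 12.1, 31.8, 9.4),
    "RMSSD (ms)": (36.7, 11.5, 26.4, 8.7),
    "HF power (ms^2)": (456, 198, 312, 145),
}
for metric, stats in baseline_hrv.items():
    print(f"{metric}: d = {cohens_d(*stats):.2f}")
```

All three values come out between roughly 0.8 and 1.0, which is conventionally described as a large effect, though with 28 participants per group the estimates are imprecise.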

**ROC analysis for prediction:**

SDNN cutoff of 36.5 ms predicted response with 78.6% sensitivity and 71.4% specificity (AUC = 0.78, 95% CI 0.66–0.90).

RMSSD cutoff of 31.2 ms predicted response with 75.0% sensitivity and 67.9% specificity (AUC = 0.74, 95% CI 0.61–0.87).

**HRV changes over time:**

Responders showed a progressive increase in SDNN and RMSSD over the 8 weeks (p for time × group interaction = 0.008 for SDNN, 0.012 for RMSSD).

Non-responders showed no significant change in HRV over time.

By week 4, responders' SDNN had increased by an average of 8.7 ms (20.6% increase from baseline), while non-responders showed a 1.2 ms decrease.

**Secondary outcome — depression:**

HAM-D scores also decreased more in responders (from 18.3 to 8.9) than non-responders (from 17.9 to 13.4), p < 0.001.

Baseline HRV also predicted depression response, but the association was weaker than for anxiety.

Effect magnitude

**Translating the HRV prediction into plain English:**

If you have a resting SDNN (a measure of heart rate variability) above 36.5 ms measured by a smartwatch during 5 minutes of quiet sitting, you are roughly 3 times more likely to respond well to group CBT for anxiety than someone with SDNN below that threshold. The study's AUC of 0.78 means the test discriminates between future responders and non-responders about as well as a moderately accurate medical test (for comparison, mammography for breast cancer has AUC around 0.85).
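The "roughly 3 times" figure can be reconstructed from the reported sensitivity and specificity, assuming they apply to the 28 responders and 28 non-responders among completers. The sketch below rebuilds the 2×2 table under that assumption; the function name is hypothetical and the counts are inferred, not taken from the paper.

```python
def response_rates(sensitivity, specificity, n_responders, n_non_responders):
    """Rebuild the 2x2 table and return response rates above and below the cutoff."""
    tp = round(sensitivity * n_responders)       # responders above the cutoff
    tn = round(specificity * n_non_responders)   # non-responders below the cutoff
    fn = n_responders - tp                       # responders below the cutoff
    fp = n_non_responders - tn                   # non-responders above the cutoff
    rate_above = tp / (tp + fp)
    rate_below = fn / (fn + tn)
    return tp, fp, fn, tn, rate_above, rate_below

tp, fp, fn, tn, above, below = response_rates(0.786, 0.714, 28, 28)
print(f"above cutoff: {tp} responders, {fp} non-responders")
print(f"below cutoff: {fn} responders, {tn} non-responders")
print(f"response rate above {above:.1%}, below {below:.1%}, ratio {above / below:.1f}")
```

The ratio depends on the 50% response rate in this sample; in a population where fewer or more people respond to CBT, the same cutoff would give a different ratio.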

**The HRV change during therapy is more striking:** Among people who ultimately responded to CBT, SDNN increased by about 21% over the first 4 weeks, roughly comparable to the difference between a sedentary person and someone who does moderate aerobic exercise 3 times per week. Non-responders showed no significant change. A tentative reading: if your HRV hasn't started rising by week 4 of therapy, that may be an early sign the therapy is not working for you, although this study did not test acting on that signal.

**Clinical significance:** A 50% reduction in HAM-A is considered a clinically meaningful response. The difference between responders (71% reduction) and non-responders (37% reduction) is large — roughly the difference between "much improved" and "minimally improved" on standard clinical global impression scales.

Limitations

**Acknowledged by authors:**

Small sample size (56 completers) — limits statistical power and precision of estimates.

Single-center study in China — may not generalize to other populations or healthcare settings.

No validation of smartwatch HRV against clinical ECG — consumer wearables have known accuracy limitations, especially during movement.

No control group — cannot rule out natural recovery or placebo effects.

Short follow-up (8 weeks) — unknown whether HRV predicts longer-term outcomes.

**Additional critical observations:**

The "randomized" design is misleading — both groups received identical treatment, so this is effectively a single-arm observational study with post-hoc responder analysis.

Post-hoc responder analysis is prone to regression to the mean and confounding — people with higher HRV at baseline might simply have less severe anxiety (though baseline HAM-A scores were similar between groups).

The HRV cutoff of 36.5 ms for SDNN was derived from the same data used to test its predictive power — this overestimates accuracy (needs validation in an independent sample).

Smartwatch HRV during rest may be confounded by breathing patterns, caffeine, time of day, and recent physical activity — the study did not control for these factors beyond asking participants to sit quietly for 5 minutes.

All participants were Han Chinese — HRV norms differ by ethnicity, and these cutoffs may not apply to other populations.

The study excluded people with cardiovascular disease, diabetes, or those taking beta-blockers — these are common in the general population, limiting real-world applicability.

No intention-to-treat analysis reported — 8 of 64 participants (12.5%) dropped out, and their data were excluded.

Practical takeaways

**For someone running their own n=1 experiment:**

### What to test

Test whether your resting heart rate variability (HRV) — measured by a smartwatch or chest strap — can predict or track your response to a structured psychological intervention (CBT, mindfulness-based therapy, or even a self-directed program like a CBT workbook or app).

### Minimum meaningful duration

**For prediction:** Measure baseline HRV for at least 5–7 consecutive mornings before starting therapy. Use the average of these readings, not a single measurement.

**For tracking:** Commit to at least 4 weeks of therapy before evaluating whether HRV is changing. The study found meaningful divergence by week 4.

**For outcome:** Run the full 8-week program before concluding whether it worked.

### What to measure

**Primary metric:** SDNN (standard deviation of NN intervals) from a 5-minute resting measurement. The study's cutoff was 36.5 ms — but this is device-specific and population-specific. Track your own baseline and change.

**Secondary metric:** RMSSD (root mean square of successive differences) — more specific to parasympathetic activity. Cutoff was 31.2 ms in this study.

**Clinical outcome:** Use a validated self-report scale like the GAD-7 (anxiety, 0–21) or PHQ-9 (depression, 0–27). Measure weekly.

**Response definition:** ≥50% reduction in your chosen scale score from baseline.

### How to measure HRV reliably

Measure at the same time each morning, before eating, drinking caffeine, or exercising.

Sit quietly for 5 minutes with normal breathing (don't try to control your breathing rate).

Use the same device throughout — different devices give different absolute values.

Record at least 5 consecutive days of baseline to establish your personal norm.

Avoid measuring after alcohol, poor sleep, or illness — these acutely lower HRV.
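The baseline step above could be sketched as follows, assuming one SDNN reading per morning from the same device. The function name, the constant, and the readings are hypothetical.

```python
import statistics

MIN_BASELINE_DAYS = 5  # lower bound suggested in the text above

def personal_baseline(morning_sdnn_ms):
    """Mean and day-to-day SD of morning SDNN readings; rejects too-short baselines."""
    if len(morning_sdnn_ms) < MIN_BASELINE_DAYS:
        raise ValueError(f"need at least {MIN_BASELINE_DAYS} mornings of data")
    return statistics.mean(morning_sdnn_ms), statistics.stdev(morning_sdnn_ms)

# Hypothetical readings from seven consecutive mornings (ms)
readings = [38.0, 41.5, 36.2, 40.1, 39.4, 37.8, 42.0]
mean_sdnn, sd_sdnn = personal_baseline(readings)
print(f"baseline SDNN: {mean_sdnn:.1f} ms (day-to-day SD {sd_sdnn:.1f} ms)")
```

Keeping the day-to-day SD alongside the mean gives a rough sense of how large a later change has to be before it stands out from ordinary daily variation.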

### Key confounds to control for

**Time of day:** HRV follows a circadian rhythm and differs between morning and evening. Always measure at the same time.

**Caffeine and nicotine:** Both lower HRV acutely. Avoid for 2 hours before measurement.

**Alcohol:** Lowers HRV for 24–48 hours. Avoid the night before measurement.

**Exercise:** Acute exercise lowers HRV for 1–2 hours post-exercise. Don't measure after working out.

**Sleep quality:** Poor sleep lowers next-morning HRV. Track sleep quality separately.

**Breathing rate:** Slow breathing (e.g., 6 breaths/min) increases HRV. Don't deliberately slow your breathing during measurement.

**Menstrual cycle:** HRV varies across the cycle (lower in luteal phase). Track cycle phase if applicable.

**Medications:** Beta-blockers, antidepressants, and stimulants affect HRV. Note any changes.

### What a positive result would look like

Your baseline SDNN (averaged over 5–7 mornings) is relatively high (the study's 36.5 ms cutoff is only a rough, device-specific reference point), and you subsequently experience a ≥50% reduction in anxiety/depression scores by week 8.

OR: Your SDNN increases by at least 15–20% from baseline by week 4 of therapy, and this increase precedes or coincides with symptom improvement.

A negative result: Your HRV is low at baseline and does not increase over 4–8 weeks of therapy, and your symptom scores show minimal improvement (<30% reduction). This might suggest you need a different approach (e.g., medication, different therapy modality, or addressing lifestyle factors like sleep or exercise first).
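One possible way to turn these criteria into a simple check is sketched below. It is an illustration, not a validated rule: it uses the lower bound of the 15–20% HRV rise, the 50% and 30% symptom thresholds from the text, and an "inconclusive" category that is an assumption added here for everything in between. All names and example numbers are hypothetical.

```python
HRV_RISE_PCT = 15       # lower bound of the 15-20% rise mentioned above
RESPONSE_PCT = 50       # symptom reduction counted as a response
MINIMAL_PCT = 30        # below this, improvement is treated as minimal

def percent_change(baseline, current):
    """Signed percentage change from baseline."""
    return 100 * (current - baseline) / baseline

def classify_result(sdnn_baseline, sdnn_week4, score_baseline, score_week8):
    """Label an n=1 outcome as 'positive', 'negative', or 'inconclusive'."""
    hrv_rise = percent_change(sdnn_baseline, sdnn_week4)
    symptom_drop = -percent_change(score_baseline, score_week8)
    if hrv_rise >= HRV_RISE_PCT and symptom_drop >= RESPONSE_PCT:
        return "positive"
    if hrv_rise < HRV_RISE_PCT and symptom_drop < MINIMAL_PCT:
        return "negative"
    return "inconclusive"  # assumption: mixed signals are not forced into either label

# Hypothetical example: SDNN 39.3 -> 46.0 ms by week 4, GAD-7 14 -> 6 by week 8
print(classify_result(39.3, 46.0, 14, 6))
```

With a single person and noisy daily measurements, a label like this is a prompt for reflection rather than a conclusion.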

### Caveat for self-experimenters

This study provides suggestive evidence that HRV could be a useful biomarker for therapy response, but the effect sizes are moderate and the evidence comes from a single small study with methodological limitations. Do not use HRV cutoffs from this study as diagnostic thresholds — instead, track your own personal trajectory. A rising HRV during therapy is a promising sign, but a flat or falling HRV does not guarantee
