Deriving Pseudo-HRV Features from Intermittent Heart Rate Samples: Bridging the Gap Between Smartwatches and Clinical ECG
Abstract
Heart rate variability (HRV) is a well-established biomarker for autonomic nervous system activity, stress, and emotional regulation. Clinical HRV analysis requires continuous electrocardiogram (ECG) recordings with beat-to-beat (R-R interval) precision—a requirement that off-the-shelf smartwatches fundamentally cannot meet. Consumer wearables such as Garmin devices provide photoplethysmography (PPG)-derived heart rate samples at intervals of 1–5 seconds, yielding an intermittent, low-resolution signal. This article presents a systematic methodology for deriving pseudo-HRV features from these sparse heart rate streams, evaluates their correspondence with ECG-derived HRV metrics, and discusses the practical implications for stress estimation research. We introduce time-domain approximations (pseudo-RMSSD, pseudo-SDNN, HR slope variability), frequency-domain surrogates via Lomb-Scargle periodograms, and nonlinear entropy measures adapted for irregular sampling. A Python-based pipeline illustrates each step, from artifact rejection and interpolation through feature extraction and validation against ECG ground truth using Bland-Altman analysis and intraclass correlation coefficients.
1. Introduction
1.1 The HRV Gold Standard
Heart rate variability—the variation in time intervals between consecutive heartbeats—is one of the most widely used non-invasive markers of autonomic function (Task Force of the European Society of Cardiology, 1996). In stress research, decreased HRV (particularly reduced parasympathetic indices such as RMSSD and high-frequency power) reliably tracks increased sympathetic activation and psychological stress (Thayer et al., 2012). The gold standard for HRV measurement is the clinical ECG, which captures the electrical activity of the heart at sampling rates of 250–1000 Hz, enabling precise detection of R-peaks and calculation of R-R intervals (also called inter-beat intervals, or IBIs) with millisecond accuracy.
1.2 The Smartwatch Reality
Consumer smartwatches use optical photoplethysmography (PPG) sensors to estimate heart rate. While PPG can theoretically capture pulse-to-pulse intervals (pulse rate variability, or PRV), most off-the-shelf devices do not expose raw PPG waveforms or beat-to-beat intervals to developers. Instead, they report aggregated heart rate values at intervals determined by firmware—typically every 1 to 5 seconds for Garmin devices in active recording mode, and as infrequently as every 15 minutes during passive background monitoring.
This creates a fundamental mismatch: clinical HRV requires ~1000 Hz ECG data yielding individual beat timings, while smartwatches provide ~0.2–1 Hz heart rate estimates that average over multiple beats.
1.3 Why “Pseudo-HRV”?
We use the term pseudo-HRV to distinguish features derived from intermittent HR samples from true HRV computed from beat-to-beat intervals. Pseudo-HRV features are surrogate metrics designed to capture the same underlying autonomic dynamics but computed from a fundamentally different (and lower-quality) input signal. The central question of this article is: can these surrogates provide meaningful stress-related information despite their limitations?
1.4 Scope and Contribution
This article contributes:
- A formal taxonomy of pseudo-HRV features computable from 1–5 second HR samples
- Python implementations for each feature category
- A discussion of interpolation, resampling, and artifact handling strategies
- A validation framework comparing pseudo-HRV against ECG-derived HRV
- Practical guidelines for when pseudo-HRV is (and is not) appropriate
2. From R-R Intervals to Heart Rate Samples: What Is Lost?
2.1 Information Content Comparison
To understand what pseudo-HRV can and cannot capture, we must first understand what information is discarded when moving from ECG to aggregated HR.
| Property | ECG R-R Intervals | Smartwatch HR Samples |
|---|---|---|
| Temporal resolution | ~1 ms (beat-to-beat) | 1,000–5,000 ms (sample interval) |
| Signal origin | Electrical (QRS complex) | Optical (PPG blood volume) |
| Individual beat timing | Yes | No (averaged) |
| Respiratory sinus arrhythmia | Fully captured | Partially captured (aliased) |
| Ectopic beat detection | Possible | Not possible |
| Frequency content preserved | Up to ~0.5 Hz | Up to ~0.1–0.5 Hz (Nyquist) |
| Typical recording context | Clinical, controlled | Ambulatory, daily life |
Key losses:
- Beat-to-beat resolution: Each HR sample aggregates multiple beats, smoothing out rapid fluctuations
- High-frequency HRV: With 1 Hz sampling, the Nyquist frequency is 0.5 Hz—just barely covering the HF band (0.15–0.4 Hz). At 0.2 Hz sampling (5-second intervals), HF content is entirely aliased (the simulation after this list illustrates the attenuation)
- Ectopic beat information: Premature beats and artifacts visible in ECG are invisible in aggregated HR
- Phase information: The precise timing of autonomic modulation within each respiratory cycle is lost
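These losses can be made concrete with a short simulation. The sketch below (all parameters illustrative) generates a synthetic IBI series with a 0.25 Hz respiratory modulation, aggregates it into 1-second and 5-second averaged HR samples, and compares pseudo-RMSSD against the true beat-to-beat RMSSD:

import numpy as np

# Illustrative simulation of how beat averaging attenuates fast variability.
rng = np.random.default_rng(42)
n_beats = 600
mean_ibi = 0.8  # seconds (~75 bpm)
approx_times = np.arange(n_beats) * mean_ibi
hf_mod = 0.05 * np.sin(2 * np.pi * 0.25 * approx_times)  # respiratory component
ibi = mean_ibi + hf_mod + rng.normal(0, 0.01, n_beats)   # true IBIs (s)
beat_times = np.cumsum(ibi)

def averaged_hr(beat_times, ibi, interval_s):
    """Mean HR (bpm) over consecutive windows of length interval_s."""
    edges = np.arange(0, beat_times[-1], interval_s)
    hr = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_win = (beat_times >= lo) & (beat_times < hi)
        if in_win.any():
            hr.append(60.0 / np.mean(ibi[in_win]))
    return np.array(hr)

true_rmssd = np.sqrt(np.mean(np.diff(ibi * 1000) ** 2))
for dt in (1.0, 5.0):
    hr = averaged_hr(beat_times, ibi, dt)
    pseudo = np.sqrt(np.mean(np.diff(60_000.0 / hr) ** 2))
    print(f"{dt:.0f} s sampling: pseudo-RMSSD = {pseudo:.1f} ms "
          f"(true RMSSD = {true_rmssd:.1f} ms)")

The pseudo-RMSSD values shrink as the sampling interval grows, which is exactly the attenuation discussed for the time-domain features in Section 4.1.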
2.2 What Is Preserved
Despite these losses, important information survives:
- Low-frequency trends: Sympathetic activation and withdrawal operate on timescales of tens of seconds to minutes, well within the capture range of 1–5 second sampling
- Overall variability: The spread of HR values across a window reflects total autonomic modulation
- Directional changes: HR increases/decreases associated with stress onset and recovery are clearly visible
- Relative comparisons: Within-subject, within-session comparisons (baseline vs. stress) remain valid even if absolute HRV values are biased
3. Preprocessing: From Raw Samples to Analysis-Ready Signal
3.1 Loading Smartwatch Data
Garmin devices export data in the proprietary .fit (Flexible and Interoperable Data Transfer) format. Heart rate records contain a timestamp and a heart rate value in beats per minute (bpm).
from fitparse import FitFile
import pandas as pd
import numpy as np
def load_garmin_hr(fit_path: str) -> pd.DataFrame:
"""
Extract heart rate time series from a Garmin .fit file.
Returns DataFrame with columns: timestamp (UTC datetime), hr (bpm)
"""
fitfile = FitFile(fit_path)
records = []
for record in fitfile.get_messages('record'):
ts = None
hr = None
for field in record:
if field.name == 'timestamp':
ts = field.value
elif field.name == 'heart_rate':
hr = field.value
if ts is not None and hr is not None:
records.append({'timestamp': ts, 'hr': hr})
df = pd.DataFrame(records)
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)
df = df.sort_values('timestamp').reset_index(drop=True)
return df
3.2 Artifact Detection and Removal
Even aggregated HR values can contain artifacts caused by sensor lift-off, motion, or firmware glitches. We apply a three-stage filter:
Stage 1 — Physiological range gating: Remove samples outside 30–220 bpm.
Stage 2 — Successive difference threshold: Flag samples where |HR(t) - HR(t-1)| exceeds a fixed threshold. A sudden jump of 40+ bpm between consecutive 1-second samples almost certainly reflects artifact, not physiology.
Stage 3 — Accelerometer-based motion gating (when available): Exclude windows where accelerometer magnitude exceeds a threshold, indicating movement that degrades PPG signal quality.
def clean_hr_signal(df: pd.DataFrame,
hr_min: float = 30.0,
hr_max: float = 220.0,
max_successive_diff: float = 40.0) -> pd.DataFrame:
"""
Remove artifacts from HR time series.
Args:
df: DataFrame with 'timestamp' and 'hr' columns
hr_min: Minimum plausible HR (bpm)
hr_max: Maximum plausible HR (bpm)
max_successive_diff: Maximum allowed HR change between consecutive samples
Returns:
Cleaned DataFrame with artifact rows removed
"""
# Stage 1: Physiological range
mask = (df['hr'] >= hr_min) & (df['hr'] <= hr_max)
# Stage 2: Successive difference
diffs = df['hr'].diff().abs()
mask &= (diffs <= max_successive_diff) | diffs.isna()
cleaned = df[mask].copy().reset_index(drop=True)
pct_removed = 100 * (1 - len(cleaned) / len(df))
print(f"Artifact removal: {pct_removed:.1f}% of samples removed")
return cleaned
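Stage 3 requires accelerometer channels that are not present in every export. A minimal sketch, assuming hypothetical accel_x/accel_y/accel_z columns (in g) aligned with the HR timestamps; real exports may name or structure these differently:

def motion_gate(df: pd.DataFrame,
                accel_threshold_g: float = 1.3,
                window_s: float = 5.0) -> pd.DataFrame:
    """
    Stage 3 (sketch): drop HR samples from high-motion windows.
    Assumes hypothetical 'accel_x', 'accel_y', 'accel_z' columns (in g)
    on the same timestamps as 'hr'.
    """
    indexed = df.set_index('timestamp')
    mag = np.sqrt(indexed['accel_x'] ** 2 +
                  indexed['accel_y'] ** 2 +
                  indexed['accel_z'] ** 2)
    # Flag samples whose trailing window contains high-motion readings
    high_motion = mag.rolling(f'{int(window_s)}s').max() > accel_threshold_g
    return indexed[~high_motion].reset_index()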
3.3 Gap Detection and Interpolation
After artifact removal, gaps may appear in the time series. We distinguish between:
- Short gaps (< 10 seconds): Filled via cubic spline interpolation
- Medium gaps (10–60 seconds): Filled via linear interpolation with a quality flag
- Long gaps (> 60 seconds): Not interpolated; the segment is split into separate analysis windows
from scipy.interpolate import CubicSpline
def interpolate_hr(df: pd.DataFrame,
target_interval_s: float = 1.0,
max_gap_short_s: float = 10.0,
max_gap_medium_s: float = 60.0) -> pd.DataFrame:
"""
Resample HR to uniform intervals with gap-aware interpolation.
Returns DataFrame with uniform timestamps and interpolated HR.
Gaps longer than max_gap_medium_s produce NaN values.
"""
t_seconds = (df['timestamp'] - df['timestamp'].iloc[0]).dt.total_seconds().values
hr_values = df['hr'].values
# Create uniform time grid
t_uniform = np.arange(t_seconds[0], t_seconds[-1], target_interval_s)
# Cubic spline on original data
cs = CubicSpline(t_seconds, hr_values, extrapolate=False)
hr_interp = cs(t_uniform)
# Identify gaps in original data and mask accordingly
result = pd.DataFrame({
'timestamp': df['timestamp'].iloc[0] + pd.to_timedelta(t_uniform, unit='s'),
'hr': hr_interp,
'interpolated': True
})
# Mark original sample positions
for _, row in df.iterrows():
t_orig = (row['timestamp'] - df['timestamp'].iloc[0]).total_seconds()
idx = np.argmin(np.abs(t_uniform - t_orig))
if abs(t_uniform[idx] - t_orig) < target_interval_s:
result.loc[idx, 'interpolated'] = False
# Null out long gaps
for i in range(1, len(t_seconds)):
gap = t_seconds[i] - t_seconds[i - 1]
if gap > max_gap_medium_s:
gap_mask = (t_uniform > t_seconds[i - 1]) & (t_uniform < t_seconds[i])
result.loc[gap_mask, 'hr'] = np.nan
elif gap > max_gap_short_s:
# Linear interpolation for medium gaps (override cubic)
gap_mask = (t_uniform > t_seconds[i - 1]) & (t_uniform < t_seconds[i])
gap_indices = np.where(gap_mask)[0]
if len(gap_indices) > 0:
result.loc[gap_indices, 'hr'] = np.interp(
t_uniform[gap_indices],
[t_seconds[i - 1], t_seconds[i]],
[hr_values[i - 1], hr_values[i]]
)
return result
3.4 Conversion to Pseudo-IBI
A critical intermediate step is converting HR (bpm) to approximate inter-beat intervals (ms). While each HR sample represents an average over multiple beats, the instantaneous IBI approximation provides a bridge to standard HRV formulas:
IBI_approx(t) = 60,000 / HR(t) [milliseconds]
This conversion is exact only if HR is computed from a single beat interval—which it is not for smartwatch data. Nevertheless, it provides a useful proxy, especially when HR is relatively stable within the averaging window.
def hr_to_pseudo_ibi(hr_series: pd.Series) -> pd.Series:
"""Convert HR (bpm) to approximate IBI (ms)."""
return 60_000.0 / hr_series
4. Pseudo-HRV Feature Taxonomy
We organize pseudo-HRV features into four categories: time-domain, frequency-domain, nonlinear, and trend-based. Each feature is defined formally, with a discussion of how it relates to (and diverges from) its ECG-HRV counterpart.
4.1 Time-Domain Features
4.1.1 Pseudo-SDNN (Standard Deviation of NN Intervals)
ECG definition: Standard deviation of all normal-to-normal R-R intervals in a window.
Pseudo-HRV adaptation: Standard deviation of pseudo-IBI values in a window.
def pseudo_sdnn(hr_window: np.ndarray) -> float:
"""Compute pseudo-SDNN from HR samples (bpm)."""
ibi = 60_000.0 / hr_window
return np.std(ibi, ddof=1)
Expected bias: Pseudo-SDNN underestimates true SDNN because the averaging inherent in HR computation smooths beat-to-beat variability. The degree of underestimation depends on the ratio of HR sampling interval to mean IBI.
4.1.2 Pseudo-RMSSD (Root Mean Square of Successive Differences)
ECG definition: Square root of the mean of squared successive differences between adjacent R-R intervals.
Pseudo-HRV adaptation: RMSSD computed on pseudo-IBI values. This is the most commonly used parasympathetic index and the most challenging to approximate from low-frequency data.
def pseudo_rmssd(hr_window: np.ndarray) -> float:
"""Compute pseudo-RMSSD from HR samples (bpm)."""
ibi = 60_000.0 / hr_window
successive_diffs = np.diff(ibi)
return np.sqrt(np.mean(successive_diffs ** 2))
Expected bias: Substantial underestimation, particularly at low sampling rates. At 1 Hz, successive differences capture only slow fluctuations; the rapid beat-to-beat changes that dominate RMSSD are lost.
4.1.3 Pseudo-pNN50
ECG definition: Percentage of successive R-R intervals differing by more than 50 ms.
Pseudo-HRV adaptation: Same computation on pseudo-IBI, but with an adjusted threshold. Since averaging compresses variability, a lower threshold (e.g., 20 ms) may be more appropriate for smartwatch data.
def pseudo_pnn(hr_window: np.ndarray, threshold_ms: float = 50.0) -> float:
"""Compute pseudo-pNNxx from HR samples."""
ibi = 60_000.0 / hr_window
successive_diffs = np.abs(np.diff(ibi))
return 100.0 * np.sum(successive_diffs > threshold_ms) / len(successive_diffs)
4.1.4 HR Range and Coefficient of Variation
These simpler metrics are less affected by the smoothing problem:
def hr_range(hr_window: np.ndarray) -> float:
"""Max HR - Min HR in window."""
return np.max(hr_window) - np.min(hr_window)
def hr_cv(hr_window: np.ndarray) -> float:
"""Coefficient of variation of HR: SD / mean × 100."""
return 100.0 * np.std(hr_window, ddof=1) / np.mean(hr_window)
4.2 Frequency-Domain Features
4.2.1 The Sampling Problem
Standard frequency-domain HRV uses the power spectral density (PSD) of the R-R interval time series, typically computed via FFT after resampling to uniform 4 Hz. The canonical bands are:
- VLF: 0.003–0.04 Hz (thermoregulation, hormonal)
- LF: 0.04–0.15 Hz (mixed sympathetic/parasympathetic)
- HF: 0.15–0.40 Hz (parasympathetic, respiratory)
With smartwatch HR sampled at 1 Hz, the Nyquist frequency is 0.5 Hz—barely covering the HF band. At 0.2 Hz (5-second intervals), only VLF and part of LF are accessible. Furthermore, the non-uniform sampling of real smartwatch data makes FFT inappropriate without resampling.
4.2.2 Lomb-Scargle Periodogram
The Lomb-Scargle method estimates spectral power from unevenly sampled data without requiring interpolation to uniform intervals, making it well-suited for smartwatch HR:
from scipy.signal import lombscargle
def pseudo_frequency_features(timestamps_s: np.ndarray,
hr_values: np.ndarray) -> dict:
"""
Compute frequency-domain pseudo-HRV features using Lomb-Scargle.
Args:
timestamps_s: Time in seconds (not necessarily uniform)
hr_values: HR in bpm
Returns:
Dictionary with VLF, LF, HF power and LF/HF ratio
"""
# Convert to pseudo-IBI
ibi = 60_000.0 / hr_values
# Remove mean (detrend)
ibi_centered = ibi - np.mean(ibi)
# Define frequency grid
freq = np.linspace(0.003, 0.5, 1000)
angular_freq = 2 * np.pi * freq
    # Compute Lomb-Scargle periodogram (expects angular frequencies)
    pgram = lombscargle(timestamps_s, ibi_centered, angular_freq, normalize=True)
    # Integrate power in bands via trapezoidal rule
vlf_mask = (freq >= 0.003) & (freq < 0.04)
lf_mask = (freq >= 0.04) & (freq < 0.15)
hf_mask = (freq >= 0.15) & (freq <= 0.40)
vlf_power = np.trapz(pgram[vlf_mask], freq[vlf_mask])
lf_power = np.trapz(pgram[lf_mask], freq[lf_mask])
hf_power = np.trapz(pgram[hf_mask], freq[hf_mask])
total_power = vlf_power + lf_power + hf_power
return {
'vlf_power': vlf_power,
'lf_power': lf_power,
'hf_power': hf_power,
'total_power': total_power,
'lf_hf_ratio': lf_power / hf_power if hf_power > 0 else np.nan,
'lf_norm': lf_power / (lf_power + hf_power) if (lf_power + hf_power) > 0 else np.nan,
'hf_norm': hf_power / (lf_power + hf_power) if (lf_power + hf_power) > 0 else np.nan
}
Caveat: HF power estimates from 1 Hz data should be interpreted cautiously. The Lomb-Scargle method can estimate power near the Nyquist limit, but the estimates are noisy and potentially biased. For smartwatch data sampled at intervals > 2 seconds, we recommend reporting only VLF and LF features and noting that HF estimates are unreliable.
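A quick usage sketch on synthetic, unevenly sampled data (timestamps jittered to mimic real smartwatch logging; the injected 0.1 Hz oscillation should surface as LF power):

rng = np.random.default_rng(1)
t = np.cumsum(rng.uniform(0.8, 1.2, 300))  # ~5 minutes at ~1 Hz with jitter
hr = 70 + 3 * np.sin(2 * np.pi * 0.1 * t) + rng.normal(0, 0.5, 300)
feats = pseudo_frequency_features(t, hr)
print(f"LF norm: {feats['lf_norm']:.2f}  (expect most power in the LF band)")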
4.3 Nonlinear Features
4.3.1 Sample Entropy (SampEn)
Sample entropy quantifies the complexity/regularity of a time series. Lower SampEn indicates more regularity, which in HRV analysis is associated with stress and reduced autonomic flexibility (Richman & Moorman, 2000).
def sample_entropy(signal: np.ndarray, m: int = 2, r_factor: float = 0.2) -> float:
"""
Compute sample entropy of a signal.
Args:
signal: Input time series
m: Embedding dimension
r_factor: Tolerance as fraction of signal SD
"""
N = len(signal)
r = r_factor * np.std(signal)
def count_matches(template_len):
count = 0
templates = np.array([signal[i:i + template_len]
for i in range(N - template_len)])
for i in range(len(templates)):
for j in range(i + 1, len(templates)):
if np.max(np.abs(templates[i] - templates[j])) < r:
count += 1
return count
A = count_matches(m + 1)
B = count_matches(m)
if B == 0:
return np.nan
return -np.log(A / B) if A > 0 else np.nan
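As a sanity check, SampEn should be low for a highly regular signal and substantially higher for white noise:

rng = np.random.default_rng(2)
regular = np.sin(np.linspace(0, 20 * np.pi, 300))  # predictable oscillation
noise = rng.normal(0, 1, 300)                      # unpredictable
print(f"SampEn (regular): {sample_entropy(regular):.2f}")  # low
print(f"SampEn (noise):   {sample_entropy(noise):.2f}")    # markedly higher

Note that this reference implementation is O(N²) in the window length; for long windows, a vectorized or tree-based implementation is preferable.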
4.3.2 Detrended Fluctuation Analysis (DFA) — Short-Term (α1)
DFA quantifies the fractal scaling properties of a time series. The short-term exponent α1 (computed over scales 4–16 beats) is a sensitive stress marker (Penttilä et al., 2001). For pseudo-IBI from smartwatch data, the “beat” axis is replaced by sample indices, which means α1 reflects scaling at the sample resolution rather than the beat resolution.
def dfa_alpha1(signal: np.ndarray, scales: list = None) -> float:
"""
Compute DFA short-term scaling exponent (α1).
Args:
signal: Input time series (pseudo-IBI or HR)
scales: Box sizes for DFA (default: 4 to 16)
"""
if scales is None:
max_scale = min(16, len(signal) // 4)
scales = list(range(4, max_scale + 1))
if len(scales) < 2:
return np.nan
# Integrate the mean-subtracted signal
y = np.cumsum(signal - np.mean(signal))
    fluctuations = []
    used_scales = []
    for scale in scales:
        n_segments = len(y) // scale
        if n_segments < 1:
            continue
        rms_values = []
        for seg in range(n_segments):
            segment = y[seg * scale:(seg + 1) * scale]
            x = np.arange(scale)
            coeffs = np.polyfit(x, segment, 1)
            trend = np.polyval(coeffs, x)
            rms_values.append(np.sqrt(np.mean((segment - trend) ** 2)))
        fluctuations.append(np.mean(rms_values))
        used_scales.append(scale)
    if len(fluctuations) < 2:
        return np.nan
    # Pair each fluctuation with the scale actually used; indexing scales by
    # position would misalign the log-log regression if any scale was skipped
    log_scales = np.log(used_scales)
    log_fluct = np.log(fluctuations)
    slope, _ = np.polyfit(log_scales, log_fluct, 1)
return slope
4.4 Trend-Based Features
These features exploit what smartwatch data does capture well: HR dynamics over seconds to minutes.
def trend_features(hr_window: np.ndarray,
timestamps_s: np.ndarray) -> dict:
"""
Compute trend-based pseudo-HRV features.
Args:
hr_window: HR values (bpm)
timestamps_s: Corresponding timestamps in seconds
Returns:
Dictionary of trend features
"""
# HR slope (linear regression)
slope, intercept = np.polyfit(timestamps_s - timestamps_s[0], hr_window, 1)
# Residual variability after detrending
trend_line = slope * (timestamps_s - timestamps_s[0]) + intercept
residuals = hr_window - trend_line
residual_std = np.std(residuals, ddof=1)
# First differences of HR
hr_diffs = np.diff(hr_window)
# Acceleration (second differences)
hr_accel = np.diff(hr_diffs)
return {
'hr_slope': slope, # bpm/sec
'hr_residual_std': residual_std, # bpm
'hr_mean_abs_diff': np.mean(np.abs(hr_diffs)), # bpm
'hr_diff_std': np.std(hr_diffs, ddof=1), # bpm
'hr_accel_std': np.std(hr_accel, ddof=1) if len(hr_accel) > 1 else np.nan,
'hr_mean': np.mean(hr_window), # bpm
'hr_median': np.median(hr_window), # bpm
'hr_iqr': np.percentile(hr_window, 75) - np.percentile(hr_window, 25)
}
5. Complete Feature Extraction Pipeline
5.1 Windowed Extraction
Features are computed over sliding windows, typically 60–300 seconds with 50% overlap. Shorter windows capture acute changes; longer windows provide more stable estimates.
def extract_all_features(df: pd.DataFrame,
window_sec: float = 120.0,
step_sec: float = 60.0,
min_coverage: float = 0.7) -> pd.DataFrame:
"""
Extract pseudo-HRV features from cleaned, interpolated HR data.
Args:
df: DataFrame with 'timestamp' and 'hr' columns (uniform sampling)
window_sec: Window size in seconds
step_sec: Step size in seconds
min_coverage: Minimum fraction of non-NaN samples required
Returns:
DataFrame with one row per window and all features
"""
t0 = df['timestamp'].iloc[0]
t_sec = (df['timestamp'] - t0).dt.total_seconds().values
hr = df['hr'].values
features_list = []
window_start = t_sec[0]
while window_start + window_sec <= t_sec[-1]:
mask = (t_sec >= window_start) & (t_sec < window_start + window_sec)
hr_w = hr[mask]
ts_w = t_sec[mask]
        # Check coverage (guard against empty windows)
        valid = ~np.isnan(hr_w)
        if len(hr_w) == 0 or np.sum(valid) / len(hr_w) < min_coverage:
            window_start += step_sec
            continue
hr_clean = hr_w[valid]
ts_clean = ts_w[valid]
# Time-domain
feats = {
'window_start': t0 + pd.Timedelta(seconds=window_start),
'window_center': t0 + pd.Timedelta(seconds=window_start + window_sec / 2),
'coverage': np.sum(valid) / len(hr_w),
'pseudo_sdnn': pseudo_sdnn(hr_clean),
'pseudo_rmssd': pseudo_rmssd(hr_clean),
'pseudo_pnn50': pseudo_pnn(hr_clean, threshold_ms=50.0),
'pseudo_pnn20': pseudo_pnn(hr_clean, threshold_ms=20.0),
'hr_range': hr_range(hr_clean),
'hr_cv': hr_cv(hr_clean),
}
# Frequency-domain
if len(hr_clean) >= 30:
freq_feats = pseudo_frequency_features(ts_clean, hr_clean)
feats.update(freq_feats)
# Trend-based
feats.update(trend_features(hr_clean, ts_clean))
# Nonlinear
ibi_w = 60_000.0 / hr_clean
if len(ibi_w) >= 30:
feats['sample_entropy'] = sample_entropy(ibi_w)
feats['dfa_alpha1'] = dfa_alpha1(ibi_w)
features_list.append(feats)
window_start += step_sec
return pd.DataFrame(features_list)
5.2 Baseline Normalization
Within-subject, within-session normalization improves comparability across individuals with different resting HR levels:
def normalize_to_baseline(features_df: pd.DataFrame,
baseline_duration_sec: float = 120.0,
step_sec: float = 60.0) -> pd.DataFrame:
"""
Express each feature as a percentage change from the baseline window.
Assumes the first window(s) correspond to the resting baseline.
"""
n_baseline_windows = max(1, int(baseline_duration_sec / step_sec))
baseline = features_df.iloc[:n_baseline_windows]
numeric_cols = features_df.select_dtypes(include=[np.number]).columns
normalized = features_df.copy()
for col in numeric_cols:
bl_mean = baseline[col].mean()
if bl_mean != 0 and not np.isnan(bl_mean):
normalized[col] = 100.0 * (features_df[col] - bl_mean) / abs(bl_mean)
else:
normalized[col] = 0.0
return normalized
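Putting the pipeline together, a minimal end-to-end run might look like this (the file path is a placeholder):

raw = load_garmin_hr('session.fit')  # placeholder path
clean = clean_hr_signal(raw)
uniform = interpolate_hr(clean, target_interval_s=1.0)
features = extract_all_features(uniform, window_sec=120.0, step_sec=60.0)
normalized = normalize_to_baseline(features, baseline_duration_sec=120.0,
                                   step_sec=60.0)
print(normalized[['window_center', 'pseudo_sdnn', 'hr_slope']].head())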
6. Validation Against ECG Ground Truth
6.1 Study Design
Validation requires simultaneous recording of clinical ECG and smartwatch HR during a controlled stress protocol. The protocol used in this project consists of:
- Baseline (5 minutes): Seated rest, neutral video
- Stress induction (variable): Cognitive task with time pressure (mental arithmetic, bomb defusal) or emotionally charged video
- Recovery (5 minutes): Seated rest, calming video
ECG is recorded at 500 Hz using a clinical-grade monitor. The smartwatch (Garmin Forerunner/Fenix series) records HR in foreground activity mode (~1 Hz).
6.2 ECG-HRV Reference Pipeline
import neurokit2 as nk
def compute_ecg_hrv(ecg_signal: np.ndarray,
                    sampling_rate: int = 500) -> dict:
    """
    Compute gold-standard HRV from ECG signal.
    Returns a dictionary of standard HRV metrics.
    """
# Detect R-peaks
_, info = nk.ecg_process(ecg_signal, sampling_rate=sampling_rate)
rpeaks = info['ECG_R_Peaks']
# Compute R-R intervals (ms)
rr_intervals = np.diff(rpeaks) / sampling_rate * 1000
# Time-domain
sdnn = np.std(rr_intervals, ddof=1)
rmssd = np.sqrt(np.mean(np.diff(rr_intervals) ** 2))
pnn50 = 100.0 * np.sum(np.abs(np.diff(rr_intervals)) > 50) / len(np.diff(rr_intervals))
# Frequency-domain (via NeuroKit2)
hrv_freq = nk.hrv_frequency(
{'ECG_R_Peaks': rpeaks},
sampling_rate=sampling_rate
)
return {
'ecg_sdnn': sdnn,
'ecg_rmssd': rmssd,
'ecg_pnn50': pnn50,
'ecg_lf_power': hrv_freq['HRV_LF'].values[0],
'ecg_hf_power': hrv_freq['HRV_HF'].values[0],
'ecg_lf_hf_ratio': hrv_freq['HRV_LFHF'].values[0]
}
6.3 Agreement Analysis
6.3.1 Bland-Altman Analysis
The Bland-Altman plot is the standard method for assessing agreement between two measurement methods (Bland & Altman, 1986). For each pseudo-HRV metric and its ECG counterpart, we compute:
- Bias: Mean difference (pseudo - ECG)
- Limits of agreement: Bias ± 1.96 × SD of differences
- Proportional bias: Regression of differences on means
import matplotlib.pyplot as plt
def bland_altman_plot(ecg_values: np.ndarray,
pseudo_values: np.ndarray,
metric_name: str):
"""Generate Bland-Altman agreement plot."""
means = (ecg_values + pseudo_values) / 2
diffs = pseudo_values - ecg_values
bias = np.mean(diffs)
sd = np.std(diffs, ddof=1)
upper_loa = bias + 1.96 * sd
lower_loa = bias - 1.96 * sd
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(means, diffs, alpha=0.5, s=20)
ax.axhline(bias, color='red', linestyle='-', label=f'Bias: {bias:.2f}')
ax.axhline(upper_loa, color='gray', linestyle='--',
label=f'Upper LoA: {upper_loa:.2f}')
ax.axhline(lower_loa, color='gray', linestyle='--',
label=f'Lower LoA: {lower_loa:.2f}')
ax.set_xlabel(f'Mean of ECG and Pseudo {metric_name}')
    ax.set_ylabel('Difference (Pseudo - ECG)')
ax.set_title(f'Bland-Altman: {metric_name}')
ax.legend()
return fig, {'bias': bias, 'upper_loa': upper_loa, 'lower_loa': lower_loa}
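The proportional-bias check listed above can be implemented by regressing the differences on the means; a slope significantly different from zero indicates that disagreement scales with the magnitude of the measurement:

from scipy.stats import linregress

def proportional_bias(ecg_values: np.ndarray,
                      pseudo_values: np.ndarray) -> dict:
    """Regress (pseudo - ECG) differences on pairwise means."""
    means = (ecg_values + pseudo_values) / 2
    diffs = pseudo_values - ecg_values
    fit = linregress(means, diffs)
    return {'slope': fit.slope, 'p_value': fit.pvalue, 'r': fit.rvalue}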
6.3.2 Intraclass Correlation Coefficient (ICC)
ICC quantifies the consistency between ECG-HRV and pseudo-HRV across windows and participants. ICC(2,1) (two-way random, single measures, absolute agreement) is appropriate for this comparison:
def compute_icc(ecg_values: np.ndarray,
                pseudo_values: np.ndarray) -> dict:
    """
    Compute ICC(2,1) for agreement between ECG-HRV and pseudo-HRV.
    Returns the ICC point estimate and the number of paired observations.
    """
n = len(ecg_values)
data = np.column_stack([ecg_values, pseudo_values])
# Two-way ANOVA components
grand_mean = np.mean(data)
row_means = np.mean(data, axis=1)
col_means = np.mean(data, axis=0)
ss_total = np.sum((data - grand_mean) ** 2)
ss_rows = 2 * np.sum((row_means - grand_mean) ** 2)
ss_cols = n * np.sum((col_means - grand_mean) ** 2)
ss_error = ss_total - ss_rows - ss_cols
k = 2 # number of raters (ECG, pseudo)
ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_error = ss_error / ((n - 1) * (k - 1))
icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error +
k * (ms_cols - ms_error) / n)
return {'icc': icc, 'n_observations': n}
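A quick synthetic check of the estimator, assuming the surrogate tracks the reference with attenuation and noise (all values illustrative):

rng = np.random.default_rng(0)
ecg_vals = rng.normal(50, 15, 100)                    # simulated ECG SDNN (ms)
pseudo_vals = 0.7 * ecg_vals + rng.normal(0, 5, 100)  # attenuated, noisy surrogate
print(compute_icc(ecg_vals, pseudo_vals))
# Absolute-agreement ICC penalizes the systematic attenuation, so the value
# is lower than a pure correlation would suggest.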
6.3.3 Responsiveness Concordance
Beyond absolute agreement, we assess whether pseudo-HRV and ECG-HRV change in the same direction across experimental phases. This is often more relevant for stress detection than absolute accuracy:
def responsiveness_concordance(ecg_baseline: float, ecg_stress: float,
pseudo_baseline: float, pseudo_stress: float) -> bool:
"""
Check if both methods detect the same direction of change
from baseline to stress.
"""
ecg_delta = ecg_stress - ecg_baseline
pseudo_delta = pseudo_stress - pseudo_baseline
return np.sign(ecg_delta) == np.sign(pseudo_delta)
7. Expected Results and Interpretation Guidelines
7.1 Feature Reliability Hierarchy
Based on the literature and theoretical analysis, we propose the following reliability hierarchy for pseudo-HRV features derived from 1 Hz smartwatch HR:
| Feature | Expected ICC vs ECG | Stress Sensitivity | Recommended? |
|---|---|---|---|
| HR mean | > 0.95 | High (direction) | Yes |
| HR slope | > 0.85 | High (acute stress onset) | Yes |
| HR range / IQR | 0.70–0.85 | Moderate | Yes |
| HR CV | 0.65–0.80 | Moderate | Yes |
| Pseudo-SDNN | 0.50–0.75 | Moderate (underestimates) | With caution |
| LF power (Lomb-Scargle) | 0.40–0.65 | Low–Moderate | With caution |
| Pseudo-RMSSD | 0.30–0.55 | Low (substantial bias) | Report bias |
| Trend features | 0.70–0.90 | High | Yes |
| Sample entropy | 0.35–0.60 | Moderate | Exploratory |
| DFA α1 | 0.30–0.55 | Low–Moderate | Exploratory |
| HF power | < 0.40 | Low (aliased at 1 Hz) | Not recommended |
| Pseudo-pNN50 | < 0.35 | Low (threshold too coarse) | Not recommended |
7.2 When Pseudo-HRV Is Sufficient
Pseudo-HRV features are appropriate when:
- Detecting stress direction: Baseline vs. stress classification (binary), where relative changes matter more than absolute values
- Within-subject comparisons: Tracking how an individual’s autonomic state changes over time
- Low-frequency dynamics: Studying sympathetic activation patterns that unfold over minutes
- Large-sample studies: When ECG is impractical and statistical power compensates for individual measurement noise
- Screening and triage: Identifying individuals who may benefit from more detailed clinical assessment
7.3 When ECG Is Required
Pseudo-HRV is not a substitute for clinical ECG-HRV when:
- Absolute HRV values are needed (e.g., clinical risk stratification)
- High-frequency analysis is required (respiratory sinus arrhythmia, vagal tone)
- Beat-to-beat precision matters (arrhythmia detection, ectopic beat analysis)
- Short windows (< 60 seconds) are used, where few HR samples provide insufficient data
- Regulatory or clinical contexts demand validated measurement instruments
8. Practical Recommendations
8.1 Data Collection
- Maximize sampling rate: Use foreground recording on the smartwatch (activity mode) rather than background monitoring. Garmin devices typically provide 1 Hz HR in activity mode vs. irregular sampling in background mode
- Minimize motion: For controlled experiments, keep participants seated during critical measurement windows. Use accelerometer data to flag and exclude high-motion periods
- Record long windows: Use at least 120-second analysis windows to ensure sufficient samples. For frequency-domain analysis, 300 seconds is preferable
- Collect simultaneously: When possible, record ECG in parallel for at least a subset of participants to calibrate pseudo-HRV against ground truth
8.2 Analysis
- Always report the source: Clearly label features as “pseudo-HRV” or “HR-derived” to distinguish from ECG-based HRV
- Use robust features first: Prioritize HR mean, slope, range, and CV before attempting RMSSD or frequency-domain approximations
- Apply baseline normalization: Within-subject percent-change from baseline improves comparability and reduces the impact of systematic bias
- Combine features: A multi-feature approach (HR trend + variability + nonlinear) outperforms any single metric for stress classification
- Validate with ground truth: When ECG data are available (even from prior studies with the same protocol), quantify agreement and report Bland-Altman statistics
8.3 Reporting
When publishing results based on pseudo-HRV:
- State the smartwatch model, firmware version, and recording mode
- Report the actual sampling interval distribution (mean, SD, range)
- Describe artifact rejection criteria and the percentage of data removed
- Specify interpolation method and parameters
- Report all pseudo-HRV features with clear notation (e.g., “pseudo-RMSSD” rather than “RMSSD”)
- Include agreement statistics against ECG if available
- Discuss limitations of the measurement approach
9. Discussion
9.1 The Pragmatic Case for Pseudo-HRV
The clinical HRV literature spans decades and thousands of studies, nearly all using ECG. Moving to smartwatch-derived approximations inevitably sacrifices fidelity. However, this tradeoff enables something ECG cannot: continuous, unobtrusive monitoring in daily life at scale. A participant wearing a Garmin watch for two weeks generates thousands of HR samples across dozens of contexts—sleeping, commuting, working, exercising—providing ecological validity that no clinical session can match.
The pseudo-HRV framework acknowledges this tradeoff explicitly. Rather than claiming equivalence with ECG-HRV, it defines a parallel set of features with known biases and documented validity ranges, allowing researchers to make informed decisions about which analyses are defensible.
9.2 The Role of Machine Learning
An important alternative to computing individual pseudo-HRV features is to feed raw HR time series (or simple statistical summaries) directly into machine learning models trained to predict stress labels. In this approach, the model implicitly learns whatever information about autonomic state is available in the HR signal, without the intermediate step of approximating traditional HRV metrics. Our feature taxonomy remains valuable as an interpretability tool—understanding why a model classifies a window as stressed is easier when the features have physiological meaning.
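A minimal sketch of that approach, assuming the window-level feature table from Section 5 and a hypothetical binary stress label per window (scikit-learn used for illustration; in practice, cross-validation should group by participant to avoid leakage):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 'features' is assumed to come from extract_all_features();
# 'labels' is an assumed array of 0/1 stress annotations per window.
feature_cols = ['hr_mean', 'hr_slope', 'hr_residual_std', 'hr_cv',
                'pseudo_sdnn', 'pseudo_rmssd']
X = features[feature_cols].fillna(features[feature_cols].median()).values
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5, scoring='roc_auc')
print(f"Stress classification AUC: {scores.mean():.2f} ± {scores.std():.2f}")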
9.3 Future Directions
- Device-specific calibration: Different smartwatch models introduce different smoothing and averaging. Transfer functions mapping pseudo-HRV to ECG-HRV could be device-specific
- IBI-capable devices: Some newer Garmin models (Fenix 7+, Forerunner 965) expose inter-beat intervals via the .fit file's hrv_status or stress messages. When available, these should be used instead of HR-derived pseudo-IBI, dramatically improving feature quality
- Multi-sensor fusion: Combining HR with accelerometer, skin temperature, and electrodermal activity (on supported devices) may compensate for the limited HRV information in HR-only data
- Normative databases: Building population-level norms for pseudo-HRV features, stratified by device and recording context, would facilitate clinical translation
10. Conclusion
Pseudo-HRV features derived from intermittent smartwatch heart rate samples occupy a pragmatic middle ground between no physiological measurement and clinical ECG-HRV. They cannot replace the gold standard, but they can extend the reach of stress research into daily life, enable large-scale studies, and provide meaningful within-subject stress tracking.
The methodology presented here—artifact rejection, gap-aware interpolation, a four-category feature taxonomy, and systematic validation against ECG ground truth—provides a principled framework for extracting and evaluating these features. By being transparent about the biases and limitations inherent in the approach, researchers can make defensible use of consumer wearable data while clearly communicating the boundaries of their measurements.
For the StressSmartWatch project specifically, pseudo-HRV features form the core of the smartwatch-only stress estimation pipeline, complemented by behavioral data from joystick interactions and self-reported stress. The validation phase will quantify exactly which features survive the transition from ECG to smartwatch and how effectively they support stress detection in controlled and daily-life settings.
References
Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. (1996). Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Circulation, 93(5), 1043–1065.
Thayer, J. F., Åhs, F., Fredrikson, M., Sollers, J. J., & Wager, T. D. (2012). A meta-analysis of heart rate variability and neuroimaging studies: Implications for heart rate variability as a marker of stress and health. Neuroscience & Biobehavioral Reviews, 36(2), 747–756.
Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307–310.
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology—Heart and Circulatory Physiology, 278(6), H2039–H2049.
Penttilä, J., Helminen, A., Jartti, T., Kuusela, T., Huikuri, H. V., Tulppo, M. P., & Scheinin, H. (2001). Time domain, geometrical and frequency domain analysis of cardiac vagal outflow: Effects of various respiratory patterns. Clinical Physiology, 21(3), 365–376.
Pinto, G., Carvalho, J. M., Barros, F., Soares, S. C., Pinho, A. J., & Brás, S. (2020). Multimodal emotion evaluation: A physiological model for cost-effective emotion classification. Sensors, 20(12), 3510.
Schäfer, A., & Vagedes, J. (2013). How accurate is pulse rate variability as an estimate of heart rate variability? A review on studies comparing photoplethysmographic technology with an electrocardiogram. International Journal of Cardiology, 166(1), 15–29.
Lomb, N. R. (1976). Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science, 39, 447–462.
Peng, C. K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos, 5(1), 82–87.
Plews, D. J., Scott, B., Altini, M., Wood, M., Kilding, A. E., & Laursen, P. B. (2017). Comparison of heart-rate-variability recording with smartphone photoplethysmography, Polar H7 chest strap, and electrocardiography. International Journal of Sports Physiology and Performance, 12(10), 1324–1328.
Appendix A: Quick-Reference Feature Table
| Feature | Formula (from HR in bpm) | Window | Units |
|---|---|---|---|
| HR mean | mean(HR) | 60–300 s | bpm |
| HR SD | std(HR) | 60–300 s | bpm |
| HR CV | 100 × std(HR)/mean(HR) | 60–300 s | % |
| HR range | max(HR) - min(HR) | 60–300 s | bpm |
| HR slope | linear regression slope | 60–300 s | bpm/s |
| HR residual SD | std(HR - trend) | 60–300 s | bpm |
| Pseudo-SDNN | std(60000/HR) | 120–300 s | ms |
| Pseudo-RMSSD | sqrt(mean(diff(60000/HR)²)) | 120–300 s | ms |
| Pseudo-pNN20 | % of \|diff(60000/HR)\| > 20 ms | 120–300 s | % |
| LF power | Lomb-Scargle 0.04–0.15 Hz | 300 s | ms² |
| LF/HF ratio | LF power / HF power | 300 s | — |
| Sample entropy | SampEn(IBI, m=2, r=0.2×SD) | 120–300 s | — |
| DFA α1 | DFA scaling exponent | 120–300 s | — |
Appendix B: Minimum Data Requirements
| Analysis Type | Min Window | Min Sampling Rate | Min Samples |
|---|---|---|---|
| HR trend features | 60 s | 0.2 Hz | 12 |
| Time-domain pseudo-HRV | 120 s | 1 Hz | 120 |
| Frequency-domain (VLF+LF) | 300 s | 0.5 Hz | 150 |
| Frequency-domain (with HF) | 300 s | 1 Hz | 300 |
| Nonlinear (SampEn, DFA) | 120 s | 1 Hz | 120 |
| Stress classification model | 120 s | 1 Hz | 120 |