Before rPPG-Toolbox existed, replicating someone else's rPPG results was a frustrating exercise. Every research group had their own preprocessing pipeline, their own data splits, their own evaluation code. Two papers could report numbers on the same dataset and still not be comparable because of differences in face detection, frame sampling, or signal post-processing. The field was producing papers faster than anyone could verify them.
rPPG-Toolbox changed that. Published at NeurIPS 2023 by Xin Liu, Girish Narayanswamy, Akshay Paruchuri, and collaborators at the University of Washington, it gave the remote photoplethysmography community something it badly needed: a single, open-source platform where models and datasets meet under consistent evaluation conditions.
"Camera-based physiological measurement is a fast growing field of computer vision. However, the lack of standardized benchmarking and the difficulty in reproducing results have hindered progress." — Liu et al., rPPG-Toolbox paper, NeurIPS 2023 Datasets and Benchmarks Track
What rPPG-Toolbox actually is
At its core, rPPG-Toolbox is a Python-based platform for training, testing, and benchmarking rPPG algorithms. It is not a single model. It is the infrastructure that lets you run many models against many datasets with the same preprocessing, the same data splits, and the same evaluation metrics. The GitHub repository lives at ubicomplab/rPPG-Toolbox and has become one of the most-starred repos in the camera-based physiological sensing space.
The toolbox handles the unglamorous but critical parts of the pipeline: face detection and cropping, video frame extraction, ground truth alignment, signal post-processing, and metric computation. These are the steps where subtle differences between implementations create the reproducibility problems that plagued the field. By standardizing them, rPPG-Toolbox makes apples-to-apples comparison possible for the first time.
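To give a flavor of one of those unglamorous steps, here is a minimal sketch of normalized frame differencing, the motion representation commonly fed to DeepPhys-style models. This is a simplified illustration of the general technique, not the toolbox's exact implementation:

```python
import numpy as np

def diff_normalize(frames, eps=1e-7):
    """Normalized frame differences: d(t) = (x(t+1) - x(t)) / (x(t+1) + x(t)),
    scaled to unit variance. A common motion input for rPPG neural models."""
    frames = frames.astype(np.float64)
    d = (frames[1:] - frames[:-1]) / (frames[1:] + frames[:-1] + eps)
    return d / (d.std() + eps)  # unit-variance scaling across the clip
```

The ratio form makes the representation robust to overall brightness, which is exactly the kind of detail that silently diverges between independent reimplementations.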
Configuration is YAML-based. You specify your model, dataset, training parameters, and preprocessing options in a config file, and the toolbox runs the full pipeline. This design makes it straightforward to swap models or datasets without rewriting code.
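A config file along these lines selects the model, datasets, and training setup. The field names below are illustrative only, loosely modeled on the repository's example configs; consult the `configs/` directory for the real keys:

```yaml
# Illustrative sketch -- check the repository's configs/ directory
# for the actual schema and key names.
TOOLBOX_MODE: "train_and_test"
MODEL:
  NAME: TSCAN
TRAIN:
  BATCH_SIZE: 4
  EPOCHS: 30
  DATA:
    DATASET: UBFC-rPPG
    DATA_PATH: "/path/to/UBFC-rPPG"
TEST:
  DATA:
    DATASET: PURE
    DATA_PATH: "/path/to/PURE"
```

A config like this is then passed to the toolbox's entry point (per the README, something like `python main.py --config_file <path>`), and the same file fully documents the experiment for anyone trying to reproduce it.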
Algorithms included in rPPG-Toolbox
The toolbox covers both traditional unsupervised methods and modern deep learning approaches. This matters because unsupervised methods remain relevant baselines and sometimes outperform neural models in cross-dataset settings where training data does not match the test distribution.
Unsupervised methods
These algorithms extract the blood volume pulse signal using signal processing rather than learned parameters:
- GREEN (Verkruysse et al., 2008) uses the green color channel, which carries the strongest photoplethysmographic signal
- ICA (Poh et al., 2011) applies independent component analysis to separate the pulse signal from noise across color channels
- CHROM (de Haan et al., 2013) uses a chrominance-based approach for motion-robust pulse extraction
- POS (Wang et al., 2017) projects temporally normalized RGB signals onto a plane orthogonal to the skin tone, suppressing specular and intensity distortions
- LGI (Pilz et al., 2018) applies local group invariance for handling videos recorded outside controlled settings
- PBV (de Haan et al., 2014) leverages the blood volume pulse signature for improved motion robustness
- OMIT (Alvarez et al., 2023) is a more recent unsupervised pipeline using face-to-PPG conversion
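To make the flavor of these methods concrete, here is a minimal NumPy sketch of the POS projection, assuming spatially averaged RGB traces as input. It is a simplified illustration of the published algorithm, not the toolbox's implementation:

```python
import numpy as np

def pos(rgb, fs=30.0, window_sec=1.6):
    """POS (plane-orthogonal-to-skin) pulse extraction.
    rgb: (N, 3) array of spatially averaged R, G, B traces."""
    n = rgb.shape[0]
    w = int(window_sec * fs)                  # ~1.6 s sliding window
    P = np.array([[0.0, 1.0, -1.0],           # projection axes orthogonal
                  [-2.0, 1.0, 1.0]])          # to the normalized skin tone
    h = np.zeros(n)
    for i in range(n - w + 1):
        c = rgb[i:i + w]
        cn = c / (c.mean(axis=0) + 1e-9)      # temporal normalization
        s = cn @ P.T                          # (w, 2) projected signals
        p = s[:, 0] + (s[:, 0].std() / (s[:, 1].std() + 1e-9)) * s[:, 1]
        h[i:i + w] += p - p.mean()            # overlap-add into the output
    return h
```

The alpha-weighted combination of the two projections is what adapts the method to the actual pulsatile strength in each window, which is the source of its motion robustness.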
Supervised neural models
The neural methods in rPPG-Toolbox represent the main architectures that have shaped the field:
- DeepPhys (Chen and McDuff, 2018): A 2D CNN with two branches processing appearance and motion. Uses attention to weight facial regions by signal quality. Computationally cheap but misses long-range temporal dependencies.
- PhysNet (Yu et al., 2019): A 3D CNN that processes short video clips as spatiotemporal volumes. The 3D convolutions capture temporal correlations directly in feature extraction, making it consistently strong on benchmark datasets.
- TS-CAN (Liu et al., 2020): Uses temporal shift modules within 2D convolutions to capture temporal information without the computational cost of 3D convolutions. Achieves roughly 1.29 BPM mean absolute error on UBFC-rPPG in published benchmarks.
- EfficientPhys (Liu et al., 2023): Built for efficient on-device measurement. Streamlines the preprocessing pipeline and uses lightweight network designs so inference can run in real time on mobile hardware.
- PhysFormer (Yu et al., CVPR 2022): Brought transformer architecture to rPPG with temporal difference transformers and global spatio-temporal attention. Outperforms CNN methods in scenarios involving head movement.
- BigSmall (Narayanswamy et al., 2023): A multi-task architecture pairing a high-spatial-resolution branch with a high-temporal-resolution branch to jointly predict disparate signals such as pulse, respiration, and facial actions.
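The temporal shift idea behind TS-CAN can be illustrated in a few lines: a fraction of the channels is shifted one frame backward or forward in time, so ordinary 2D convolutions see neighboring frames at almost no extra cost. This is a toy NumPy sketch of the mechanism, not the toolbox's implementation:

```python
import numpy as np

def temporal_shift(x, fold_div=3):
    """x: (T, C, H, W) feature maps. Shift 1/fold_div of the channels one
    frame back in time, another 1/fold_div one frame forward, keep the rest."""
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)                           # shifted-out frames are zero-padded
    out[:-1, :fold] = x[1:, :fold]                   # these channels look one frame ahead
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # these look one frame behind
    out[:, 2 * fold:] = x[:, 2 * fold:]              # remaining channels unchanged
    return out
```

After the shift, a per-frame 2D convolution mixes information across three consecutive frames, which is how TS-CAN gets temporal modeling without 3D convolutions.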
The toolbox has continued expanding since its initial release. Recent additions include iBVPNet (Joshi et al., 2024), PhysMamba (Luo et al., 2024), RhythmFormer (Zou et al., 2024), and FactorizePhys (Joshi et al., NeurIPS 2024).
Supported datasets in rPPG-Toolbox
| Dataset | Source | Subjects | Key characteristics | Recommended for training |
|---|---|---|---|---|
| UBFC-rPPG | University of Burgundy | 42 | Seated subjects, controlled lighting, widely used benchmark | Yes |
| PURE | TU Ilmenau | 10 | Multiple motion scenarios (talking, head rotation), high-quality sync | Yes |
| SCAMPS | Microsoft Research | 2,800 (synthetic) | Synthetically generated with known ground truth, large scale | Yes |
| MMPD | UW / Tsinghua | 33 | Mobile phone recordings, multiple activities, diverse lighting | No (mobile domain) |
| BP4D+ | Binghamton University | 140 | Spontaneous expressions, multimodal data, large subject pool | No (expression-heavy) |
| UBFC-Phys | University of Burgundy | 56 | Stress-inducing tasks, multi-task physiological measurements | No (task-specific) |
| iBVP | — | — | High-framerate BVP ground truth | Yes |
The authors recommend UBFC-rPPG, PURE, iBVP, and SCAMPS for training, primarily because of their synchronization quality between video and ground truth signals. MMPD is particularly interesting for mobile rPPG research since it was recorded entirely on smartphones across different activities and lighting conditions.
Benchmark results and what they tell us
The value of rPPG-Toolbox is not in any single set of benchmark numbers but in making those numbers comparable. When researchers use the same toolbox to train and evaluate, you can actually trust that differences in reported metrics reflect differences in the models rather than differences in evaluation pipelines.
That said, some patterns emerge from the published benchmarks. On UBFC-rPPG under intra-dataset testing (train and test on the same dataset), supervised neural methods generally beat unsupervised ones. TS-CAN and PhysNet tend to lead among the original models, with PhysFormer showing particular strength when subjects move. On the PhysBench benchmark (Kegang Wang et al.), PhysFormer achieved 0.78 BPM MAE on the rPPG subset, with PhysNet at 1.04 and TS-CAN at 1.23.
Cross-dataset testing tells a different story. When you train on one dataset and evaluate on another, the gap between supervised and unsupervised methods often shrinks. POS and CHROM can outperform neural models that were trained on a mismatched domain. This is the generalization problem that the entire field is still working through.
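The headline metric in these comparisons, MAE in beats per minute, is computed from heart rates estimated from the predicted and ground-truth waveforms. A common approach takes the dominant spectral peak of each signal within a plausible heart-rate band; the sketch below illustrates that idea and is not the toolbox's exact post-processing, which offers several options:

```python
import numpy as np

def estimate_hr_bpm(bvp, fs=30.0, lo_hz=0.7, hi_hz=3.0):
    """Heart rate as the dominant spectral peak of a BVP signal,
    restricted to a physiologically plausible band (42-180 BPM here)."""
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fs)
    power = np.abs(np.fft.rfft(bvp - np.mean(bvp))) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * freqs[band][np.argmax(power[band])]

def mae_bpm(pred_hrs, true_hrs):
    """Mean absolute error between predicted and reference heart rates."""
    return float(np.mean(np.abs(np.asarray(pred_hrs) - np.asarray(true_hrs))))
```

Standardizing exactly this post-processing (band limits, window lengths, filtering) is part of what makes numbers from the toolbox comparable across papers.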
rPPG-Toolbox vs. building from scratch
| Consideration | Using rPPG-Toolbox | Building your own pipeline |
|---|---|---|
| Time to first experiment | Hours | Weeks to months |
| Reproducibility | High (standardized pipeline) | Depends on documentation discipline |
| Model coverage | 17+ models included | Only what you implement |
| Dataset support | 7 datasets with loaders | Write your own for each |
| Community validation | Peer-reviewed, widely cited | Your team only |
| Flexibility for novel architectures | Moderate (requires fitting the framework) | Full control |
| Preprocessing consistency | Guaranteed across experiments | Easy to introduce hidden variance |
For most research groups, rPPG-Toolbox is the obvious starting point. The time savings alone are substantial. Where custom pipelines make sense is when you need preprocessing or evaluation approaches that do not fit the toolbox's design, or when you are working with proprietary datasets that require specialized handling.
The toolbox has been cited extensively since its NeurIPS publication. Papers benchmarking new architectures increasingly report rPPG-Toolbox results alongside or instead of custom evaluation, which has improved the field's ability to track genuine progress.
Why rPPG-Toolbox matters for the field
The contribution is less about any individual algorithm and more about research infrastructure. Before standardized tooling, the rPPG field had a reproducibility problem that was slowing down real progress. Groups would report state-of-the-art results that other groups could not replicate, not because the results were wrong but because the evaluation conditions were impossible to match exactly.
rPPG-Toolbox made it possible to answer a simple question: given the same data, the same preprocessing, and the same metrics, which approach actually works best? That question turns out to be harder to answer than it sounds, and having shared infrastructure to answer it is more valuable than any single algorithmic innovation.
The toolbox also lowered the barrier to entry. A graduate student starting in rPPG research no longer needs to spend months building data loaders, implementing baselines, and debugging preprocessing pipelines. They can start running experiments on day one and focus their effort on the novel parts of their work.
Circadify is developing rPPG technology for real-world contactless vital sign measurement. Open-source tools like rPPG-Toolbox have been instrumental in advancing the foundational research that makes applied rPPG systems possible.
Frequently asked questions
What is rPPG-Toolbox?
rPPG-Toolbox is an open-source platform for camera-based physiological sensing published at NeurIPS 2023. Built by researchers at the University of Washington, it provides standardized implementations of both neural and unsupervised rPPG methods alongside support for seven public datasets, enabling reproducible benchmarking across the field.
What algorithms does rPPG-Toolbox include?
rPPG-Toolbox includes seven unsupervised methods (GREEN, ICA, CHROM, POS, LGI, PBV, OMIT) and multiple supervised neural models including DeepPhys, PhysNet, TS-CAN, EfficientPhys, PhysFormer, BigSmall, iBVPNet, PhysMamba, RhythmFormer, and FactorizePhys.
Which datasets does rPPG-Toolbox support?
The toolbox supports seven datasets: UBFC-rPPG, PURE, SCAMPS, MMPD, BP4D+, UBFC-Phys, and iBVP. The authors recommend UBFC-rPPG, PURE, iBVP, or SCAMPS for training due to their synchronization quality and data volume.
How do I get started with rPPG-Toolbox?
The rPPG-Toolbox GitHub repository (ubicomplab/rPPG-Toolbox) includes YAML-based configuration files for training and evaluation. You select a model, point it at a supported dataset organized in the expected directory structure, and run the training or inference pipeline.
Related Articles
- Deep Learning for rPPG Vital Sign Extraction — Analysis of the neural architectures that rPPG-Toolbox benchmarks, from CNNs to transformers.
- rPPG Signal Processing: Raw Video to Vital Signs — How the signal processing pipeline works, including the unsupervised methods implemented in rPPG-Toolbox.
- What is rPPG Technology — Foundational overview of remote photoplethysmography and how cameras extract vital signs from skin color changes.