GitHub - johnamit/motionbench

MotionBench is a real-time pose-based exercise recognition project designed for practical local usage. It classifies exercise motion from short temporal windows, estimates repetition counts with a deterministic finite-state method, and reports similarity against class-level motion centroids.

Overview

MotionBench is built to run the full workflow from start to finish. You can prepare sequence data, train models, benchmark inference, and run real-time prediction from a webcam.

The runtime pipeline is simple. It captures frames, extracts pose-based features, builds rolling windows, and predicts one of six exercise classes. It also estimates repetitions with a deterministic finite-state method and reports a centroid similarity score for live feedback.
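To make the windowing and repetition-counting steps concrete, here is a minimal sketch. The class names, the window size, the single joint-angle signal, and both thresholds are illustrative assumptions, not values or code taken from MotionBench.

```python
from collections import deque


class RollingWindow:
    """Fixed-length buffer of per-frame feature vectors."""

    def __init__(self, size):
        self.size = size
        self.frames = deque(maxlen=size)  # the oldest frame drops out automatically

    def push(self, features):
        """Add one frame; return the full window as a list, or None while filling."""
        self.frames.append(list(features))
        if len(self.frames) < self.size:
            return None
        return list(self.frames)


class RepCounter:
    """Two-state machine ("up"/"down") with hysteresis on one angle signal."""

    def __init__(self, down_below=100.0, up_above=160.0):
        # Hypothetical thresholds in degrees. The gap between them keeps
        # jitter around a single threshold from being counted as repetitions.
        self.down_below = down_below
        self.up_above = up_above
        self.state = "up"
        self.count = 0

    def update(self, angle):
        """Feed one angle reading and return the running repetition count."""
        if self.state == "up" and angle < self.down_below:
            self.state = "down"
        elif self.state == "down" and angle > self.up_above:
            self.state = "up"
            self.count += 1  # one full down-then-up cycle is one repetition
        return self.count
```

Because the counter depends only on the current state and the latest reading, the same input stream always yields the same count, which is what makes the method deterministic.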

Core work happens in data/, models/, scripts/, and results/. Older or non-essential files are moved to archive/ to keep the main repository clear and easy to review.

Dataset

This repo stays lightweight on GitHub. Download the dataset files from Hugging Face and place them in data/.

git clone https://huggingface.co/datasets/johnamit/motionbench-data data

For local usage, keep split files under data/.

The workflow expects fixed sequence splits (train, val, test_internal) and optionally a separate home/generalization test split.

Expected split files:

data/train_sequences.csv
data/val_sequences.csv
data/test_internal_sequences.csv
data/test_home_sequences.csv (optional for home/generalization evaluation)

To regenerate centralized fixed splits:

python scripts/preprocess/create_fixed_splits.py --input-file data/train_sequences_full.csv --output-dir data
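The internals of create_fixed_splits.py are not shown here. As a rough illustration of what a "fixed" split can mean, the sketch below assigns each sequence ID to a split by hashing the ID, so the assignment is identical on every run. The function name, the fractions, and the ID format are all assumptions made for this example.

```python
import hashlib


def assign_split(sequence_id, val_fraction=0.15, test_fraction=0.15):
    """Map a sequence ID to 'train', 'val', or 'test_internal' deterministically."""
    digest = hashlib.sha256(sequence_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as an integer and scale it into [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_fraction:
        return "test_internal"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"
```

Splitting by sequence ID rather than by individual window keeps every window from one recording in the same split.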

Models

Download trained model files from Hugging Face and place them in models/.

git clone https://huggingface.co/johnamit/motionbench-models models

This project includes six sequence models with different strengths. Some are strong on temporal memory, some are better for latency, and some are better at capturing structured feature relationships.

BiLSTM: The bidirectional LSTM processes each sequence in forward and backward directions within the input window, so the classifier can use context from both ends of the motion segment. This helps when important movement details are spread across the whole sequence, not just a single frame.

LSTM: Unidirectional LSTM reads movement step by step in time. It is a simple and reliable sequence model, so it works well as a strong baseline for exercise classification while keeping runtime reasonable.

GRU: The GRU uses gating similar to LSTM but with fewer internal components, which can reduce parameter count and improve efficiency. In practice, it is a strong candidate when you want robust sequence modeling with lighter recurrent overhead.
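The "fewer internal components" point can be checked with simple arithmetic. A single LSTM layer holds four gate-sized weight blocks and a GRU layer holds three, so for equal input and hidden sizes the GRU layer has three quarters of the parameters. The sketch below assumes the PyTorch parameter layout (separate input and hidden bias vectors); the sizes used in any call are made-up examples, not MotionBench's configuration.

```python
def recurrent_params(input_size, hidden_size, gate_blocks):
    """Parameter count of one unidirectional recurrent layer (PyTorch-style layout)."""
    input_weights = gate_blocks * hidden_size * input_size
    hidden_weights = gate_blocks * hidden_size * hidden_size
    biases = 2 * gate_blocks * hidden_size  # separate input and hidden biases
    return input_weights + hidden_weights + biases


def lstm_params(input_size, hidden_size):
    """LSTM: input, forget, cell-candidate, and output blocks."""
    return recurrent_params(input_size, hidden_size, gate_blocks=4)


def gru_params(input_size, hidden_size):
    """GRU: reset, update, and candidate blocks."""
    return recurrent_params(input_size, hidden_size, gate_blocks=3)
```

For example, with a hypothetical input size of 32 and hidden size of 64, lstm_params gives 25088 and gru_params gives 18816.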

TCN: The temporal convolutional network uses dilated 1D convolutions and residual blocks to learn patterns over short and long time ranges. Because convolutional operations are parallelizable, it is often fast at inference, which makes it a good option when responsiveness matters.
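How far back a TCN can "see" follows directly from its kernel size and dilations. The sketch below computes the receptive field in frames, assuming each residual block contains two dilated convolutions that share one dilation value. That block layout and the example dilations are assumptions for illustration, not the configuration used in this repository.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of input frames that influence one output frame."""
    field = 1
    for dilation in dilations:
        # Each dilated convolution extends the field by (kernel_size - 1) * dilation.
        field += convs_per_block * (kernel_size - 1) * dilation
    return field
```

With a kernel size of 3 and dilations of 1, 2, 4, and 8, the receptive field is 61 frames, which shows how doubling dilations covers long ranges with few layers.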

CNN-BiLSTM: This hybrid architecture first applies temporal convolutions to capture short local motion patterns, then a BiLSTM models how those patterns evolve over time. This gives both local detail and sequence context.

ST-GCN-inspired (feature-graph variant): This ST-GCN-style model treats features as connected nodes and learns both their relationships and how they change over time. It can help when interactions between pose features are important for classification.

Training

Train each model from the shared sequence splits in data/.

python models/bilstm/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/bilstm/results
python models/lstm/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/lstm/results
python models/gru/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/gru/results
python models/tcn/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/tcn/results
python models/cnn_bilstm/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/cnn_bilstm/results
python models/st_gcn/train.py --train-file data/train_sequences.csv --val-file data/val_sequences.csv --test-file data/test_internal_sequences.csv --output-dir models/st_gcn/results
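The six commands above differ only in the model directory, so they can be generated in a loop. This is a small convenience sketch, not a script shipped with the repository; the helper names and the dry_run flag are hypothetical, while the paths and flags mirror the commands listed above.

```python
import subprocess
import sys

MODEL_DIRS = ["bilstm", "lstm", "gru", "tcn", "cnn_bilstm", "st_gcn"]


def build_train_command(model_dir, data_dir="data"):
    """Assemble the training command for one model directory."""
    return [
        sys.executable, f"models/{model_dir}/train.py",
        "--train-file", f"{data_dir}/train_sequences.csv",
        "--val-file", f"{data_dir}/val_sequences.csv",
        "--test-file", f"{data_dir}/test_internal_sequences.csv",
        "--output-dir", f"models/{model_dir}/results",
    ]


def train_all(dry_run=True):
    """Print every command, or run them one after another when dry_run is False."""
    commands = [build_train_command(name) for name in MODEL_DIRS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands
```

Calling train_all() only prints the commands; pass dry_run=False from the repository root to execute them.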

If centroid assets are missing or if models were retrained, rebuild similarity assets:

python scripts/preprocess/build_similarity_assets.py --train-file data/train_sequences.csv --models-root models
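The exact similarity measure built by this script is not described here. One common approach, shown below purely as an assumption, is to average the training vectors of each class into a centroid and score a live vector by cosine similarity to the centroid of its predicted class. The function names and the flat-vector representation are hypothetical.

```python
import math


def class_centroids(vectors, labels):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for vector, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Under this assumption, centroids depend on the training data and on whatever representation the model produces, which is consistent with the need to rebuild them after retraining.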

Inference (Local)

Run offline evaluation on home/generalization test data:

python scripts/evaluate/evaluate_home_set.py --test-file data/test_home_sequences.csv --models-root models --output-dir results/eval_offline_home

Run inference benchmarking:

python scripts/benchmark/benchmark_inference.py --input-file data/test_home_sequences.csv --models-root models --output-dir results/benchmark_inference

Run real-time webcam evaluation (CLI):

python scripts/realtime_eval/evaluate_realtime_webcam.py --model-name bilstm --models-root models --output-dir results/eval_realtime

Run MotionBench

Option 1: Linux Docker (local webcam)

Use this when you want camera input from /dev/video0 inside Docker. This option is for Linux hosts.

docker build -t motionbench-space .
docker run --rm -p 7860:7860 --device=/dev/video0:/dev/video0 motionbench-space

Option 2: Local Streamlit Run (Windows / macOS / Linux)

Use this to run directly on your machine (recommended for Windows/macOS webcam support).

First, clone model weights from Hugging Face and replace the local models/ folder with the cloned models/ folder:

git clone https://huggingface.co/johnamit/motionbench-models

Then create the environment:

conda create -n motionbench python=3.11 -y
conda activate motionbench
pip install -r requirements.txt

Then run the app:

streamlit run scripts/app/motionbench.py

A live version of the Streamlit app is hosted on Hugging Face Spaces via Docker. However, it still has bugs and is not yet complete.

The demo videos show the app running locally via Streamlit (Option 2).

Model Performance Evaluation

Metric Legend

Higher is better: Accuracy, F1 (Macro), F1 (Weighted), Precision, Recall
Lower is better: Mean Latency (ms), P95 Latency (ms), Peak Memory (MB), Model Size (MB)

Generalisation Evaluation

This test evaluates how well each model recognises exercises on real-life home exercise videos it did not see during training.

It reports classification performance using Accuracy, Macro F1 (the unweighted mean of per-class F1, so every class counts equally regardless of size), and Weighted F1/Precision/Recall (per-class scores averaged in proportion to class frequency).
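The difference between the macro and weighted averages is easiest to see on a tiny imbalanced example. The sketch below computes both from per-class F1 using only the standard library; the function name and the toy labels are illustrative and are not part of the evaluation script.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) computed from per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class, support = {}, {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        support[label] = tp + fn  # number of true samples of this class
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return macro, weighted
```

If four of five samples belong to class "a" and the model predicts "a" every time, class "a" scores 8/9 and class "b" scores 0. Macro F1 is then about 0.44 while Weighted F1 is about 0.71, so a gap between the two hints at weak performance on a smaller class.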

Model	Accuracy	F1 (Macro)	F1 (Weighted)	Precision (Weighted)	Recall (Weighted)
gru	0.9665	0.9672	0.9663	0.9675	0.9665
bilstm	0.9553	0.9575	0.9552	0.9582	0.9553
cnn_bilstm	0.9553	0.9585	0.9552	0.9576	0.9553
lstm	0.9497	0.9521	0.9500	0.9518	0.9497
tcn	0.9497	0.9529	0.9495	0.9528	0.9497
st_gcn	0.8715	0.8801	0.8679	0.8937	0.8715

GRU performs best on unseen home videos (highest Accuracy, Macro F1, and Weighted F1), with BiLSTM and CNN-BiLSTM close behind.

Inference Benchmark (CPU)

This test measures how fast each model runs on a CPU, which is useful for laptops, edge devices and low-cost deployment.

It reports mean latency and P95 latency (the 95th percentile, which captures tail behavior rather than the absolute worst case), plus model size in MB.
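For readers unfamiliar with these two numbers, the sketch below shows one way to measure them for any callable. It is not the code in benchmark_inference.py; the function names, the warm-up count, the run count, and the nearest-rank percentile definition are assumptions chosen for the example.

```python
import math
import time


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least q% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(fn, warmup=10, runs=200):
    """Time fn() repeatedly and return (mean_ms, p95_ms)."""
    for _ in range(warmup):
        fn()  # warm caches before timing starts
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples), percentile(samples, 95)
```

A P95 close to the mean, as in the tables here, suggests that per-call latency is stable and rarely spikes.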

Model	Device	Model Size (MB)	Mean Latency (ms)	P95 Latency (ms)
lstm	cpu	0.778	0.226	0.253
bilstm	cpu	0.839	0.359	0.406
cnn_bilstm	cpu	1.066	0.423	0.464
gru	cpu	0.412	0.555	0.589
tcn	cpu	1.175	0.820	0.850
st_gcn	cpu	0.984	2.793	2.870

LSTM is the fastest on CPU, while GRU has the smallest model size.

Inference Benchmark (CUDA)

This test measures how fast each model runs on a GPU (RTX 3090), where lower latency results in smoother live predictions.

It reports mean latency and P95 latency, model size and peak GPU memory usage, which helps choose a model that is both fast and resource-efficient.

Model	Device	Model Size (MB)	Mean Latency (ms)	P95 Latency (ms)	Peak Memory (MB)
gru	cuda	0.412	0.127	0.135	10.146
bilstm	cuda	0.839	0.189	0.208	43.994
lstm	cuda	0.778	0.221	0.229	10.992
cnn_bilstm	cuda	1.066	0.248	0.256	44.249
tcn	cuda	1.175	0.360	0.377	10.554
st_gcn	cuda	0.984	0.533	0.560	18.131

GRU is the fastest on GPU and also has the lowest peak memory (10.146 MB) and the smallest model size (0.412 MB), making it the best real-time deployment candidate in this benchmark.

Citations

Bidirectional Long Short-Term Memory (BiLSTM)

@article{riccio2024real,
  title={Real-time fitness exercise classification and counting from video frames},
  author={Riccio, Riccardo},
  journal={arXiv preprint arXiv:2411.11548},
  year={2024}
}

Gated Recurrent Unit (GRU)

@article{chung2014empirical,
  title={Empirical evaluation of gated recurrent neural networks on sequence modeling},
  author={Chung, Junyoung and Gulcehre, Caglar and Cho, KyungHyun and Bengio, Yoshua},
  journal={arXiv preprint arXiv:1412.3555},
  year={2014}
}

Temporal Convolutional Network (TCN)

@inproceedings{lea2017temporal,
  title={Temporal convolutional networks for action segmentation and detection},
  author={Lea, Colin and Flynn, Michael D and Vidal, Rene and Reiter, Austin and Hager, Gregory D},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={156--165},
  year={2017}
}

Spatial Temporal Graph Convolutional Network

@inproceedings{yan2018spatial,
  title={Spatial temporal graph convolutional networks for skeleton-based action recognition},
  author={Yan, Sijie and Xiong, Yuanjun and Lin, Dahua},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  volume={32},
  number={1},
  year={2018}
}

CNN BiLSTM Hybrid

@online{dhomane2024cnnbilstm,
  author  = {Shreyas Dhomane},
  title   = {CNN + BiLSTM Architecture: A Practical Guide},
  year    = {2024},
  month   = oct,
  day     = {23},
  url     = {https://medium.com/@shreyas.dhomane22/cnn-bilstm-architecture-a-practical-guide-c81829022820},
  note    = {Medium article. Accessed: 2026-04-22}
}

License

This project is released under the MIT License.
