# GSoC 2026 — NumFOCUS / PyMC

## Scalable Online Bayesian State Space Models: Sequential Updates, Structured Linear Algebra, and Parallel Time Series Inference

Prototype implementation for the GSoC 2026 proposal: a Cholesky-based Kalman filter, validated against a SciPy reference implementation — the foundation for online Bayesian inference in PyMC without re-running MCMC from scratch.

- **Mentors:** Jesse Grabowski, Jonathan Dekermanjian
- **PyMC PR:** #8211
- **Discourse:** GSoC 2026 thread
## Problem

PyMC's `pymc-extras` state space models require a full MCMC re-run whenever new observations arrive; for streaming applications this O(n³)-per-step cost is prohibitive. This project replaces it with a sequential Kalman update: O(n²) per step, no MCMC re-run.
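The sequential Kalman update can be sketched with SciPy's `cho_factor`/`cho_solve` and a Joseph-form covariance update. This is an illustrative sketch of the technique, not the repo's actual `kalman_filter.py`; the function name and signature are hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kalman_step(x, P, y, F, H, Q, R):
    """One predict + update step for x_{t+1} = F x_t + w,  y_t = H x_t + v.

    Hypothetical sketch: the gain is computed by solving against a
    Cholesky factor of the innovation covariance (no explicit inverse),
    and the Joseph form keeps the updated covariance symmetric PSD.
    """
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # Innovation and its covariance
    v = y - H @ x_pred
    S = H @ P_pred @ H.T + R

    # Kalman gain K = P_pred H^T S^{-1}, via cho_solve instead of inv(S)
    c_and_lower = cho_factor(S, lower=True)
    K = cho_solve(c_and_lower, H @ P_pred).T

    x_new = x_pred + K @ v
    # Joseph-form covariance update
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new
```

Note that `cho_solve` expects the `(c, lower)` pair returned by `cho_factor`, which is why the factor is passed through unchanged.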
## Repository layout

```text
statespace/
└── online/
    ├── kalman_filter.py   # Cholesky-based predict + update (cho_solve, Joseph form)
    ├── cholesky.py        # Covariance utilities: cholesky_predict, log_det, innovation_cov
    ├── jax_backend.py     # jax.lax.scan filter, jnp.linalg.solve (no explicit inv)
    └── api.py             # OnlineSSM: fit(x0, P0), update(y), forecast(h)
tests/
├── test_kalman.py         # 6 tests, validated against a SciPy reference
└── test_api.py            # 6 integration tests
```
## Usage

```python
from statespace.online import OnlineSSM

ssm = OnlineSSM(F, H, Q, R)
ssm.fit(x0, P0)

for obs in data_stream:
    x, P = ssm.update(obs)        # O(n²), Cholesky-based

means, covs = ssm.forecast(h=10)
```

## Installation

```shell
git clone https://github.com/KRYSTALM7/pymc-online-ssm
cd pymc-online-ssm
pip install -r requirements.txt
pytest tests/
```

## Structured linear algebra

| Method | Gain |
|---|---|
| Cholesky covariance | O(n³) → O(n²) per update |
| Diagonal Q and R | Simplified Kalman gain |
| Kronecker transition (F = Fₜ ⊗ Fₛ) | O(n⁶) → O(n³) predict |
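The Kronecker row above rests on the identity (A ⊗ B) vec(X) = vec(B X Aᵀ) (column-major vec), which applies the transition without ever materializing the full Kronecker product. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def kron_predict_mean(Ft, Fs, x):
    """Apply (Ft ⊗ Fs) @ x without forming the Kronecker product.

    Hypothetical sketch using (A ⊗ B) vec(X) = vec(B X A^T) with
    column-major (Fortran-order) vec; the repo's kernels may differ.
    """
    ns, nt = Fs.shape[0], Ft.shape[0]
    X = x.reshape(Fs.shape[1], Ft.shape[1], order="F")  # un-vec
    Y = Fs @ X @ Ft.T                                   # two small matmuls
    return Y.reshape(ns * nt, order="F")                # re-vec
```

The two small matmuls cost O(nₛnₜ(nₛ + nₜ)) versus O(nₛ²nₜ²) for a dense matrix-vector product with the materialized nₛnₜ × nₛnₜ transition.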
## Success metrics

| Metric | Target |
|---|---|
| Diagonal noise speedup | ≥ 5× at n=200 |
| Kronecker predict speedup | ≥ 10× at n_space=n_time=20 |
| Parallel scaling (Ray) | Linear to N=500 series |
| Test coverage | ≥ 80% |
| Tutorial notebook | Published to PyMC Examples |
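Each series filters independently, so batch inference is embarrassingly parallel. The proposal targets Ray for the scaling benchmark; the structure can be shown with the standard library's thread pool instead. The helper names and the naive inversion-based inner filter are illustrative only:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def filter_series(y, F, H, Q, R, x0, P0):
    """Full Kalman pass over one series (naive textbook form, for the sketch)."""
    x, P = x0.copy(), P0.copy()
    means = []
    for obs in y:
        x, P = F @ x, F @ P @ F.T + Q             # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)            # gain (naive inverse here)
        x = x + K @ (obs - H @ x)                 # update
        P = (np.eye(len(x)) - K @ H) @ P
        means.append(x.copy())
    return np.stack(means)

def filter_batch(batch, F, H, Q, R, x0, P0, workers=4):
    # Independent series parallelize trivially; Ray workers would
    # replace this thread pool in the proposed implementation.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda y: filter_series(y, F, H, Q, R, x0, P0), batch))
```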
## License

MIT — see `LICENSE`.