docs(data): pipeline overview, band contract, and smoke tests#33

Merged
Goldokpa merged 3 commits into develop from docs/data-pipeline-overview-and-band-tests on May 5, 2026
Oshgig (Collaborator) commented May 2, 2026

Summary

  • src/climatevision/data/README.md — first single-page overview of the
    data module: per-file purpose (gee_downloader, band_mapping,
    preprocessing, transforms, sampling, quality, validation), the
    per-analysis band contract (4/4/3 channels for deforestation/ice/flood),
    the SCL cloud-masking rules, and the synthetic-fallback metadata
    convention (is_synthetic: true).
  • tests/test_band_mapping.py — 12 smoke tests guarding the
    analysis-type → band contract: per-analysis band counts, SCL
    append-without-duplicate invariant, band-index resolution, and
    enabled vs disabled types from config.yaml.
  • team_docs/Adeolu_Mary_Oshadare_Role.pdf — Data Pipeline & GIS Lead
    role specification (related to the team docs effort).
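Two of the invariants named above, SCL append-without-duplicate and band-index resolution, can be illustrated with a minimal sketch. The helper names `with_scl` and `band_index` are hypothetical and are not taken from `band_mapping.py`. The sketch assumes the standard Sentinel-2 L1C band order and 0-based indices, and the module's actual behavior may differ. Only the flooding band list comes from this PR's test plan.

```python
# Standard Sentinel-2 L1C band order (assumed to match the module's
# "13-band canonical order").
SENTINEL2_BANDS = [
    "B01", "B02", "B03", "B04", "B05", "B06", "B07",
    "B08", "B8A", "B09", "B10", "B11", "B12",
]


def with_scl(bands):
    """Return a copy of `bands` with 'SCL' present exactly once (hypothetical helper)."""
    out = list(bands)
    if "SCL" not in out:
        out.append("SCL")
    return out


def band_index(band, order=SENTINEL2_BANDS):
    """Resolve a band name to its 0-based position; reject unknown names."""
    try:
        return order.index(band)
    except ValueError:
        raise ValueError(f"unknown band: {band!r}") from None


# Flooding bands as reported in the test plan of this PR.
flood = ["B03", "B08", "B11"]

print(with_scl(flood))                  # ['B03', 'B08', 'B11', 'SCL']
print(with_scl(with_scl(flood)))        # ['B03', 'B08', 'B11', 'SCL']
print([band_index(b) for b in flood])   # [2, 7, 11]
```

Applying `with_scl` twice yields the same list as applying it once, which is the property the append-without-duplicate tests are described as guarding.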

Why

The data module had no overview document — new contributors had to read
seven files to understand the pipeline. The README is now the single
landing page.

The band contract has only been enforced by convention so far. The smoke
tests catch silent regressions where, e.g., the flooding analysis
unexpectedly receives 4-channel inputs instead of 3 — the kind of break
that surfaces as cryptic shape mismatches at inference time. Closes
part of the analysis-aware data work tracked in PM_ROLE_UPDATES_FOR_GAPS.md.
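The failure mode described above can also be caught before inference with an explicit channel-count check. The following is a sketch under stated assumptions, not code from this PR: `EXPECTED_CHANNELS` and `check_channels` are hypothetical names, the keys `"deforestation"` and `"ice"` are assumed spellings (only `"flooding"` appears in the test plan), and inputs are assumed to be channel-first `(channels, height, width)`.

```python
# Channel counts from the 4/4/3 band contract described in this PR.
# Key spellings other than "flooding" are assumptions.
EXPECTED_CHANNELS = {"deforestation": 4, "ice": 4, "flooding": 3}


def check_channels(analysis_type, shape):
    """Raise early if an input shape violates the band contract.

    `shape` is assumed to be channel-first: (channels, height, width).
    """
    if analysis_type not in EXPECTED_CHANNELS:
        raise KeyError(f"unknown analysis type: {analysis_type!r}")
    expected = EXPECTED_CHANNELS[analysis_type]
    got = shape[0]
    if got != expected:
        raise ValueError(
            f"{analysis_type}: expected {expected} channels, got {got}"
        )


check_channels("flooding", (3, 256, 256))  # passes silently

try:
    check_channels("flooding", (4, 256, 256))
except ValueError as err:
    print(err)  # flooding: expected 3 channels, got 4
```

A check like this turns a shape mismatch deep inside the model into a message that names the analysis type and both channel counts.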

Test plan

  • pytest tests/test_band_mapping.py -v passes (12 tests)
  • python -c "from climatevision.data.band_mapping import get_bands_for_analysis; print(get_bands_for_analysis('flooding'))" returns ['B03', 'B08', 'B11']
  • README renders correctly in the GitHub diff

Oshgig added 3 commits May 2, 2026 23:25

Codifies Data Pipeline & GIS Lead responsibilities: real GEE tile downloads,
analysis-specific band mapping, SCL cloud masking at inference time, and the
synthetic-fallback guardrail.

Single page covering each file in the data package, the analysis-type band
contract, the SCL cloud-masking rules, and the synthetic-fallback metadata
convention. Helps new contributors avoid hardcoding band lists.

Verifies the analysis-type → band contract holds:
- Sentinel-2 13-band canonical order
- Per-analysis band counts (4/4/3 for deforestation/ice/flood)
- SCL append-without-duplicate invariant
- Band index resolution and rejection of unknown bands
- Enabled vs disabled analysis types from config.yaml

Goldokpa (Member) left a comment

README is a good summary of the band-mapping contract — that's what new contributors actually need to align on before touching the preprocessor. The smoke tests in test_band_mapping.py give us a real safety net on the band lookups; we've been flying without those.

Note for the team: this PR adds a role doc back to team_docs/. We previously closed #17 and #25 on the rule that team_docs stays local-only. If we're reversing that decision, let's make it explicit in CONTRIBUTING.md so we don't keep re-litigating it. Approving and merging on the strength of the docs + tests.

@Goldokpa Goldokpa marked this pull request as ready for review May 5, 2026 22:48
@Goldokpa Goldokpa merged commit 833e08a into develop May 5, 2026