
Refactor group analysis CLI: Support arbitrary stats models and pairwise contrasts (replace contrast_column/values approach) #134

Description

@akhanf

Motivation

Group-level analysis in SPIMquant currently requires a single --contrast_column and two --contrast_values, which cannot express complex experimental designs (e.g., multiple grouping factors or continuous covariates). To support common workflows such as stratified pairwise comparisons within genotype and sex, or treatment effects adjusted for covariates, we want to refactor the CLI around a statistical-model specification.

Proposal

  • Replace the current --contrast_column/--contrast_values CLI interface with a formula/model-based approach:
    • Add a --model argument that accepts a patsy/statsmodels-compatible formula (e.g. metric ~ C(treatment) * C(genotype) * C(sex) + age).
    • Add --pairwise <factor> (repeatable) for requesting all pairwise comparisons between levels of a categorical variable, optionally within strata.
    • Add --within <factor1> <factor2>... arguments to define strata for contrasts (e.g. all treatment effects within genotype×sex cells).
  • Remove the --contrast_column/--contrast_values parameters (and their usage in Snakemake config, groupstats rules, and scripts).
  • Implementation details:
    • Fit a single global model per region/metric (as specified by --model), with the formula passed strictly as provided by the user (no automatic interaction expansion).
    • To compute contrast effect sizes and statistics, generate predicted means for each combination of contrast-factor levels and strata, then compute all requested pairwise differences, with standard errors and p-values derived from the fitted model’s coefficient covariance matrix.
    • Output one results table/map per contrast (with clear contrast and strata labels in filenames/headers).
  • Document these changes in the CLI, usage guides, and output file conventions.

Benefits

  • Users can specify arbitrary models including multiple grouping factors, interactions, and continuous covariates.
  • Enables streamlined many-to-many (e.g., all pairwise) contrasts, stratified analyses, and effects of interest in a single run.
  • Makes the workflow more general and future-proof, and usable by both basic and advanced users.
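
To give a sense of how many outputs a single "all pairwise, within strata" request produces, here is a small sketch that enumerates the contrasts for one factor. The function name `enumerate_contrasts` and the label format are purely hypothetical; the issue does not fix a naming convention.

```python
from itertools import combinations, product


def enumerate_contrasts(factor, levels, strata):
    """Hypothetical helper: one label per pairwise contrast within each stratum cell.

    `strata` maps each stratifying factor to its list of levels.
    """
    names = list(strata)
    labels = []
    for cell in product(*(strata[name] for name in names)):
        cell_part = "_".join(f"{n}-{v}" for n, v in zip(names, cell))
        for a, b in combinations(levels, 2):
            # Illustrative label only; not an agreed SPIMquant filename scheme.
            labels.append(f"{factor}-{b}vs{a}_{cell_part}")
    return labels


labels = enumerate_contrasts(
    "treatment",
    ["ctrl", "drug", "sham"],
    {"genotype": ["wt", "ko"], "sex": ["F", "M"]},
)
```

With three treatment levels and a 2×2 strata grid, this gives 3 pairs × 4 cells = 12 contrasts, each of which would need its own results table or map.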

Not in scope

  • No legacy CLI/backward compatibility required. All changes can break the old interface (major version bump if needed).
  • No support for the old “per-stratum fit” mode—global model only.
  • No automatic expansion of the model formula beyond what the user supplies (to avoid ‘magic’).

Example usage

pixi run spimquant /bids /out group \
  --model "metric ~ C(treatment) * C(genotype) * C(sex) + age" \
  --pairwise treatment \
  --within genotype sex \
  --cores all
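
The three proposed options could be parsed along these lines. This sketch uses plain `argparse` for illustration only; how SPIMquant actually wires up its CLI is not described in this issue, and the function name `build_group_parser` is hypothetical.

```python
import argparse


def build_group_parser():
    """Hypothetical parser covering only the three proposed options."""
    parser = argparse.ArgumentParser(prog="group-stats-sketch")
    parser.add_argument(
        "--model", required=True,
        help="patsy/statsmodels-compatible formula, passed through unchanged",
    )
    parser.add_argument(
        "--pairwise", action="append", default=[], metavar="FACTOR",
        help="categorical factor for all pairwise comparisons (repeatable)",
    )
    parser.add_argument(
        "--within", nargs="+", default=[], metavar="FACTOR",
        help="factors defining the strata for the contrasts",
    )
    return parser


args = build_group_parser().parse_args([
    "--model", "metric ~ C(treatment) * C(genotype) * C(sex) + age",
    "--pairwise", "treatment",
    "--within", "genotype", "sex",
])
```

Using `action="append"` makes `--pairwise` repeatable, while `nargs="+"` lets `--within` take several factor names in one go, matching the example invocation above.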

Tasks

  • Remove existing contrast_column/contrast_values interface and filtering logic
  • Implement formula-based config parsing (--model, --pairwise, --within)
  • Update groupstats Snakemake rule and scripts to use global model fitting
  • Refactor result/filename conventions to include contrast and strata entities
  • Update documentation (CLI, examples, output descriptions)
  • Add migration and “what’s changed” note in docs/changelog

cc @akhanf

Labels

enhancement (New feature or request)