
Add 14-class merged-group labels and no-re-extract relabel tool for l… #3

Merged: sezginerr merged 1 commit into main from linear-probe-merged-labels on Jun 15, 2026


@sezginerr (Member)

…inear probing

  • scripts/eval_labels/splits_merged_majority/: the 14 clinical-group labels for the probe. Each of 37 pathologies is set by a 3-model majority vote (Claude Opus 4.7 + GPT-5.5 + Nemotron-3 Super 120B), then collapsed into the neuroradiologist's 8 pathophysiology (PP_*) + 6 imaging-phenotype (BP_*) groups. Reproducible via build_merged_group_labels.py --source majority (the builder also supports raw/csv32 agreement variants on demand).

  • scripts/relabel_features.py: build a linear-probe feature dir for a different labels CSV by reusing already-cached encoder features (symlink features_*.npy / subject_ids_*.txt, rebuild only labels_*.npy + label_names.json). Avoids re-running the frozen encoder when only labels change; errors out if a cached subject is missing from the new labels CSV.

  • README: Linear Probing section documents the label set (class count is derived from the CSV, no code change) and the extract-once / relabel workflow.
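
The vote-then-collapse step from the first bullet can be illustrated with a minimal sketch. This is not the code in build_merged_group_labels.py. It assumes binary per-model votes and assumes that a group is positive when any of its member pathologies is positive (the description does not state the collapse rule). The pathology names and group names below are hypothetical; only the PP_/BP_ prefixes come from this PR.

```python
from typing import Dict, List


def majority_vote(votes: List[int]) -> int:
    """Return 1 if strictly more than half of the binary votes are positive."""
    return int(sum(votes) * 2 > len(votes))


def collapse_to_groups(
    pathology_labels: Dict[str, int],
    groups: Dict[str, List[str]],
) -> Dict[str, int]:
    """Assumed rule: a group is positive if any member pathology is positive."""
    return {
        group: int(any(pathology_labels[p] for p in members))
        for group, members in groups.items()
    }


# Hypothetical votes from three labeling models for one subject.
votes = {
    "hemorrhage": [1, 1, 0],
    "infarct": [0, 0, 1],
    "mass_effect": [1, 1, 1],
}

# Hypothetical pathology-to-group mapping (the real one has 8 PP_ + 6 BP_ groups).
groups = {
    "PP_vascular": ["hemorrhage", "infarct"],
    "PP_ischemic": ["infarct"],
    "BP_mass_effect": ["mass_effect"],
}

per_pathology = {name: majority_vote(v) for name, v in votes.items()}
group_labels = collapse_to_groups(per_pathology, groups)
print(per_pathology)
print(group_labels)
```

With three voters a tie cannot occur, so the strict-majority test is equivalent to "at least two of three agree".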

Bulky derived CSVs (all/train/val/test) are gitignored.
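
The extract-once / relabel workflow can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of scripts/relabel_features.py. It assumes one features_<split>.npy and one subject_ids_<split>.txt per split, a labels CSV with a `subject_id` column, and all remaining columns treated as label columns. The function name `relabel` and the `id_column` parameter are hypothetical.

```python
import csv
import json
from pathlib import Path

import numpy as np


def relabel(cached_dir: Path, labels_csv: Path, out_dir: Path,
            id_column: str = "subject_id") -> None:
    """Reuse cached encoder features and rebuild only the label files."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Label names (and therefore the class count) come from the CSV header.
    with labels_csv.open(newline="") as f:
        reader = csv.DictReader(f)
        label_names = [c for c in reader.fieldnames if c != id_column]
        rows = {r[id_column]: [float(r[c]) for c in label_names] for r in reader}

    for ids_file in sorted(cached_dir.glob("subject_ids_*.txt")):
        split = ids_file.stem[len("subject_ids_"):]
        subject_ids = ids_file.read_text().split()

        # Fail loudly if a cached subject has no row in the new labels CSV.
        missing = [s for s in subject_ids if s not in rows]
        if missing:
            raise KeyError(
                f"{len(missing)} cached subject(s) missing from {labels_csv}: "
                f"{missing[:5]}"
            )

        # Symlink the expensive artifacts instead of copying or recomputing them.
        for name in (f"features_{split}.npy", ids_file.name):
            link = out_dir / name
            if not (link.is_symlink() or link.exists()):
                link.symlink_to((cached_dir / name).resolve())

        # Rebuild labels in the same order as the cached subject IDs.
        labels = np.array([rows[s] for s in subject_ids], dtype=np.float32)
        np.save(out_dir / f"labels_{split}.npy", labels)

    (out_dir / "label_names.json").write_text(json.dumps(label_names))
```

Because the label columns are read from the CSV header, switching to a labels CSV with a different number of classes changes the width of labels_*.npy without touching the features.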

@sezginerr merged commit d93222e into main on Jun 15, 2026
3 checks passed