feat: switch evaluation metric to log-odds drop#18
Conversation
…on results Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
|
Warning Review limit reached
More reviews will be available in 40 minutes and 9 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR updates CIAO’s evaluation/scoring pipeline to use log-odds drop (instead of raw logit drop) as the primary delta metric, and threads a precomputed baseline log-odds through the explainer/search code to avoid redundant forward passes.
Changes:
- Added a numerically-stable log-odds helper and switched region delta computation to log-odds drop.
- Precomputes baseline logits/probabilities/log-odds once per image in
CIAOExplainerand passes the baseline downstream. - Simplified
ModelPredictorAPI (logits as the core output; probabilities derived via softmax) and updated callers + MLflow logging.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
ciao/scoring/segments.py |
Threads original_log_odds into surrogate dataset generation and updates target definition to log-odds drop. |
ciao/scoring/region.py |
Adds log_odds_for_class and switches _compute_batch_deltas/calculate_region_deltas to log-odds drop. |
ciao/model/predictor.py |
Makes get_logits the core inference method and redefines get_predictions as softmax over logits; removes class-logit helper. |
ciao/explainer/ciao_explainer.py |
Computes baseline logits/probabilities/log-odds once and passes baseline values through scoring/builder. |
ciao/algorithm/lookahead.py |
Updates calls into region scoring to pass original_log_odds. |
ciao/algorithm/context.py |
Extends SearchContext to carry original_log_odds. |
ciao/algorithm/builder.py |
Accepts/passes original_log_odds and takes original_prob as an input (no baseline forward pass here). |
ciao/__main__.py |
Logs original_log_odds as an MLflow metric alongside existing per-region metrics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request refactors the scoring mechanism to use log-odds instead of raw logits for evaluating model predictions. It introduces a new utility function log_odds_for_class for numerically stable computation and updates the ModelPredictor and various scoring functions to propagate the pre-computed original_log_odds throughout the explanation pipeline. I have no feedback to provide as there are no review comments.
Co-authored-by: Copilot <copilot@github.com>
…log_odds device/dtype Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ciao/explainer/ciao_explainer.py (1)
56-57: ⚡ Quick winValidate
sigmaat theexplain()boundary.
CIAOExplainer.explain()now exposessigma, but invalid values are only rejected later in_select_seed_and_sign()after the expensive preprocessing/segmentation/scoring path has already run. Please validatesigmaalongside the other public inputs here so config errors fail fast and consistently.Suggested change
if max_regions <= 0: raise ValueError(f"max_regions must be positive, got {max_regions}") if desired_length <= 0: raise ValueError(f"desired_length must be positive, got {desired_length}") if batch_size <= 0: raise ValueError(f"batch_size must be positive, got {batch_size}") + if sigma not in (None, -1, 1): + raise ValueError(f"sigma must be one of None, -1, or 1, got {sigma!r}")Based on learnings: Follow the “Boundary vs. Core” principle: parameter validation should be done only at the boundary layer (e.g.,
CIAOExplainer.explain), while core/internal functions should assume validated inputs.Also applies to: 70-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ciao/explainer/ciao_explainer.py` around lines 56 - 57, Validate the sigma parameter at the public boundary inside CIAOExplainer.explain: add input checks for sigma (e.g., non-None allowed types/range/allowed SeedSelectionMode values) alongside the existing validations for desired_length and other public inputs, and raise a clear ValueError/TypeError if invalid; then remove or keep only assertions in the internal _select_seed_and_sign() that assume validated inputs so expensive preprocessing/segmentation/scoring only runs for valid sigma values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ciao/algorithm/lookahead.py`:
- Around line 88-97: The code stores a sign-normalized score
(signed_best/current_best_score) into the trajectory "best_score" field causing
logged values to differ from RegionResult.score when optimization_sign == -1;
change the append logic in the lookahead trajectory code so the comparison still
uses signed_best/current_best_score for decision-making but the trajectory entry
stores the raw best_score (or rename the field if you prefer) instead of
current_best_score; apply the same fix to the second occurrence that mirrors
lines 124-129 so both trajectory logs persist the raw score while retaining the
signed value for comparisons.
---
Nitpick comments:
In `@ciao/explainer/ciao_explainer.py`:
- Around line 56-57: Validate the sigma parameter at the public boundary inside
CIAOExplainer.explain: add input checks for sigma (e.g., non-None allowed
types/range/allowed SeedSelectionMode values) alongside the existing validations
for desired_length and other public inputs, and raise a clear
ValueError/TypeError if invalid; then remove or keep only assertions in the
internal _select_seed_and_sign() that assume validated inputs so expensive
preprocessing/segmentation/scoring only runs for valid sigma values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 726551b2-936b-47b5-adc7-4538fd4e36af
📒 Files selected for processing (10)
.gitignoreciao/__main__.pyciao/algorithm/builder.pyciao/algorithm/context.pyciao/algorithm/lookahead.pyciao/explainer/ciao_explainer.pyciao/model/predictor.pyciao/scoring/region.pyciao/scoring/segments.pyconfigs/base.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Context:
This PR changes the logit drop to log-odds drop. Decided on this after consultation with Vojtěch Kůr.
What's Changed / Added:
ciao/scoring/region.py:log_odds_for_classfunction, which computes the numerically stable log-odds metric (target_logit - logsumexp(other_logits))._compute_batch_deltasandcalculate_region_deltasto compute and return the Log-Odds drop instead of raw logit differences.ciao/explainer/ciao_explainer.py: Extracted the baseoriginal_log_oddscomputation directly to theCIAOExplainerclass. This value is computed exactly once per image and passed down the pipeline, removing redundant baseline forward passes.ciao/model/predictor.py: Cleaned up theModelPredictorclass by consolidating outputs.get_logitsis now the core method for raw tensors,get_predictionsreturns the Softmax, andget_class_logit_batchwas removed in favor of direct tensor slicing.ciao/algorithm/builder.py,context.py,lookahead.py, &segments.py: Updated typing, constructors, and parameter passing to expectoriginal_log_oddsinstead oforiginal_logit.ciao/__main__.py: Updated MLflow logging to trackoriginal_log_oddsboth globally and per-region for easier debugging of model confidence.Related Task:
XAI-29
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Chores