Skip to content

feat: switch evaluation metric to log-odds drop#18

Open
dhalmazna wants to merge 13 commits into
masterfrom
feat/log-odds
Open

feat: switch evaluation metric to log-odds drop#18
dhalmazna wants to merge 13 commits into
masterfrom
feat/log-odds

Conversation

@dhalmazna

@dhalmazna dhalmazna commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Context:
This PR changes the logit drop to log-odds drop. Decided on this after consultation with Vojtěch Kůr.

What's Changed / Added:

  • ciao/scoring/region.py:
    • Introduced the log_odds_for_class function, which computes the numerically stable log-odds metric (target_logit - logsumexp(other_logits)).
    • Updated _compute_batch_deltas and calculate_region_deltas to compute and return the Log-Odds drop instead of raw logit differences.
  • ciao/explainer/ciao_explainer.py: Extracted the base original_log_odds computation directly to the CIAOExplainer class. This value is computed exactly once per image and passed down the pipeline, removing redundant baseline forward passes.
  • ciao/model/predictor.py: Cleaned up the ModelPredictor class by consolidating outputs. get_logits is now the core method for raw tensors, get_predictions returns the Softmax, and get_class_logit_batch was removed in favor of direct tensor slicing.
  • ciao/algorithm/builder.py, context.py, lookahead.py, & segments.py: Updated typing, constructors, and parameter passing to expect original_log_odds instead of original_logit.
  • ciao/__main__.py: Updated MLflow logging to track original_log_odds both globally and per-region for easier debugging of model confidence.

Related Task:
XAI-29

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configurable seed selection mode to control explanation behavior
    • Included original model probability and log-odds in explanation results
    • Added combined metrics for multi-region explanations
  • Bug Fixes

    • Fixed duplicate metric logging issues in explanation runs
    • Improved conditional metric handling for multi-region scenarios
  • Chores

    • Updated gitignore entries for test and temporary directories

@dhalmazna dhalmazna self-assigned this Apr 28, 2026
@dhalmazna dhalmazna requested review from a team and Copilot April 28, 2026 18:51
@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@dhalmazna, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 40 minutes and 9 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f5003d2-2ec5-4781-8e3f-874a6cb1a127

📥 Commits

Reviewing files that changed from the base of the PR and between 76cbeb0 and 9b0f026.

📒 Files selected for processing (1)
  • ciao/algorithm/lookahead.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/log-odds

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dhalmazna dhalmazna marked this pull request as draft April 28, 2026 18:52

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates CIAO’s evaluation/scoring pipeline to use log-odds drop (instead of raw logit drop) as the primary delta metric, and threads a precomputed baseline log-odds through the explainer/search code to avoid redundant forward passes.

Changes:

  • Added a numerically-stable log-odds helper and switched region delta computation to log-odds drop.
  • Precomputes baseline logits/probabilities/log-odds once per image in CIAOExplainer and passes the baseline downstream.
  • Simplified ModelPredictor API (logits as the core output; probabilities derived via softmax) and updated callers + MLflow logging.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ciao/scoring/segments.py Threads original_log_odds into surrogate dataset generation and updates target definition to log-odds drop.
ciao/scoring/region.py Adds log_odds_for_class and switches _compute_batch_deltas/calculate_region_deltas to log-odds drop.
ciao/model/predictor.py Makes get_logits the core inference method and redefines get_predictions as softmax over logits; removes class-logit helper.
ciao/explainer/ciao_explainer.py Computes baseline logits/probabilities/log-odds once and passes baseline values through scoring/builder.
ciao/algorithm/lookahead.py Updates calls into region scoring to pass original_log_odds.
ciao/algorithm/context.py Extends SearchContext to carry original_log_odds.
ciao/algorithm/builder.py Accepts/passes original_log_odds and takes original_prob as an input (no baseline forward pass here).
ciao/__main__.py Logs original_log_odds as an MLflow metric alongside existing per-region metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ciao/scoring/region.py
Comment thread ciao/scoring/region.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the scoring mechanism to use log-odds instead of raw logits for evaluating model predictions. It introduces a new utility function log_odds_for_class for numerically stable computation and updates the ModelPredictor and various scoring functions to propagate the pre-computed original_log_odds throughout the explanation pipeline. I have no feedback to provide as there are no review comments.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
ciao/explainer/ciao_explainer.py (1)

56-57: ⚡ Quick win

Validate sigma at the explain() boundary.

CIAOExplainer.explain() now exposes sigma, but invalid values are only rejected later in _select_seed_and_sign() after the expensive preprocessing/segmentation/scoring path has already run. Please validate sigma alongside the other public inputs here so config errors fail fast and consistently.

Suggested change
         if max_regions <= 0:
             raise ValueError(f"max_regions must be positive, got {max_regions}")
         if desired_length <= 0:
             raise ValueError(f"desired_length must be positive, got {desired_length}")
         if batch_size <= 0:
             raise ValueError(f"batch_size must be positive, got {batch_size}")
+        if sigma not in (None, -1, 1):
+            raise ValueError(f"sigma must be one of None, -1, or 1, got {sigma!r}")

Based on learnings: Follow the “Boundary vs. Core” principle: parameter validation should be done only at the boundary layer (e.g., CIAOExplainer.explain), while core/internal functions should assume validated inputs.

Also applies to: 70-74

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ciao/explainer/ciao_explainer.py` around lines 56 - 57, Validate the sigma
parameter at the public boundary inside CIAOExplainer.explain: add input checks
for sigma (e.g., non-None allowed types/range/allowed SeedSelectionMode values)
alongside the existing validations for desired_length and other public inputs,
and raise a clear ValueError/TypeError if invalid; then remove or keep only
assertions in the internal _select_seed_and_sign() that assume validated inputs
so expensive preprocessing/segmentation/scoring only runs for valid sigma
values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ciao/algorithm/lookahead.py`:
- Around line 88-97: The code stores a sign-normalized score
(signed_best/current_best_score) into the trajectory "best_score" field causing
logged values to differ from RegionResult.score when optimization_sign == -1;
change the append logic in the lookahead trajectory code so the comparison still
uses signed_best/current_best_score for decision-making but the trajectory entry
stores the raw best_score (or rename the field if you prefer) instead of
current_best_score; apply the same fix to the second occurrence that mirrors
lines 124-129 so both trajectory logs persist the raw score while retaining the
signed value for comparisons.

---

Nitpick comments:
In `@ciao/explainer/ciao_explainer.py`:
- Around line 56-57: Validate the sigma parameter at the public boundary inside
CIAOExplainer.explain: add input checks for sigma (e.g., non-None allowed
types/range/allowed SeedSelectionMode values) alongside the existing validations
for desired_length and other public inputs, and raise a clear
ValueError/TypeError if invalid; then remove or keep only assertions in the
internal _select_seed_and_sign() that assume validated inputs so expensive
preprocessing/segmentation/scoring only runs for valid sigma values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 726551b2-936b-47b5-adc7-4538fd4e36af

📥 Commits

Reviewing files that changed from the base of the PR and between d31cf6d and 76cbeb0.

📒 Files selected for processing (10)
  • .gitignore
  • ciao/__main__.py
  • ciao/algorithm/builder.py
  • ciao/algorithm/context.py
  • ciao/algorithm/lookahead.py
  • ciao/explainer/ciao_explainer.py
  • ciao/model/predictor.py
  • ciao/scoring/region.py
  • ciao/scoring/segments.py
  • configs/base.yaml

Comment thread ciao/algorithm/lookahead.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dhalmazna dhalmazna marked this pull request as ready for review June 3, 2026 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants