Add JARVI3 DTL/SuperMath deterministic solver #33

Draft
kyal102 wants to merge 1 commit into eth-sri:main from kyal102:dtl-supermath-aime-audit

kyal102 commented Jun 30, 2026

Summary

Adds a draft JARVI3 DTL/SuperMath solver integration for MathArena.

This PR includes:

  • DTLSuperMathSolver, a deterministic routed solver that emits boxed answers for supported AIME 2026 lanes and \boxed{None} for unsupported cases.
  • Vendored standalone DTL/SuperMath lane logic with no dependency on the JARVIS repository.
  • A model config at configs/models/jarvi3/dtl-supermath.yaml with type: dtl_supermath.
  • Runner wiring for the new solver type.
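The routing behaviour described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `LANES` registry, `_lane_aime_2026_01`, and `solve` are hypothetical names, and the lane body is a placeholder that simply returns the value reported by the smoke test in the Validation section.

```python
from typing import Callable, Dict

# Hypothetical lane: the real lane logic lives in dtl_supermath_lanes.py.
# This placeholder only returns the value reported by the PR's smoke test.
def _lane_aime_2026_01() -> int:
    return 277

# Hypothetical registry mapping a problem id to its deterministic lane.
LANES: Dict[str, Callable[[], int]] = {
    "aime_2026_01": _lane_aime_2026_01,
}

def solve(problem_id: str) -> str:
    """Return a boxed answer for a supported lane, or abstain with \\boxed{None}."""
    lane = LANES.get(problem_id)
    if lane is None:
        # Unsupported problem: abstain explicitly instead of guessing.
        return r"\boxed{None}"
    return rf"\boxed{{{lane()}}}"
```

Under these assumptions, `solve("aime_2026_01")` yields `\boxed{277}` and `solve("aime_2026_05")` yields `\boxed{None}`, so an unsupported problem is never answered with a guess.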

Claim Boundary

This is not an official leaderboard result. It is a reproducibility/audit integration candidate for maintainer review.

Public audit artifact:
https://github.com/kyal102/dtl-supermath-aime-2026-audit/releases/tag/dtl-supermath-aime-2026-audit-v1

Local audit result from that artifact:

  • 14/30 routed answers correct
  • 16/30 abstained
  • 0 incorrect verified emissions
  • 100% accuracy on answered routed cases (14/14)
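The relationship between these figures can be checked with a small calculation. The function and key names below (`audit_metrics`, `coverage`, `selective_accuracy`, `overall_accuracy`) are illustrative and not part of the audit artifact. Only the counts come from the list above.

```python
def audit_metrics(total: int, correct: int, abstained: int, incorrect: int) -> dict:
    """Derive coverage and accuracy figures from raw audit counts."""
    answered = total - abstained
    # Every answered case must be either correct or incorrect.
    if answered != correct + incorrect:
        raise ValueError("counts are inconsistent")
    return {
        "answered": answered,
        "coverage": answered / total,             # share of problems attempted
        "selective_accuracy": correct / answered, # accuracy on attempted problems
        "overall_accuracy": correct / total,      # abstentions count as misses
    }

m = audit_metrics(total=30, correct=14, abstained=16, incorrect=0)
print(f"coverage={m['coverage']:.1%} "
      f"selective={m['selective_accuracy']:.1%} "
      f"overall={m['overall_accuracy']:.1%}")
# prints: coverage=46.7% selective=100.0% overall=46.7%
```

The 100% figure is therefore accuracy over the 14 answered cases. Counting abstentions as misses, the overall score is 14/30.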

Validation

Validated locally in the source artifact before PR packaging:

  • 24 tests passed across the MathArena-focused DTL/SuperMath adapter, proposer, solver, and submission test suites.

Validated in this MathArena fork branch:

  • AST parse passed for src/matharena/solvers/dtl_supermath_solver.py and src/matharena/solvers/dtl_supermath_lanes.py.
  • Deterministic lane smoke test returned 277 for aime_2026_01 and None for unsupported aime_2026_05.
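An AST parse check of this kind can be reproduced with the standard library alone. The sketch below is an assumption about how such a check might look, not the script used for this PR, and the helper names `ast_parses` and `check_files` are hypothetical. Note that `ast.parse` only confirms the file is syntactically valid for the running interpreter. It does not import the module or execute any code.

```python
import ast
from pathlib import Path
from typing import Dict, Iterable

def ast_parses(source: str, filename: str = "<string>") -> bool:
    """Return True if the source is syntactically valid for this interpreter."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True

def check_files(paths: Iterable[str]) -> Dict[str, bool]:
    """Parse each file and report whether it passed, keyed by path."""
    return {
        str(p): ast_parses(Path(p).read_text(encoding="utf-8"), str(p))
        for p in paths
    }
```

For example, `check_files(["src/matharena/solvers/dtl_supermath_solver.py", "src/matharena/solvers/dtl_supermath_lanes.py"])` would return a mapping from each path to `True` or `False` when run from the repository root.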

A full bytecode compile of runner.py was not run on this machine: MathArena declares requires-python = ">=3.12", but the only local interpreter is Python 3.11.9, which cannot parse the Python 3.12 f-string syntax already used upstream in runner.py.
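A guard like the following could make that limitation explicit in a validation script. It is a sketch under stated assumptions: `MIN_PYTHON`, `meets_requirement`, and `compile_if_supported` are hypothetical helpers, and only the `>=3.12` bound is taken from the project metadata quoted above.

```python
import py_compile
import sys
from typing import Optional, Sequence, Tuple

# Assumed to mirror requires-python = ">=3.12" from the project metadata.
MIN_PYTHON: Tuple[int, int] = (3, 12)

def meets_requirement(version: Sequence[int],
                      minimum: Tuple[int, int] = MIN_PYTHON) -> bool:
    """Compare a (major, minor, ...) version against the minimum supported one."""
    return tuple(version[:2]) >= minimum

def compile_if_supported(path: str) -> Optional[str]:
    """Byte-compile a file only when the interpreter is new enough.

    Returns the bytecode path on success, or None when the check is skipped.
    Raises py_compile.PyCompileError if compilation fails.
    """
    if not meets_requirement(sys.version_info):
        return None
    return py_compile.compile(path, doraise=True)
```

On a 3.11.9 interpreter, `meets_requirement((3, 11, 9))` is `False`, so the compile step is skipped and reported as such rather than failing with a syntax error.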

Maintainer Notes

Opening as a draft because the integration boundary may need adjustment. In particular, this deterministic lane solver is intentionally abstention-aware and competition-specific for the AIME 2026 audit artifact, rather than a general model baseline.
