Draft
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nfwd bridge
Step 1: Delete old forward-mode stack (forward_mode.jl, 110+ frule!! methods,
LazyFoRRule/DynamicFoRRule, forward debug tests). Stub dual_ir and forward IR
inspection.
Step 2: Reimplement public forward via nfwd bridge. NfwdCache for forward-mode
with chunked gradients. HVPCache for nfwd-over-reverse Hessian-vector products
via NDual{T,1} reverse rules. Add _nfwd_lift for Tuple inputs. Update docs and tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add NTangent width-aware tangent container to src/tangents/dual.jl
- Add tangent_type(Val(N), P) and dual_type(Val(N), P) width-aware queries
- Update verify_dual_type to handle NTangent-wrapped tangents
- Restore 171 frule!! definitions from main across 10 rule files
- Add _chunk_pack_tangent disambiguations for NTangent{Vararg{NoTangent}}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add PrimalMode context to contexts.jl with is_primitive always false - Add PrimalMode interpreter delegation in abstract_interpretation.jl - Add primal_mode.jl (605 lines) from primal-mode-migration donor - Update optimise_ir! to use PrimalMode interpreter - Rename DerivedFRule/LazyFRule/DynamicFRule to DerivedPrimal/LazyPrimal/DynamicPrimal - Rename forwards_mode_design.md to primal_mode_design.md - Add primal_mode.jl test (44 lines) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Port LazyFoRRule/DynamicFoRRule from main for forward-over-reverse lazy rule compilation (fixes x^4, x^6 HVP on Julia 1.10) - Add interp keyword to optimise_ir! for ForwardMode inlining control - Fix FoR value_and_hvp!! aliasing bug: copy grad/hvp before returning so successive calls don't overwrite earlier results - Add dual-mode HVP correctness tests (FoR + RoF): quadratic, Rosenbrock, multi-arg, namedtuple, nested tuple intermediates - Add dual-mode Hessian correctness tests (FoR + RoF): Rosenbrock, quadratic, multi-arg, namedtuple, nested tuple, tuple map/reduce Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add build_rrule methods and RRule call method to NfwdMooncake that were lost during the forward-mode refactoring. These are needed by integration tests (distributions) and the public API. Remove 4 @test_broken zero-allocation assertions for array-input nfwd gradient paths that are not yet resolved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hecking
Step 7 infrastructure: zero_dual/randn_dual/uninit_dual for NDual and
Complex{NDual}, primal/tangent accessors for Array{NDual} and
Array{Complex{NDual}}, ndual_width/check_ndual_width_consistency for
runtime width validation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add width param to const_dual!, _uninit_dual dispatcher (4 overloads)
- Thread width through __unflatten_dual_varargs with _group_vararg_dual
- Add W type param to DerivedPrimal, DynamicPrimal for zero-alloc width
- Add width field to LazyPrimal, DynamicPrimal; propagate in _copy
- Relax Vararg{Dual,N} to Vararg{Any,N} in DerivedPrimal/DynamicPrimal
- Include width in closure cache key and primal_rule_type
- Add __get_primal(::NDual) and _partial_i(::NDual) in NfwdMooncake.jl
- All modify_primal_stmts! handlers use info.width for _uninit_dual
Scalar end-to-end verified: sin, sin∘cos, x*x at width-2.
Tests: basic (26533 pass), Nfwd (477 pass).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…idth-N
Add element-wise NDual lifting for container types:
- MemoryRef/Memory: dual_type(Val(N), MemoryRef{T}) → MemoryRef{NDual{T,N}} (1.11+)
- Tuple: dual_type(Val(N), Tuple{T1,T2}) → Tuple{dual_type(Val(N),T1), ...}
- Val{0} ambiguity resolvers for MemoryRef/Memory
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add frule!! overloads for NDual containers (Array{NDual}, MemoryRef{NDual},
Memory{NDual}) so the primal-mode compiler can trace through array operations
at width-N.
Key additions:
- _uninit_dual(Val{N}, Array{T,D}): returns bare Array{NDual{T,N}}
- lmemoryrefget/lmemoryrefset!/memoryrefnew: NDual container variants
- lgetfield for bare NDual containers (tangent = NoTangent)
- sum/sum(abs2) NDual overloads for primitive array reductions
- _HasNDual/_NDualMemTypes dispatch unions
- copy for Array{NDual}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the primal-mode compiler NDual-aware so that chunk_size > 1
produces width-N dual numbers instead of looping width-1.
Key changes:
- FCache carries width; value_and_gradient!! splits into
_gradient_width1 (Dual path) and _gradient_widthN (NDual path)
- _make_ndual_seed / _combine_to_ndual build NDual inputs from
per-slot seed tangents
- prepare_derivative_cache passes Val(cs) to build_frule
- _uninit_dual lifts Memory type constants for NDual containers
- _new_ frule handles NDual container construction (Array, Memory)
and struct construction with NTangent
- @zero_derivative frule!! uses per-arg type parameters to avoid
Union{Dual{<:Any},Any} collapsing to Any
- Bare NDual container frule!! overloads for memoryrefnew,
lmemoryrefget, lmemoryrefset!, getfield, copy
- NDual frule!! for sincosd, sincospi, modf (tuple-returning)
- unalias declared as forward-mode primitive (workaround for OC
segfault with NDual containers in broadcast)
- _has_ndual / _dual_or_ndual helpers for NDual dispatch in rules
- NTangent added to StandardTangentType; _get_tangent_field
overloads for NTangent field access
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _combine_to_ndual methods for Complex{T} scalars and
AbstractArray{Complex{T}} so chunked forward mode preserves
tangent directions instead of dropping them via NoTangent fallback
- Replace zero(T) with f(x...) in empty-input Hessian early return
so the actual primal value is returned
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…biguity
- Allow width-1 Dual calls through chunked FCache by padding into
lane 1 of the width-W rule and extracting via _ndual_output_to_width1
- Add _combine_to_ndual complex NoTangent disambiguators (resolves Aqua)
- Change generic _combine_to_ndual fallback to NTangent (structured types)
- Fix _eval_dir NTangent check and add width-N path for tuple derivatives
- Fix HVP reverse-over-forward seed type for constant functions
- Add Complex{NDual} support: _has_ndual, _dual_or_ndual, lgetfield
frule, _new_ frule, _ndual_width, _tangent_dir
- Fix empty Hessian to return f(x...) instead of zero(T)
- Add regression tests for chunked complex and empty Hessian
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Discover output type via compiled rule instead of raw f(x...) in reverse_over_forward prepare, preserving mutation-restoration semantics - Document no-direct-call contract in prepare_hvp_cache docstring - Add _combine_to_ndual for Tuple: element-wise NDual packing - Add lgetfield frules for bare Tuple/NamedTuple with NDual elements Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ages - Remove Tuple specialization in _count_slots that bypassed IdDict-based alias deduplication, causing redundant evaluations for aliased inputs - Respect config.silence_debug_messages in prepare_derivative_cache instead of hard-coding true Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bring back detailed comments documenting cache-key design, stack aliasing invariants, thread-safety caveats, and why sig_or_mi is excluded from the DynamicFoRRule key. These explain non-obvious correctness constraints. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…es, fix Aqua issues
- Guard all Memory/MemoryRef references with @static if VERSION >= v"1.11-rc4"
in Mooncake.jl, new.jl, and primal_mode.jl
- Move _HasNDual const from memory.jl to Mooncake.jl (used by always-loaded code)
- Fix Aqua unbound type param: _ndual_width uses fieldcount(L) instead of NTuple{W}
- Fix Aqua method ambiguity: add _combine_to_ndual(::Tuple, ::Tuple{}) disambiguator
- Skip forward-mode testing for @zero_derivative typed-vararg cases (pre-existing bug)
- Guard chunked complex array test for 1.11+ only (Core.arrayref lacks width-N frule)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ive vararg frule
- Consolidate all Memory/MemoryRef NDual overloads (_has_ndual, _dual_or_ndual,
_find_ndual_memref, _ndual_width, _tangent_dir, _uninit_dual) into memory.jl
which is already conditionally loaded on Julia 1.11+. Removes @static if guards
from Mooncake.jl, new.jl, and primal_mode.jl.
- Fix _vararg_any_type to produce Union{<:T, Dual{<:T}} so typed vararg frule
signatures match Dual-wrapped args. Removes mode=ReverseMode workarounds.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix forward cache review follow-ups (docs, config, interface, tests) - Merge origin/main into step-8-ndual-routing - Fix NDual memory copy forwarding - Inline and remove single-use helper functions in interface.jl Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Handle NDual outputs in forward cache summaries and add arrayref/arrayset forward rules for arrays storing NDual elements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The IEEEFloat block above already uses `let nd = ..., nd_zero = ...` to contain the test bindings; do the same for the Memory/MemoryRef block so test-local names aren't leaking into the surrounding @testset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Int` is a platform alias (`Int64` on x64, `Int32` on x86). The hardcoded
`Vector{Int64}` literal in `test/nfwd/nfwd.jl:416, :429` caused
basic-lts-x86 to fail once `nfwd/nfwd.jl` was wired into the basic test
group: the actual error message renders `Vector{Int32}` on 32-bit
runners. Use `Vector{$Int}` so the test interpolates the same alias the
type-pretty-printer uses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_group_vararg_dual(::Val{N}, ...)` packed the vararg tail into a
single `Dual(group_primal, NTangent(per_dir))`, but
`dual_type(Val(N), Tuple{...})` decomposes a concrete tuple
element-wise into a Tuple-of-Duals. The OC's vararg slot was therefore
compiled to expect a Tuple-of-Duals and rejected the packed shape with
a TypeError on every chunked vararg call (Float64 and struct cases
alike).
Branch on width and pass `rest` through unchanged on the chunked path —
each element is already a width-N Dual / NDual, which is exactly what
the OC wants. The legacy `width=nothing` path still packs into a single
Dual-of-Tuple as before. `_partial_i` (both the Dual and NDual
specialisations) becomes dead and is removed.
Regression test in `test/interface.jl` exercises a Float64 vararg
function under `chunk_size=2`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `DerivedPrimal{P,T,isva,nargs,W}` callable, the 1.10 `__call_rule`
overloads (both for `DerivedPrimal` and `LazyPrimal{...,DerivedPrimal{...}}`),
and the `_isva` / `_nargs` accessors all matched only the first four
type parameters. Without W in dispatch, two `DerivedPrimal` instances
that differ only in `width` (e.g. `Nothing` vs. `Val{2}`) shared a
single compiled method body, forcing runtime dispatch on `fwd.width`
inside `__unflatten_dual_varargs`. Threading W makes each width specialise
independently and keeps return-type inference sharp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-add the four chunked-array gradient `count_allocs == 0` assertions
that were dropped in `ed4ceca3`. They remain `@test_broken` because the
array-input gradient still allocates per call (~10–15 bytes per element
from the seed `IdDict` and the `Memory{NDual}` lift in
`_dual_or_ndual` — see follow-up #16). Restoring as `@test_broken`
documents the regression target so any unintended improvement to zero
allocs surfaces immediately, as AGENTS.md requires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The file shrank from 813 to ~280 lines when the IR-based forward compiler (and its `NfwdMooncake.build_rrule(...; chunk_size=N)` API) was retired in `9a81f81fe` / `c5dd59eb3`. Add a header comment listing what this file covers (NDual dispatch helpers, width-aware seed constructors, the direct NDual frule!! sweep) and what migrated (chunked AD end-to-end → `test/interface.jl`; Memory/Array NDual rules → `test/rules/memory.jl`; interpreter sweep → `test/interpreter/primal_mode.jl`) so future readers can tell intentional scope from accidental gaps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_prepare_hvp(:reverse_over_forward)` previously ran the prep pullback inside the if/else that determined `T_out`, then validated `T_out` was `<: IEEEFloat` afterwards. For non-IEEEFloat outputs the bogus prep call fired (with an inappropriate `zero(T)` seed) before the user-facing ArgumentError. Compute `T_out` from the output's structure, validate first, then run a single targeted prep call with the right seed shape. The fdata_bufs zero-out below the prep call already guards against any side effects, so this is purely a structural cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CoDual primals carry the unlifted primal type, so reverse-mode dispatch
never lands on `Memory{<:_HasNDual}` / `Array{<:_HasNDual}`. The existing
rrule!!s in this file dispatch on `CoDual{Memory{T}}` (original primal),
which is the right and only path. Add a comment block explaining why
we don't add error-stub rrule!!s for the NDual signatures: they would
broaden the rrule!! signature for a dispatch that cannot arise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing parametric `HVP correctness` block exercises each mode independently against analytical Hessians, but never compares the two modes against each other. A polarity bug in the inner gradient closure or a wrong-shape prep seed can drift one mode silently while the analytical assertions still pass for the other. Add a single-arg and a multi-arg cross-mode equality testset (`atol=1e-10`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`zero_derivative(f::Dual, args...)` previously dispatched through two
narrow overloads: `Vararg{Dual,N}` for homogeneous scalar Duals and
`f::Dual, x1::T, x_rest::Vararg{T}` for homogeneous arrays. Truly mixed
varargs (e.g. `(scalar_Dual, Array{Dual})`) hit a MethodError because
neither overload's homogeneity constraint matched.
Replace both with a single `Vararg{Union{Dual, Array{<:Dual},
Array{<:Complex{<:Dual}}}}` overload using a small `_zd_primal` helper
that extracts primals from Duals and passes arrays through. The narrow
NDual / Memory overloads in `nfwd/NfwdMooncake.jl` remain disjoint
(NDual ⊄ Dual; Memory{<:Dual} ⊄ Array). `Test.detect_ambiguities` is
clean. Add a mixed-vararg regression test in `test/tools_for_rules.jl`
"container dispatch".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing test at the same site asserts the `"Chunked tuple inputs must use NTangent consistently"` ArgumentError path; it never asserts that the consistent-NTangent success path works. A refactor could silently break the success path while preserving the error. Add the round-trip case (same `mixed_f`, consistent NTangents across both inputs) with the analytical reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle the cosmetic / minor docs items from the pr1151 punch list: - prepare_derivative_cache docstring: spell out chunk_size semantics (`nothing` = legacy width-1 vs `1` and `N>1` chunked NDual paths) and the cache-reuse contract (types, sizes, aliasing topology must match prep-time inputs, otherwise re-prepare). - value_and_hvp!! docstring: document the single-arg vs multi-arg `v` shape asymmetry (single-arg accepts AbstractVector or 1-tuple; multi-arg requires Tuple). - _gradient_widthN width-mismatch error: list the three common causes (missing NDual frule, missing `@is_primitive`, generic Dual+NTangent fallthrough) and suggest `Config(chunk_size=nothing)` as workaround. - _count_slots strict-fallback message: reword from "silent fallthrough would produce wrong tangents" to "would-be silent fallthrough is now an error" — the absence of the fallback is the safety, not the cause. - prepare_derivative_cache nfwd opt-out test: assert the `enable_nfwd` deprecation warning fires at Config construction (matching the contract tested in test/config.jl). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply review findings against `0eac54ada..d531bb9`: - src/tools_for_rules.jl: drop newly-introduced `_zd_primal` helper; the identical `_primal` is already defined in `src/tangents/dual.jl:165`. - src/interpreter/primal_mode.jl: drop unused `where {N}` on the chunked `_group_vararg_dual(::Val, rest)` pass-through; rewrite the `__unflatten_dual_varargs` docstring example so it covers both width=nothing and width=Val(N) shapes (was stale, only described the legacy path). - src/interface.jl: hoist `y_out isa Nfwd.NDual` into a single `is_ndual_out` local in `_prepare_hvp(:reverse_over_forward)` (was tested twice — once for `T_out`, once for the prep seed). Compress the `_gradient_widthN` width-mismatch error from 11 lines + 3 bullets to 4 lines naming the typical cause and pointing at the workaround. Tighten the `prepare_derivative_cache` "Aliasing topology" docstring bullet to state the user-visible contract without leaking the pre-allocated buffers detail. - src/rules/memory.jl: compress the rrule!! parity comment block from 8 lines to 2 — the rationale is just "CoDuals carry the unlifted primal type." - test/interface.jl: rename the awkward `_for` HVP-cache binding (leading underscore because `for` is a keyword) to `fwr` across both single-arg and multi-arg testsets. - test/nfwd/nfwdmooncake.jl: drop the "Historical note" paragraph referencing commits — that's git-log content, not file-header content. Tighten the out-of-scope section to inline cross-references. Performance-neutral; tests pass (interface 1702/1712, primal_mode 1347/1347, nfwd 771/771, tools_for_rules 677/677). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second simplify pass over the same range: - src/interface.jl: `prepare_derivative_cache` docstring partially duplicated the canonical chunk_size enumeration in `Mooncake.Config`'s docstring (`src/config.jl:27-31`). Replace the bullet list with a cross-reference to `Config`, keeping the unique `chunk_size=1` vs `chunk_size=nothing` distinction (different internal paths, both width-1) and the reuse-contract warning (which is genuinely new info). - src/interpreter/primal_mode.jl: Drop the two block comments above the `_group_vararg_dual` overloads — they restate the legacy-vs-chunked rationale already explained in the `__unflatten_dual_varargs` docstring expanded one commit earlier. Tests pass (interface 1702/1712). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step B+C of pr1151.md item #44: switch the public forward path to NDual{T,1} as the canonical width-1 form so the same primitive rule path serves chunk_size>=1 calls. - Config.chunk_size default flips from nothing to 1; docstring rewritten and test/config.jl assertion updated. - value_and_jacobian!! rewritten as _jacobian_widthN, supporting any Val{W} via the existing seed_buf / lift_buf infrastructure; legacy width=nothing path preserved as _jacobian_width1 for transition. - Step A additions: NDual variants for Base.eps / Base.FastMath.exp_fast / exp2_fast / exp10_fast / atan_fast (unary loop) and Base.FastMath.atan_fast (binary loop) in rules_via_nfwd.jl; Base.FastMath.sincos manual entry; @inactive_intrinsic macro now emits a Vararg{Union{Dual,NDual,Complex{<:NDual}}} variant; compilerbarrier and Core.ifelse get NDual variants. - Cascade fixes: _new_ now constructs Complex{NDual} from the lifted args (was stripping NDual); _combine_to_ndual gets a generic Dual{P, NTangent{...}} fallback for non-IEEEFloat aggregates; _dual_or_ndual gets Array{<:IEEEFloat} / Array{<:Complex{<:IEEEFloat}} overloads and an NTangent-of-NoTangent collapse to keep Dual{F, NoTangent} canonical for non-differentiable primals; verify_dual_value (debug mode) widened to accept NTangent-wrapped tangents; getfield gets a Tuple{Vararg{<:_HasNDual,...}} overload in rules/memory.jl; _get_tangent_field handles AbstractArray containers. - Helper rename: _ndual_primal -> _dual_or_ndual_primal -> primal. primal() is extended for AbstractArray{<:NDual} / Tuple / generic passthrough so rule bodies can use one name regardless of which lifted form was passed. - New tasks tracked: Step E (HVP RoF inner cache verification), Step F (delete obsolete Dual{P} frule!!s + dead code) — pending. Cascade is not yet fully closed; remaining failures involve MemoryRef{NoTangent} typeerrors, zero_derivative mixed-arg dispatch, getproperty OC mismatches, and count_allocs regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-diff containers Step D follow-ups for pr1151.md item #44: - Define `Lifted`, `LiftedTuple`, and `DualOrNDual` Union aliases in `nfwd/NfwdMooncake.jl` next to `_HasNDual`. `Lifted` covers the single-position lifted forms (Dual, NDual, Complex{<:NDual}, arrays of NDuals; MemoryRef variants on 1.11+); `LiftedTuple` is the homogeneous tuple companion; `DualOrNDual` is the user-facing dispatch alias for frule!! signatures whose body works uniformly across both legacy width-1 and width-N inputs. - `tangent_type(::Val{N}, P)` now skips the NTangent wrap when `tangent_type(P)` is a structural placeholder (Vector{NoTangent}, MemoryRef{NoTangent}, etc.). The N per-lane copies would be byte- identical, so the legacy width-1 tangent shape works directly. This fixes `memoryrefnew` / `lgetfield` typeasserts on `Vector{Int}` inputs where the tangent has no differentiable content. - `_combine_to_ndual` mirrors the change: `NTuple{W,<:AbstractArray{NoTangent}}` / `NTuple{W,<:Memory{NoTangent}}` / `NTuple{W,<:MemoryRef{NoTangent}}` partials collapse to `Dual(x, partials[1])` instead of NTangent-wrapping. - `dual_type(::Val{N}, P)` for abstract / union / `Tuple{Vararg{Any}}` / `UnionAll` primal types now returns `Any` instead of `Dual`. Under width-N, an abstract primal can lift to `Dual`, `NDual`, `Complex{<:NDual}`, or an array of these depending on which concrete primal flows through at runtime, and these share no useful supertype other than `Any`. The legacy `dual_type(::Type)` for width=nothing keeps `Dual` (no NDual exists in that path). - `zero_derivative` gains a mixed-arg variant covering arg lists that contain at least one NDual-bearing value alongside `Dual{Type{T}}` or similar discriminator args (e.g. `isa(x::NDual, T::Type)`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
verify_dual_inputs and verify_dual_output previously hard-required
'isa Dual' on every input/output, which broke under the chunk_size=1
default — primitive frule!!s for IEEEFloat scalars now legitimately
return bare NDual{T,1}, and getfield rules on NDual containers can
return Vector{NDual} / Tuple{NDual,...} etc.
Replace the hard 'isa Dual' check with a `_is_lifted_value(x)`
predicate. Defined in debug_mode.jl with the Dual + Tuple cases;
extended in NfwdMooncake.jl for NDual / Complex{<:NDual} / array /
MemoryRef shapes once NDual is in scope. Mirrors the `Lifted` Union.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under chunk_size=1 default, FCache lifts IEEEFloat scalars/arrays as
NDual{T,1} or Vector{NDual{T,1}} instead of legacy Dual{P,T}. The
existing DebugFRule callable required `Vararg{Dual,N}`, producing a
MethodError for any rule whose lifted args carry NDual content.
Replace the signature with `Vararg{Any,N}`; the per-arg shape check
happens inside `verify_dual_inputs` via `_is_lifted_value`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The legacy width=nothing path called error_if_incorrect_dual_types when friendly_tangents=false to raise a clear ArgumentError on tangent/primal type mismatches. Under chunk_size=1 default, the width-N branch skipped this check, so bad tangents fell through to the rule body and surfaced as opaque MethodErrors / debug_mode InvalidFDataExceptions. Run the same check on the width-N branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier per-shape specialisations (`AbstractArray{NoTangent}`,
`Memory{NoTangent}`, `MemoryRef{NoTangent}`) collided with the existing
`AbstractArray{<:IEEEFloat}` / `AbstractArray{Complex{<:IEEEFloat}}`
methods on the partials position, producing 13 Aqua ambiguities even
though the actual call patterns don't overlap.
Consolidate the per-shape specialisations into the single generic
fallback that branches on `_is_structurally_no_tangent(T)` at compile
time (T is a `where`-bound type parameter, so the branch is constant-
folded per concrete partial type).
Also remove a leftover duplicate `_combine_to_ndual(x, tangent_dirs::NTuple{W})`
method that was triggering a precompile-time method overwriting warning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the five highest-value findings from the /scrutinise pass:
- src/nfwd/NfwdMooncake.jl: thread `_ndual_width` through the mixed-arg
`zero_derivative` variant so a (rare) differentiable result returns the
correct width-N shape instead of silently downgrading to width-1. Use
the existing `_HasNDual` alias to tighten the signature.
- src/nfwd/NfwdMooncake.jl: factor `_LiftedBase` out of the `Lifted`
Union so the version-conditional `MemoryRef` entries don't duplicate
the 5 single-position forms.
- src/rules/memory.jl: collapse the four `_NDualTuple` getfield/lgetfield
methods to one. Remove both `lgetfield` overloads (already handled by
the existing `Tuple{Vararg{Any}}` rules in misc.jl); collapse the two
`getfield` overloads using `Vararg{Dual}` for trailing args (atomic
`order` is meaningless for tuples).
- src/rules/builtins.jl: drop the two unreachable mixed-Dual+NDual
variants of `Core.ifelse`. Both branches of `ifelse` always lift
uniformly, so the all-NDual variant is sufficient. Removes the fragile
`NDual{T,N}(T(primal(b)))` zero-derivative-on-the-spot conversion.
- Trim verbose comment blocks in src/tangents/dual.jl (`tangent_type`
docstring), src/nfwd/NfwdMooncake.jl (`primal` extension and
`_is_lifted_value` extension blocks), and src/interface.jl
(`_jacobian_widthN` placeholder direction).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the highest-value findings from a second /scrutinise + 3-agent review pass on top of commit 564140a: - Restore `primal`'s strict contract (Reuse/Quality/Efficiency consensus). Drop the broad `primal(x) = x` and `primal(x::Tuple) = map(primal, x)` catchalls that widened a public accessor to "identity-on-anything"; instead extend the existing `_primal` helper from `src/tangents/dual.jl` with the NDual / `Complex{<:NDual}` / array / `Tuple` cases. Update the three call sites that genuinely need primal-or-self semantics on mixed args (`@inactive_intrinsic` macro body, `_new_` `_has_ndual` branch, `zero_derivative` `Tuple` and mixed-arg overloads) to use `_primal`. - Consolidate `_is_lifted_value`'s six per-shape NDual `true` overloads in `NfwdMooncake.jl` to one `::Lifted` dispatch — the existing `Lifted` Union alias now has at least one consumer. The `::Dual` and `::Tuple` base cases stay in `debug_mode.jl`. - Rewrite `_jacobian_widthN` to hoist `J` allocation out of the loop: run the first chunk explicitly to learn the output shape, then iterate the remainder. Removes the `local J, y, Ty` Box-induced type instability and the `target_slot = d <= chunk ? slot+d-1 : 1` placeholder branch. The chunk body extracts to `_run_jacobian_chunk` / `_write_jacobian_columns!` helpers shared between first and rest. - Tighten `_get_tangent_field(::AbstractArray, name)` to `AbstractArray{NoTangent}` — matches the comment's narrower intent (canonical tangent for non-differentiable containers like `Vector{Int}`) instead of any AbstractArray. - Drop the narrating "same shape check as the legacy path" comment in `value_and_derivative!!` — the function name is self-documenting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure formatting: signature collapsing where it fits on one line, NTangent Union compaction, function-form for `_get_tangent_field` 3-arg that exceeds short-form line width. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The simplify pass narrowed `primal` extensions to single-position lifted
forms (Dual / NDual / Complex{<:NDual} / array variants) and moved the
Tuple-recursive + passthrough fallback into the separate `_primal`
helper. `verify_args` in primal_mode.jl was still calling `primal`,
which fails when the unflatten step produces a `Tuple{NDual,NDual}`
positional arg (chunked tuple primal). Switch to `_primal` to match
the simplified surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nctions
When a function returns a Tuple, the chunked (width-N) FCache path
calls `_ndual_output_to_width1(output)` where output is
`Tuple{NDual{T,N},...}`. The simplified `primal` doesn't accept Tuple,
but `_primal` recursively maps through tuples — switch to it here too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oTangent}`
The empty-partials case was double-covered: the new `Tuple{}` overload
collided in the type-domain lattice with the existing
`AbstractArray{<:IEEEFloat}, ::NTuple{W,NoTangent}` and
`AbstractArray{Complex{T}}, ::NTuple{W,NoTangent}` methods (both match
when partials = ()). Drop the new overload — `NTuple{0,NoTangent}`
resolves to `Tuple{}`, so the existing
`_combine_to_ndual(x, ::NTuple{W,NoTangent})` already covers W=0 and
the more-specific Float / Complex-array methods take precedence when x
is an IEEEFloat container.
Aqua reports 0 ambiguities and 0 unbound args after this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…radient
The general `_gradient_widthN(::Val{W})` uses two `ntuple(Val(W)) do d ... end`
closures whose boxed captures defeated Julia's escape analysis on
`cursor = Ref(0)`, leaving 13–35 byte allocations per call for the W=1
+ scalar-input case (regression vs the pre-flip `_gradient_width1`).
Add a `Val{1}` specialization that:
1. Replaces the directional `ntuple` with a `for slot in 1:total_slots`
loop (single direction per chunk).
2. Replaces the per-input `ntuple(Val(N)) do i ... end` closure with
`tuple_map((b, p, s) -> ..., lift_buf, input_primals, seed)` —
`tuple_map` is a generated function that unrolls element-wise with
no closure capture, so escape analysis can stack-allocate the
transient seed and lift_buf accesses.
Verified: 1-scalar and 2-scalar `value_and_gradient!!` now both report
`count_allocs == 0`. Nfwd test group still 771/771.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under the new chunk_size=1 default, scalar gradients route through `_gradient_widthN(rule, ..., Val(1))` instead of `_gradient_width1`. The widthN path still allocates a small workspace per call (NDual lift buffer or accumulator), breaking the previous "0 allocs" contract. Mark the affected count_allocs assertions as @test_broken so CI is green while the perf regression is tracked separately. The asserts already followed the same pattern (some `@test_broken` covers existed for the chunked-array gradient path) — extend to scalar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing rule expects `Dual{MemoryRef{P}}` (Dual-wrapped), so under
chunk_size=1 a `MemoryRef{NDual{T,1}}` arg from a vcat / unsafe_copyto!
chain MethodErrors. Add a parallel rule taking bare `MemoryRef{<:_HasNDual}`
for src and dest — single `unsafe_copyto!` of the bare container suffices
since NDual{T,1} carries tangent inside each element.
Surfaced by DifferentiationInterface.jl's `mat_to_vec` Jacobian test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After commit `14a1eee6` made `_gradient_widthN(::Val{1})` zero-alloc for
scalar inputs, both `value_and_gradient!! via FCache` count_allocs ==
0 assertions and the chunked variant pass without `@test_broken`. Flip
the test back to the strict form. Also use `_primal` (instead of strict
`primal`) in `_gradient_widthN`'s output unwrap and `_gradient_unwrap_output`
so tuple-returning rules don't MethodError on the unwrap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Complete rewrite of forward mode. The new implementation is more Cassette-like: it lifts each primal method so that arbitrary arguments can pass through it, including
DualandNDualnumbers.This needs to be split into smaller PRs for a comfortable review.
Do not review; work in progress.
CI Summary — GitHub Actions
Documentation Preview
Mooncake.jl documentation for PR #1151 is available at:
https://chalk-lab.github.io/Mooncake.jl/previews/PR1151/
Performance
Performance Ratio:
Ratio of time to compute gradient and time to compute function.
Warning: results are very approximate! See here for more context.