fix umax issue #2795

Open
minansys wants to merge 8 commits into main from mixu/umax

@minansys
Collaborator

Enzyme Crash: invertPointerM assertion failure on @llvm.umax.i32 with -enzyme-loose-types

Summary

This PR fixes #2794.

Enzyme crashes with assert(0 && "cannot find deal with ptr that isnt arg") in GradientUtils::invertPointerM() when differentiating code that uses the @llvm.umax.i32 intrinsic in reverse mode with -enzyme-loose-types=1 enabled.

Root Cause

The crash occurs when a struct passed as enzyme_dup contains both floating-point and integer pointer fields, and the function stores the result of @llvm.umax.i32 through an integer pointer loaded from that struct. The failing sequence is:

  1. visitCommonStore handles the store of the i32 umax result to the dup pointer
  2. With loose types, Enzyme treats the stored i32 value as potentially active
  3. visitCommonStore calls invertPointerM() on the i32 umax result to obtain its shadow
  4. invertPointerM() cannot find or create a shadow for the @llvm.umax.i32 call result
  5. The function falls through to the assertion at GradientUtils.cpp:6601

The @llvm.umax.i32 intrinsic arises from SerialOperations::AtomicMax(dest, flag), which the LLVM optimizer lowers to a load → @llvm.umax.i32 → store pattern.
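
For illustration, a serial max of this kind might look like the sketch below. The name SerialAtomicMax and its signature are assumptions made for this example; the real SerialOperations::AtomicMax is not shown in this PR. The comments mark which statement would correspond to each part of the load → @llvm.umax.i32 → store pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for SerialOperations::AtomicMax (signature assumed).
// In a serial build there is no real atomic, so the body is a plain
// read-modify-write on the destination.
inline void SerialAtomicMax(std::uint32_t *dest, std::uint32_t flag) {
  assert(dest != nullptr);
  std::uint32_t old = *dest;              // load
  std::uint32_t merged = std::max(old, flag);  // unsigned max; an optimizer may emit @llvm.umax.i32 here
  *dest = merged;                         // store
}
```

The stored value is an unsigned integer, so it carries no floating-point derivative information of its own.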

Reproducer

LLVM IR reproducer (standalone)

File: test/Enzyme/ReverseMode/loosetypes_umax.ll

opt -load-pass-plugin=LLVMEnzyme-21.so \
    -passes="enzyme" -enzyme-preopt=false -enzyme-loose-types=1 \
    -S loosetypes_umax.ll

The IR models a struct with float* output data, i32* flags, and a float scalar. The function:

  1. Loads and computes active float values
  2. Derives an i32 flag from a float comparison
  3. Loads existing i32 from the flags pointer, calls @llvm.umax.i32, stores result back
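
A rough C++ analogue of the three steps above is sketched here. It is not the source of the reproducer: the struct name, field names, the multiply, and the comparison against zero are all assumptions chosen only to mirror the described shape (float data, integer flags, float scalar).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: float output data, integer flags, and a float scalar.
struct KernelArgs {
  float *out;
  std::uint32_t *flags;
  float scale;
};

// Hypothetical kernel body following the three steps of the reproducer.
inline void kernel(KernelArgs *args, std::size_t i) {
  assert(args != nullptr);
  // 1. Load and compute an active float value.
  float v = args->out[i] * args->scale;
  args->out[i] = v;
  // 2. Derive an integer flag from a float comparison.
  std::uint32_t flag = (v > 0.0f) ? 1u : 0u;
  // 3. Load the existing flag, take the unsigned max, store it back.
  std::uint32_t old = args->flags[i];
  args->flags[i] = std::max(old, flag);
}
```

When the whole struct is passed as enzyme_dup, the flags pointer has a shadow as well, which is why the integer store in step 3 reaches visitCommonStore.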

Original failing code

Cromwell's ComputeAlpha.cxx boundary operation using Oak's DualFixedNeighborMapIteration reverse-mode template with SerialOperations.

The explicit template instantiation flux_map_kernel_cuda_reverse<OperationBoundaryComputeAlpha, ...> triggers the bug during x86 host compilation when the __enzyme_autodiff call inside the reverse kernel is processed with -enzyme-loose-types=1.

Stack trace

GradientUtils::invertPointerM(...)
  GradientUtils.cpp:6601: Assertion `0 && "cannot find deal with ptr that isnt arg"' failed.
AdjointGenerator::visitCommonStore(...)
AdjointGenerator::visitStoreInst(...)
EnzymeLogic::CreatePrimalAndGradient(...)

Proposed Fix

In GradientUtils::invertPointerM(), before the CustomErrorHandler fallback and the assertion, add a guard for integer/integer-vector values under loose type analysis:

if (looseTypeAnalysis && oval->getType()->isIntOrIntVectorTy()) {
    auto *shadow = Constant::getNullValue(getShadowType(oval->getType()));
    invertedPointers.insert(
        std::make_pair((const Value *)oval, InvertedPointerVH(this, shadow)));
    return shadow;
}

This returns a zero shadow for integer values that cannot have meaningful floating-point derivatives, which is correct because:

  • Under loose types, integer stores to dup pointers are processed by visitCommonStore
  • The integer flag values (derived from float comparisons) have zero derivative
  • A null constant shadow is the correct adjoint for non-differentiable integer operations
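
The zero-derivative claim for comparison-derived flags can be checked numerically with a small sketch. The functions flagOf and flagSlope are hypothetical helpers written for this example and are not part of Enzyme. Away from the threshold the flag is piecewise constant, so a central finite difference gives zero; exactly at the threshold the flag is not differentiable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag: 1 when x exceeds the threshold, otherwise 0.
inline std::uint32_t flagOf(float x, float threshold) {
  return (x > threshold) ? 1u : 0u;
}

// Central finite difference of the flag with respect to x.
inline float flagSlope(float x, float threshold, float h) {
  assert(h > 0.0f);
  float hi = static_cast<float>(flagOf(x + h, threshold));
  float lo = static_cast<float>(flagOf(x - h, threshold));
  return (hi - lo) / (2.0f * h);
}
```

For any x farther than h from the threshold, both samples land on the same side and the slope is exactly zero.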

Affected versions

  • Enzyme v0.0.256 with LLVM 21
  • Likely affects all Enzyme versions with -enzyme-loose-types support

Test

The regression test test/Enzyme/ReverseMode/loosetypes_umax.ll guards against this issue.

Comment thread enzyme/Enzyme/GradientUtils.cpp Outdated
return Constant::getNullValue(getShadowType(oval->getType()));
}

if (looseTypeAnalysis && oval->getType()->isIntOrIntVectorTy()) {
Member

This is the wrong place for the fix imo; you should instead add a special binop case for umax (like there is for integer Add, etc.)

switch (II->getIntrinsicID()) {
default:
goto end;
#if LLVM_VERSION_MAJOR >= 12
Member

don't modify invertPointerM

instead, special-case these intrinsics in visitIntrinsicInst, similar to how the binary operator handling treats some integer instructions, e.g. at AdjointGenerator.h:2614

Development

Successfully merging this pull request may close these issues.

crash at GradientUtils::invertPointerM

2 participants