I'm running some tests in QuantumKitHub/TensorKit.jl#380 where the Enzyme tests take hours (!!!) and it's 99% compilation. Looking at the output of `--trace-compile --trace-compile-timing`, it seems a LOT of time is spent in https://github.com/EnzymeAD/Enzyme.jl/blob/main/lib/EnzymeCore/src/rules.jl#L351 , where the parent function has a `@nospecialize` annotation but the interior composed function + `map` doesn't. This can seemingly become a big problem if your package has a lot of custom types running around (as TensorKit does). Is there any chance someone a bit more familiar with the logic here could help prevent so many custom specializations? @vchuravy suggested making `tt` a vector or a for loop.
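For context, a minimal sketch of the pattern in question (these function names and bodies are hypothetical, not the actual rules.jl code): a `map` with a closure over a heterogeneous tuple inside a `@nospecialize`'d function still forces a fresh specialization of the closure for every distinct tuple type, whereas collecting into a `Vector{Any}` with an explicit for loop compiles once.

```julia
# Hypothetical illustration of the specialization issue, not the real
# EnzymeCore code. Each distinct Tuple type passed here compiles a new
# instance of the anonymous function + map, despite @nospecialize on
# the parent:
function collect_types_map(@nospecialize(args::Tuple))
    return map(a -> typeof(a), args)
end

# The suggested alternative: an untyped Vector filled by a for loop,
# so only one method instance is ever compiled regardless of the
# argument types:
function collect_types_loop(@nospecialize(args::Tuple))
    out = Vector{Any}(undef, length(args))
    for i in eachindex(out)
        out[i] = typeof(args[i])
    end
    return out
end
```

The trade-off is that the loop version returns a `Vector{Any}` instead of a precisely-typed tuple, so downstream code that needs the tuple type would have to reconstruct it, but for a compile-time-dominated workload that's usually the right trade.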