Because it has no backward pass, forward-mode autograd often has considerably different implementation properties from reverse-mode autograd. Given its different performance tradeoffs, I wonder whether a forward-mode transformation could be friendlier to autograd compilers than a reverse-mode one (as in Mooncake/Zygote), or at least compensate for some extreme cases of reverse-mode autograd.
For example, the sum_1000 example is a vector-input scalar-output function. Forward-mode autograd handles it with no tape and no backward pass, whereas a reverse-mode compiler likely has a hard time with it (or at least needs significantly more compiler optimization effort to work well). The advantage grows with chunk-mode forward-mode autograd, which propagates several input directions in a single pass.
To be clear, I am talking about source-to-source implementations of both forward- and reverse-mode autograd.
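To make the chunk-mode point concrete, here is a minimal sketch in plain Python. It uses operator-overloaded dual numbers rather than a source-to-source transformation, and it is not how Mooncake or Zygote work internally. The names `Dual`, `forward_gradient`, and `sum_vector` are hypothetical, and I am assuming the sum_1000 example amounts to summing a length-1000 vector. Each dual number carries a chunk of partial derivatives, so a full gradient takes ceil(n / chunk) forward passes and nothing is recorded for a backward pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Dual:
    """A value plus a chunk of partial derivatives (hypothetical helper)."""
    value: float
    partials: Tuple[float, ...]

    def __add__(self, other):
        other = _lift(other, len(self.partials))
        return Dual(
            self.value + other.value,
            tuple(a + b for a, b in zip(self.partials, other.partials)),
        )

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other, len(self.partials))
        # Product rule, applied to every direction in the chunk.
        return Dual(
            self.value * other.value,
            tuple(
                self.value * b + other.value * a
                for a, b in zip(self.partials, other.partials)
            ),
        )

    __rmul__ = __mul__


def _lift(x, width: int) -> Dual:
    """Treat a plain number as a constant with zero partials."""
    return x if isinstance(x, Dual) else Dual(float(x), (0.0,) * width)


def forward_gradient(
    f: Callable[[Sequence[Dual]], Dual], x: Sequence[float], chunk: int
) -> Tuple[List[float], int]:
    """Gradient of scalar-valued f at x, seeding `chunk` input directions per pass."""
    n = len(x)
    grad = [0.0] * n
    passes = 0
    for start in range(0, n, chunk):
        width = min(chunk, n - start)
        # Input i gets a one-hot seed if it falls in the current chunk, zeros otherwise.
        seeded = [
            Dual(xi, tuple(1.0 if i == start + j else 0.0 for j in range(width)))
            for i, xi in enumerate(x)
        ]
        out = f(seeded)
        grad[start:start + width] = out.partials
        passes += 1
    return grad, passes


def sum_vector(x):
    """Assumed stand-in for the sum_1000 example: a plain accumulation loop."""
    total = 0.0
    for xi in x:
        total = total + xi
    return total


x = [float(i) for i in range(1000)]
grad, passes = forward_gradient(sum_vector, x, chunk=8)
```

With `chunk=8` the loop body is differentiated in 125 passes instead of 1000, and each pass is just the original loop running on wider numbers. The number of passes still grows with the input dimension, so this only illustrates where forward mode avoids tape overhead, not that it wins in every case.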