https://github.com/tensorflow/mesh/blob/fbf7b1e547e8b8cb134e81e1cd350c312c0b5a16/mesh_tensorflow/transformer/moe.py#L935

I tried the load-balancing loss in my project and found that it does not help the training loss converge. Does it only balance the load across experts without helping convergence, or can it even slightly hurt the model?
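
For context, here is a minimal sketch of the kind of auxiliary loss I am asking about. It assumes a top-1, Switch-Transformer-style formulation (number of experts times the sum over experts of the dispatched-token fraction times the mean router probability). It is not a copy of the linked `moe.py` code; the function name `load_balancing_loss` and the NumPy implementation are illustrative assumptions.

```python
import numpy as np


def load_balancing_loss(router_logits, num_experts):
    """Hypothetical top-1 auxiliary loss; router_logits is [tokens, experts]."""
    # Numerically stable softmax over the expert dimension.
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Fraction of tokens whose top-1 choice is each expert.
    top1 = probs.argmax(axis=-1)
    frac_tokens = np.bincount(top1, minlength=num_experts) / router_logits.shape[0]

    # Mean router probability assigned to each expert.
    mean_prob = probs.mean(axis=0)

    # Scaled so that evenly spread routing gives a value close to 1.
    return num_experts * float(np.sum(frac_tokens * mean_prob))


# Four tokens, four experts: one token per expert vs. all tokens on expert 0.
balanced_logits = 10.0 * np.eye(4)
collapsed_logits = np.zeros((4, 4))
collapsed_logits[:, 0] = 10.0

balanced = load_balancing_loss(balanced_logits, 4)
collapsed = load_balancing_loss(collapsed_logits, 4)
```

In this sketch the term depends only on the routing statistics, not on the task loss, and would typically be added to the task loss with a small weight. That is why I am unsure whether it should be expected to help convergence at all.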