Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates
Preliminary code release for our paper "Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates", by Athanasios Glentis, Dawei Li, Chung-Yiu Yau and Mingyi Hong.