Fix division by zero in LR scheduler when max_steps equals warmup_steps #2212

Open: Br1an67 wants to merge 2 commits into Lightning-AI:main from Br1an67:fix/issue-1393-lr-scheduler-validation

Br1an67 commented Mar 9, 2026

Fixes #1393

Summary

This PR fixes a division-by-zero error in the learning rate scheduler that occurs when max_steps equals lr_warmup_steps. In that case the CosineAnnealingLR scheduler receives a T_max of 0, which raises a ZeroDivisionError during training.
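To illustrate the failure mode, here is a minimal pure-Python sketch of closed-form cosine annealing. It is not the PyTorch implementation; the helper names `cosine_annealing_lr` and `decay_lr` are hypothetical, and the sketch assumes that T_max is computed as max_steps minus warmup_steps, which is what a T_max of 0 in the equal-steps case suggests.

```python
import math


def cosine_annealing_lr(step, t_max, base_lr=1.0, eta_min=0.0):
    """Closed-form cosine annealing; note the division by t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


def decay_lr(step, max_steps, warmup_steps):
    """Hypothetical helper: LR for the decay phase that follows warmup.

    Assumption: the cosine phase spans max_steps - warmup_steps steps.
    """
    t_max = max_steps - warmup_steps
    return cosine_annealing_lr(step - warmup_steps, t_max)


# Normal case: 10 warmup steps followed by 10 decay steps.
start = decay_lr(10, max_steps=20, warmup_steps=10)
end = decay_lr(20, max_steps=20, warmup_steps=10)

# Degenerate case: no decay steps remain, so t_max is 0.
try:
    decay_lr(10, max_steps=10, warmup_steps=10)
    failed = False
except ZeroDivisionError:
    failed = True
```

In the degenerate case the error only surfaces when the learning rate is actually computed, which is why it shows up during training rather than at configuration time.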

Changes

  • Added validation in the get_lr_scheduler function across all finetune modules (adapter.py, adapter_v2.py, full.py, lora.py, lora_legacy.py)
  • The validation ensures that max_steps > warmup_steps before creating the scheduler
  • If validation fails, an error is raised with a clear message indicating the problematic values
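The check described above could look roughly like the following sketch. The function name `check_scheduler_steps`, the choice of `ValueError`, and the message wording are assumptions for illustration; the actual two-line change in each module may differ.

```python
def check_scheduler_steps(max_steps: int, warmup_steps: int) -> None:
    """Hypothetical stand-in for the validation added to get_lr_scheduler.

    Rejects configurations that leave no steps for the cosine decay phase.
    """
    if max_steps <= warmup_steps:
        raise ValueError(
            f"max_steps ({max_steps}) must be greater than "
            f"warmup_steps ({warmup_steps})."
        )


# A valid configuration passes silently.
check_scheduler_steps(max_steps=100, warmup_steps=10)

# The problematic configuration fails before any scheduler is built.
try:
    check_scheduler_steps(max_steps=10, warmup_steps=10)
    message = ""
except ValueError as err:
    message = str(err)
```

Using `<=` rather than `==` in this sketch also rejects the case where warmup is longer than the whole run, which matches the stated requirement that max_steps > warmup_steps.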

Testing

The fix was checked by ensuring that all modified Python files parse correctly.

With the validation in place, the problematic case is caught early with a clear error message instead of failing during training with a cryptic division by zero error.

Files Changed

 litgpt/finetune/adapter.py     | 2 ++
 litgpt/finetune/adapter_v2.py  | 2 ++
 litgpt/finetune/full.py        | 2 ++
 litgpt/finetune/lora.py        | 2 ++
 litgpt/finetune/lora_legacy.py | 2 ++
 5 files changed, 10 insertions(+)
