
fix one big and 2 small bugs in training #24

Open
bkal01 wants to merge 2 commits into main from bhavesh/fix-lr-scheduler

Conversation

Collaborator

bkal01 commented Mar 11, 2026

Steps were being computed incorrectly, so the LR scheduler was misconfigured: the entire llava-pretrain run would have stayed in warmup.
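A minimal sketch of the step-counting bug class described above, with made-up numbers (not the actual llava-pretrain config): with gradient accumulation, the optimizer and LR scheduler step once per accumulation window, so sizing warmup against micro-batches makes it far too long.

```python
import math

# Hypothetical training config, for illustration only.
num_samples = 10_000
per_device_batch_size = 8
grad_accum_steps = 4
num_epochs = 1
warmup_ratio = 0.03

# Micro-batches (forward/backward passes) per epoch.
micro_batches_per_epoch = math.ceil(num_samples / per_device_batch_size)

# The scheduler steps once per optimizer step, i.e. once every
# grad_accum_steps micro-batches -- this is the count that must be
# used to configure total steps and warmup.
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
total_optimizer_steps = optimizer_steps_per_epoch * num_epochs

# Sizing warmup against micro-batches instead would make it
# grad_accum_steps times too long and could swallow the whole run.
warmup_steps = int(warmup_ratio * total_optimizer_steps)
```

With these numbers, 1250 micro-batches collapse to 313 optimizer steps, and warmup is 9 steps rather than 37.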

Fixed some logging/saving logic to align with actual optimizer steps.
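A sketch of what "aligned with optimizer steps" means, assuming a standard gradient-accumulation loop; the names (`log_every`, the loop bounds) are illustrative, not from the repo.

```python
grad_accum_steps = 4
log_every = 10          # measured in optimizer steps, not micro-batches
optimizer_step = 0
logged_at = []

for micro_step in range(100):   # stand-in for the dataloader loop
    # loss.backward() would go here
    if (micro_step + 1) % grad_accum_steps == 0:
        # optimizer.step(); scheduler.step(); optimizer.zero_grad()
        optimizer_step += 1
        # Log/save inside this branch so the counters that gate
        # logging and checkpointing match the scheduler's step count.
        if optimizer_step % log_every == 0:
            logged_at.append(optimizer_step)
```

Gating on `micro_step` instead would log and checkpoint `grad_accum_steps` times more often than intended.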

Added a seed for reproducibility.
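A minimal seeding sketch. Only the stdlib RNG is shown so the snippet stays self-contained; in actual PyTorch training code one would also seed NumPy and torch (`torch.manual_seed`, `torch.cuda.manual_seed_all`).

```python
import random

def set_seed(seed: int) -> None:
    # Illustrative helper: re-seeding the RNG makes subsequent
    # draws deterministic, which is the point of the fix.
    random.seed(seed)

set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
# The two sequences of draws are identical after re-seeding.
```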

Fixed the LR scheduler configuration when resuming training.
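A sketch of the resume issue, using a generic linear-warmup/cosine-decay schedule (not necessarily the one in this repo): on resume, the scheduler must be evaluated at the checkpoint's optimizer step, not at step 0, or training silently restarts warmup.

```python
import math

def lr_at(step: int, warmup: int, total: int, base_lr: float) -> float:
    # Linear warmup followed by cosine decay to zero.
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

warmup, total, base_lr = 100, 1000, 1e-3
resumed_step = 150   # hypothetical checkpointed optimizer step

# Buggy resume: scheduler rebuilt at step 0 -> tiny warmup LR.
fresh = lr_at(0, warmup, total, base_lr)
# Correct resume: scheduler evaluated at the checkpointed step.
resumed = lr_at(resumed_step, warmup, total, base_lr)
```

In PyTorch the same effect is achieved by restoring `scheduler.state_dict()` or constructing the scheduler with the checkpointed `last_epoch`.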


Swapped the order of torch.compile and gradient checkpointing so that torch.compile captures the final computational graph.
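A setup-order sketch of this fix, assuming a Hugging Face-style model exposing `gradient_checkpointing_enable()`; `build_model` is a hypothetical constructor, and the exact API depends on the model class.

```python
import torch

model = build_model()  # hypothetical model constructor

# Wrong order (the bug): compiling first traces a graph without
# checkpointing, then enabling it mutates modules behind the
# compiled graph's back:
#   model = torch.compile(model)
#   model.gradient_checkpointing_enable()

# Fixed order: apply gradient checkpointing first, then compile,
# so torch.compile sees the final computational graph.
model.gradient_checkpointing_enable()  # HF-style API; adjust per model
model = torch.compile(model)
```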
bkal01 force-pushed the bhavesh/fix-lr-scheduler branch from cd4b3ea to d29da76 on March 11, 2026 at 16:18
bkal01 marked this pull request as ready for review on March 11, 2026 at 16:19
bkal01 force-pushed the bhavesh/fix-lr-scheduler branch from bdc66c5 to 8fd683c on March 11, 2026 at 19:53