(neurons) Add warmup_inner_steps LR scaling #650
Conversation
Walkthrough
Adds warmup_inner_steps-based linear LR scaling for the inner optimizer (see the sequence diagram below).
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Config as Configuration
    participant Trainer as Trainer
    participant Scheduler as Inner Scheduler
    participant Optimizer as Inner Optimizer
    Trainer->>Config: read warmup_inner_steps
    loop each inner step
        Trainer->>Trainer: check flatten_window / null_round
        alt warmup allowed
            Trainer->>Scheduler: is warmup (warmup_steps_taken < warmup_inner_steps)?
            alt in warmup
                Trainer->>Optimizer: apply warmup scale (LR *= (step+1)/warmup_inner_steps)
                Trainer->>Optimizer: step()
                Trainer->>Optimizer: restore original LR
            else warmup complete
                Trainer->>Optimizer: step() (normal LR)
            end
        else not allowed
            Trainer->>Optimizer: step() (skip warmup scaling)
        end
        Trainer->>Scheduler: increment warmup_steps_taken / inner scheduler step
    end
```
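To make the diagram concrete, here is a minimal Python sketch of the scale/step/restore pattern it describes. It assumes a standard PyTorch-style optimizer with `param_groups`; the function name and the `warmup_allowed` flag are illustrative, not the trainer's actual API.

```python
def inner_step_with_warmup(optimizer, inner_scheduler, warmup_steps_taken,
                           warmup_inner_steps, warmup_allowed=True):
    """One inner step with optional linear LR warmup (illustrative sketch)."""
    in_warmup = warmup_allowed and warmup_steps_taken < warmup_inner_steps
    if in_warmup:
        # Remember the scheduler-set LRs so they can be restored exactly.
        original_lrs = [group["lr"] for group in optimizer.param_groups]
        scale = (warmup_steps_taken + 1) / warmup_inner_steps
        for group in optimizer.param_groups:
            group["lr"] *= scale

    optimizer.step()

    if in_warmup:
        # Restore the original LRs to avoid floating-point drift,
        # then advance the warmup counter.
        for group, lr in zip(optimizer.param_groups, original_lrs):
            group["lr"] = lr
        warmup_steps_taken += 1

    inner_scheduler.step()  # the scheduler keeps stepping normally throughout
    return warmup_steps_taken
```

Storing and restoring the exact LR values, rather than dividing the scale back out, is what keeps repeated warmup steps free of floating-point drift.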
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (57.91%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.
```
@@           Coverage Diff           @@
##              dev     #650   +/-   ##
=======================================
  Coverage   57.91%   57.91%
=======================================
  Files          27       27
  Lines        4890     4890
=======================================
  Hits         2832     2832
  Misses       2058     2058
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
neurons/trainer.py (1)
136-142: Configuration reading looks correct. The `warmup_inner_steps` retrieval correctly navigates the nested config structure and defaults to 0 when not specified.
Consider adding validation and logging for better debugging:
```diff
 # Get warmup_inner_steps from config
 optimizer_config = getattr(self.hparams, "optimizer", {})
 optimizer_type = optimizer_config.get("type", "adamw").lower()
 opt_config = optimizer_config.get(optimizer_type, {})
 scheduler_config = opt_config.get("scheduler", {})
 self.warmup_inner_steps = scheduler_config.get("warmup_inner_steps", 0)
+
+# Validate warmup_inner_steps
+if not isinstance(self.warmup_inner_steps, int) or self.warmup_inner_steps < 0:
+    raise ValueError(f"warmup_inner_steps must be a non-negative integer, got {self.warmup_inner_steps}")
+
+if self.warmup_inner_steps > 0:
+    tplr.logger.info(f"[Init] Inner warmup enabled: {self.warmup_inner_steps} steps")
```
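For context, here is a hypothetical config shape that this lookup would resolve. The exact nesting in hparams/hparams.json isn't reproduced here, so treat the keys and values below as assumptions inferred from the lookup code:

```python
from types import SimpleNamespace

# Hypothetical hparams object mirroring the nested lookup above.
hparams = SimpleNamespace(optimizer={
    "type": "adamw",
    "adamw": {"scheduler": {"warmup_inner_steps": 100}},
    "muon": {"scheduler": {"warmup_inner_steps": 0}},
})

optimizer_config = getattr(hparams, "optimizer", {})
optimizer_type = optimizer_config.get("type", "adamw").lower()
scheduler_config = optimizer_config.get(optimizer_type, {}).get("scheduler", {})
warmup_inner_steps = scheduler_config.get("warmup_inner_steps", 0)  # -> 100
```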
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- hparams/hparams.json (2 hunks)
- neurons/trainer.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
🔇 Additional comments (2)
hparams/hparams.json (1)
64-64: LGTM: Backward-compatible warmup configuration added. The addition of `warmup_inner_steps` with a default value of 0 correctly disables the warmup feature by default, ensuring backward compatibility. The configuration is consistently applied to both optimizer types (adamw and muon).
Also applies to: 80-80
neurons/trainer.py (1)
874-900: Interaction is intentional by design; multiplicative compounding is expected. The commit message confirms "The warmup happens independently of the scheduler's existing warmup, allowing gradual LR ramp-up at the start of training while preserving the scheduler's baseline behavior." "Independently" refers to step-count independence: the inner warmup does not affect scheduler step increments. The multiplicative LR effect during overlap is intentional: the scheduler sets the base LR for each step, and the inner warmup temporarily scales it (multiplying by `(step+1)/N`) for the actual optimizer step, then restores it before the scheduler advances.
Since `warmup_inner_steps` defaults to 0 across all configs, this feature is currently disabled and has no impact on training. No tests verify the behavior when `warmup_inner_steps > 0`, but the implementation correctly preserves scheduler independence while allowing gradual early-phase LR scaling.
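A quick numeric illustration of that compounding during an overlap; the scheduler values below are made up, not taken from the repo's configs:

```python
# Hypothetical scheduler-set LRs during the scheduler's own warmup, overlapped
# with an inner warmup of N = 4 steps. The effective LR for each optimizer
# step is the product of the two ramps.
scheduler_lrs = [1e-4, 2e-4, 3e-4, 4e-4]
N = 4
effective = [lr * (step + 1) / N for step, lr in enumerate(scheduler_lrs)]
print(effective)  # [2.5e-05, 0.0001, 0.000225, 0.0004]
```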
Implement warmup_inner_steps parameter that applies linear LR scaling over the first N inner steps after initialization. Scales from 1/N to N/N while preserving scheduler progression.
- Initialize warmup_inner_steps and warmup_steps_taken counter
- Apply LR scaling before optimizer.step() during warmup period
- Store and restore original LRs to avoid floating point drift
- Increment warmup counter only during non-null rounds
- Warmup counter resets on restart/bootstrap for consistent behavior
- Scheduler continues stepping normally throughout warmup

Warmup applies multiplicatively to scheduler's current LR, allowing gradual ramp-up at training start or after restart.
Add comprehensive test coverage for warmup LR scaling functionality. Tests verify initialization, scaling behavior, and edge cases.
- Test warmup initialization from config with default fallback
- Test LR scaling at first, middle, last, and post-warmup steps
- Test warmup counter increments and stops after warmup period
- Test warmup resets on restart/bootstrap initialization
- Test warmup disabled when warmup_inner_steps=0
- Test warmup works with both adamw and muon optimizers
- Test LR precision (no floating point drift from store/restore)
- Test multiple param groups with different base LRs
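The actual tests live in tests/unit/test_warmup.py and exercise the Trainer through mocks; the snippet below is only a sketch of the first-step scaling check against a plain PyTorch optimizer, to show the expected arithmetic.

```python
import torch

def test_first_warmup_step_scales_then_restores_lr():
    # Hypothetical standalone check, not the repo's fixture-based test.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=0.4)
    base_lr = optimizer.param_groups[0]["lr"]

    warmup_inner_steps, warmup_steps_taken = 4, 0
    scale = (warmup_steps_taken + 1) / warmup_inner_steps  # 1/4 on the first step
    optimizer.param_groups[0]["lr"] = base_lr * scale
    assert optimizer.param_groups[0]["lr"] == 0.1  # 0.4 * 1/4

    optimizer.step()

    # Restoring the stored value (not dividing back) keeps the LR exact.
    optimizer.param_groups[0]["lr"] = base_lr
    assert optimizer.param_groups[0]["lr"] == base_lr
```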
25b7e78 to a293187 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unit/test_warmup.py (2)
54-56: Minor redundancy: warmup_inner_steps already in config. The `warmup_inner_steps` value is already set in the hparams config at line 33. This explicit assignment is redundant but harmless.
Consider removing this line since the value is already configured:
```diff
-    # Set warmup_inner_steps
-    trainer.warmup_inner_steps = 100
     return trainer
```
123-149: Minor inconsistency: check both param groups. Line 149 only checks `param_groups[0]`, while other similar tests check both param groups for consistency. Consider checking both for completeness.
Add assertion for the second param group:
```diff
 # Should be restored exactly
 assert trainer.inner_optimizer.param_groups[0]["lr"] == base_lr
+assert trainer.inner_optimizer.param_groups[1]["lr"] == base_lr
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hparams/hparams.json (2 hunks)
- neurons/trainer.py (2 hunks)
- tests/unit/test_warmup.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- neurons/trainer.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_warmup.py (2)
tests/test_trainer_meta_init.py (1)
trainer (17-27)
neurons/trainer.py (1)
Trainer (41-1034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
🔇 Additional comments (5)
hparams/hparams.json (2)
64-64: LGTM! Clean config addition. The addition of `warmup_inner_steps` with a default of 0 ensures backward compatibility while enabling the new warmup feature when explicitly configured.
80-80: LGTM! Consistent with adamw config. The muon scheduler config correctly mirrors the warmup_inner_steps addition, maintaining consistency across optimizer types.
tests/unit/test_warmup.py (3)
1-14: LGTM! Clean test setup. The imports and mock initialization approach are appropriate for unit testing the warmup functionality in isolation.
60-90: LGTM! Thorough initialization testing. Both tests correctly verify warmup_inner_steps initialization from config and the default fallback behavior.
195-358: Excellent test coverage of edge cases and scenarios. These tests thoroughly validate:
- Counter increment logic and boundaries
- Restart/re-initialization behavior
- Disabled state (warmup_inner_steps=0)
- Multi-optimizer support (muon)
- Floating-point precision preservation
- Multiple parameter groups with different LRs
The comprehensive coverage provides strong confidence in the warmup implementation.
Add warmup_inner_steps parameter to enable linear LR scaling over the
first N inner steps of training. This scales the learning rate from 1/N
to N/N without affecting the scheduler's step count.
The warmup happens independently of the scheduler's existing warmup,
allowing gradual LR ramp-up at the start of training while preserving
the scheduler's baseline behavior.
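For instance, the per-step multipliers for a hypothetical warmup_inner_steps = 5 would be:

```python
# Linear warmup multipliers: 1/N, 2/N, ..., N/N (here N = 5).
N = 5
print([(step + 1) / N for step in range(N)])  # [0.2, 0.4, 0.6, 0.8, 1.0]
```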