
Conversation

@joellidin (Collaborator) commented Nov 8, 2025

Add warmup_inner_steps parameter to enable linear LR scaling over the
first N inner steps of training. This scales the learning rate from 1/N
to N/N without affecting the scheduler's step count.

  • Add warmup_inner_steps to adamw and muon scheduler configs
  • Initialize warmup_inner_steps from hparams in Trainer.__init__
  • Apply LR scaling before optimizer.step() during warmup period
  • Restore original LR after step to avoid affecting scheduler
  • Only apply warmup during non-null rounds when optimizer steps
  • Default to 0 (disabled) for backward compatibility

The warmup happens independently of the scheduler's existing warmup,
allowing gradual LR ramp-up at the start of training while preserving
the scheduler's baseline behavior.
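For concreteness, a minimal sketch of the scale-before-step / restore-after pattern described above follows. The attribute names (warmup_inner_steps, warmup_steps_taken, inner_optimizer, inner_scheduler) follow this summary; the function itself, its signature, and the null-round flag are assumptions for illustration, not code copied from neurons/trainer.py.

# Illustrative only: attribute names follow the PR summary above; the function,
# its signature, and the null-round flag are assumptions for this sketch, not
# code copied from neurons/trainer.py.
def inner_step(trainer, is_null_round: bool) -> None:
    """Apply linear inner warmup around a single inner optimizer step."""
    in_warmup = (
        not is_null_round
        and trainer.warmup_inner_steps > 0
        and trainer.warmup_steps_taken < trainer.warmup_inner_steps
    )

    if in_warmup:
        # Snapshot the scheduler-provided LRs so they can be restored exactly.
        original_lrs = [g["lr"] for g in trainer.inner_optimizer.param_groups]
        scale = (trainer.warmup_steps_taken + 1) / trainer.warmup_inner_steps
        for group in trainer.inner_optimizer.param_groups:
            group["lr"] *= scale

    trainer.inner_optimizer.step()

    if in_warmup:
        # Restore the originals so the scheduler never sees the scaled values.
        for group, lr in zip(trainer.inner_optimizer.param_groups, original_lrs):
            group["lr"] = lr
        trainer.warmup_steps_taken += 1

    # The scheduler keeps stepping normally; warmup does not change its count.
    trainer.inner_scheduler.step()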

Description

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

Summary by CodeRabbit

  • New Features

    • Added a new warmup_inner_steps configuration for AdamW and Muon schedulers.
    • Implemented inner-optimizer learning-rate warmup with automatic restoration after each inner step.
  • Tests

    • Added comprehensive unit tests validating initialization, scaling/restoration of LR, counter behavior, restart handling, multi-group scaling, and precision preservation.

@coderabbitai bot commented Nov 8, 2025

Walkthrough

Adds warmup_inner_steps to scheduler configs and implements runtime inner-optimizer warmup handling in the Trainer, scaling LRs during inner warmup steps and restoring them after each inner step. Adds unit tests covering initialization, scaling, restoration, counters, and restart behavior.

Changes

  • Configuration (hparams/hparams.json): Adds a new warmup_inner_steps field (set to 0) under optimizer.adamw.scheduler and optimizer.muon.scheduler.
  • Trainer logic (neurons/trainer.py): Reads warmup_inner_steps and tracks warmup_steps_taken; scales the inner-optimizer LR during warmup steps (when allowed), steps the optimizer, restores the original LRs, and updates the scheduler/step counters. Conditional checks respect flatten-window/null-round semantics.
  • Tests (tests/unit/test_warmup.py): New comprehensive unit tests validating initialization defaults, LR scaling/restoration across warmup steps, counter behavior, restart/reset semantics, multi-group scaling, optimizer variants (adamw/muon), and floating-point stability.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant Trainer as Trainer
    participant Scheduler as Inner Scheduler
    participant Optimizer as Inner Optimizer

    Trainer->>Config: read warmup_inner_steps
    loop each inner step
        Trainer->>Trainer: check flatten_window / null_round
        alt warmup allowed
            Trainer->>Scheduler: is warmup (warmup_steps_taken < warmup_inner_steps)?
            alt in warmup
                Trainer->>Optimizer: apply warmup scale (LR *= (step+1)/warmup_inner_steps)
                Trainer->>Optimizer: step()
                Trainer->>Optimizer: restore original LR
            else warmup complete
                Trainer->>Optimizer: step() (normal LR)
            end
        else not allowed
            Trainer->>Optimizer: step() (skip warmup scaling)
        end
        Trainer->>Scheduler: increment warmup_steps_taken / inner scheduler step
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Attention areas:
    • LR scale and restore correctness for multiple param groups and floating-point stability (see the sketch after this list)
    • Correct initialization and reset of warmup_steps_taken on restart/re-init
    • Interaction with existing scheduler stepping and flatten-window/null-round conditions
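A minimal illustration of the first attention area (the parameter groups and values here are hypothetical): restoring each group's LR from a snapshot taken before scaling, rather than dividing the scale back out, keeps the values bit-exact across groups.

# Hypothetical groups and values, only to illustrate the restore pattern.
import torch

group_a = [torch.nn.Parameter(torch.zeros(2))]
group_b = [torch.nn.Parameter(torch.zeros(2))]
optimizer = torch.optim.AdamW(
    [{"params": group_a, "lr": 3e-4}, {"params": group_b, "lr": 1e-3}]
)

saved = [g["lr"] for g in optimizer.param_groups]  # snapshot before scaling
scale = 7 / 100                                    # e.g. the 7th of 100 warmup steps
for g in optimizer.param_groups:
    g["lr"] *= scale

# ... optimizer.step() would run here ...

for g, lr in zip(optimizer.param_groups, saved):
    g["lr"] = lr                                   # exact restore, no FP drift

assert [g["lr"] for g in optimizer.param_groups] == saved

Dividing the scaled LR back by the scale factor instead can leave a small floating-point error, which is presumably why the tests assert exact equality after restoration.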

Possibly related PRs

  • feat/plateu lr #642: Modifies inner-scheduler stepping and skip logic; strongly related to warmup bookkeeping and inner_scheduler step control.

Poem

I’m a rabbit, tiny and spry, 🐇
I nudge the rates to help them try,
Scale up, step, then back in tune,
Warmup bright beneath the moon,
Hop on — models learn by noon!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description provides detailed information about what was added and why, but the template checkboxes are not completed, which is a template compliance issue. Resolution: complete the template by checking the relevant 'Type of Change' checkbox (Feature), confirming the branch naming convention, and marking completed checklist items (self-review, tests added, etc.).

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change, adding the warmup_inner_steps parameter for LR scaling, which is clearly reflected in the code changes across hparams.json, trainer.py, and tests.
  • Docstring Coverage (✅ Passed): Docstring coverage is 88.24%, which is sufficient; the required threshold is 80.00%.

@codecov bot commented Nov 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (57.91%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Impacted file tree graph

@@           Coverage Diff           @@
##              dev     #650   +/-   ##
=======================================
  Coverage   57.91%   57.91%           
=======================================
  Files          27       27           
  Lines        4890     4890           
=======================================
  Hits         2832     2832           
  Misses       2058     2058           

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
neurons/trainer.py (1)

136-142: Configuration reading looks correct.

The warmup_inner_steps retrieval correctly navigates the nested config structure and defaults to 0 when not specified.

Consider adding validation and logging for better debugging:

 # Get warmup_inner_steps from config
 optimizer_config = getattr(self.hparams, "optimizer", {})
 optimizer_type = optimizer_config.get("type", "adamw").lower()
 opt_config = optimizer_config.get(optimizer_type, {})
 scheduler_config = opt_config.get("scheduler", {})
 self.warmup_inner_steps = scheduler_config.get("warmup_inner_steps", 0)
+
+# Validate warmup_inner_steps
+if not isinstance(self.warmup_inner_steps, int) or self.warmup_inner_steps < 0:
+    raise ValueError(f"warmup_inner_steps must be a non-negative integer, got {self.warmup_inner_steps}")
+
+if self.warmup_inner_steps > 0:
+    tplr.logger.info(f"[Init] Inner warmup enabled: {self.warmup_inner_steps} steps")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 080905b and 25b7e78.

📒 Files selected for processing (2)
  • hparams/hparams.json (2 hunks)
  • neurons/trainer.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🔇 Additional comments (2)
hparams/hparams.json (1)

64-64: LGTM: Backward-compatible warmup configuration added.

The addition of warmup_inner_steps with a default value of 0 correctly disables the warmup feature by default, ensuring backward compatibility. The configuration is consistently applied to both optimizer types (adamw and muon).

Also applies to: 80-80

neurons/trainer.py (1)

874-900: Interaction is intentional by design; multiplicative compounding is expected.

The commit message confirms "The warmup happens independently of the scheduler's existing warmup, allowing gradual LR ramp-up at the start of training while preserving the scheduler's baseline behavior." "Independently" refers to step-count independence—the inner warmup does not affect scheduler step increments. The multiplicative LR effect during overlap is intentional: the scheduler sets the base LR for each step, and the inner warmup temporarily scales it (multiplying by (step+1)/N) for the actual optimizer step, then restores it before the scheduler advances.

Since warmup_inner_steps defaults to 0 across all configs, this feature is currently disabled and has no impact on training. No tests verify the behavior when warmup_inner_steps > 0, but the implementation correctly preserves scheduler independence while allowing gradual early-phase LR scaling.
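As a worked example of that multiplicative overlap (the numbers below are made up, not taken from hparams.json):

# Hypothetical values, purely to illustrate the multiplicative overlap.
scheduler_lr = 2e-4        # base LR the scheduler sets for this inner step
warmup_inner_steps = 100
warmup_steps_taken = 4     # i.e. the 5th inner step

scale = (warmup_steps_taken + 1) / warmup_inner_steps  # 0.05
effective_lr = scheduler_lr * scale                    # about 1e-5, used only for step()
# After step(), the LR is restored to scheduler_lr before the scheduler advances.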

Implement warmup_inner_steps parameter that applies linear LR scaling
over the first N inner steps after initialization. Scales from 1/N to
N/N while preserving scheduler progression.

- Initialize warmup_inner_steps and warmup_steps_taken counter
- Apply LR scaling before optimizer.step() during warmup period
- Store and restore original LRs to avoid floating point drift
- Increment warmup counter only during non-null rounds
- Warmup counter resets on restart/bootstrap for consistent behavior
- Scheduler continues stepping normally throughout warmup

Warmup applies multiplicatively to scheduler's current LR, allowing
gradual ramp-up at training start or after restart.

Add comprehensive test coverage for warmup LR scaling functionality.
Tests verify initialization, scaling behavior, and edge cases.

- Test warmup initialization from config with default fallback
- Test LR scaling at first, middle, last, and post-warmup steps
- Test warmup counter increments and stops after warmup period
- Test warmup resets on restart/bootstrap initialization
- Test warmup disabled when warmup_inner_steps=0
- Test warmup works with both adamw and muon optimizers
- Test LR precision (no floating point drift from store/restore)
- Test multiple param groups with different base LRs
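To give a flavor of those checks, here is a self-contained sketch in the same spirit; it uses a bare AdamW optimizer in place of the Trainer's inner optimizer and is not the project's actual tests/unit/test_warmup.py.

# Standalone sketch, not the project's test file: a bare AdamW optimizer
# stands in for the Trainer's inner optimizer.
import torch


def test_mid_warmup_scaling_and_exact_restore():
    param = torch.nn.Parameter(torch.zeros(4))
    optimizer = torch.optim.AdamW([param], lr=1e-3)

    warmup_inner_steps = 100
    warmup_steps_taken = 49                    # a mid-warmup step

    saved = [g["lr"] for g in optimizer.param_groups]
    scale = (warmup_steps_taken + 1) / warmup_inner_steps
    for g in optimizer.param_groups:
        g["lr"] *= scale
    assert optimizer.param_groups[0]["lr"] == 1e-3 * 0.5   # scaled to half

    param.grad = torch.ones_like(param)
    optimizer.step()

    for g, lr in zip(optimizer.param_groups, saved):
        g["lr"] = lr
    assert optimizer.param_groups[0]["lr"] == 1e-3         # restored exactly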
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/test_warmup.py (2)

54-56: Minor redundancy: warmup_inner_steps already in config.

The warmup_inner_steps value is already set in the hparams config at line 33. This explicit assignment is redundant but harmless.

Consider removing this line since the value is already configured:

-    # Set warmup_inner_steps
-    trainer.warmup_inner_steps = 100
-
     return trainer

123-149: Minor inconsistency: check both param groups.

Line 149 only checks param_groups[0], while other similar tests check both param groups for consistency. Consider checking both for completeness.

Add assertion for the second param group:

     # Should be restored exactly
     assert trainer.inner_optimizer.param_groups[0]["lr"] == base_lr
+    assert trainer.inner_optimizer.param_groups[1]["lr"] == base_lr
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25b7e78 and a293187.

📒 Files selected for processing (3)
  • hparams/hparams.json (2 hunks)
  • neurons/trainer.py (2 hunks)
  • tests/unit/test_warmup.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • neurons/trainer.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_warmup.py (2)
tests/test_trainer_meta_init.py (1)
  • trainer (17-27)
neurons/trainer.py (1)
  • Trainer (41-1034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🔇 Additional comments (5)
hparams/hparams.json (2)

64-64: LGTM! Clean config addition.

The addition of warmup_inner_steps with a default of 0 ensures backward compatibility while enabling the new warmup feature when explicitly configured.


80-80: LGTM! Consistent with adamw config.

The muon scheduler config correctly mirrors the warmup_inner_steps addition, maintaining consistency across optimizer types.

tests/unit/test_warmup.py (3)

1-14: LGTM! Clean test setup.

The imports and mock initialization approach are appropriate for unit testing the warmup functionality in isolation.


60-90: LGTM! Thorough initialization testing.

Both tests correctly verify warmup_inner_steps initialization from config and the default fallback behavior.


195-358: Excellent test coverage of edge cases and scenarios.

These tests thoroughly validate:

  • Counter increment logic and boundaries
  • Restart/re-initialization behavior
  • Disabled state (warmup_inner_steps=0)
  • Multi-optimizer support (muon)
  • Floating-point precision preservation
  • Multiple parameter groups with different LRs

The comprehensive coverage provides strong confidence in the warmup implementation.
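To make the counter and restart semantics above concrete, here is a tiny standalone model of that behavior; WarmupCounter is a hypothetical stand-in for the Trainer's bookkeeping, not a class from neurons/trainer.py.

# Hypothetical stand-in for the Trainer's warmup bookkeeping.
class WarmupCounter:
    def __init__(self, warmup_inner_steps: int = 0) -> None:
        # Re-initialization (restart/bootstrap) always starts the counter at zero.
        self.warmup_inner_steps = warmup_inner_steps
        self.warmup_steps_taken = 0

    def scale_for_step(self, stepped: bool) -> float:
        """Return the LR scale for one inner step and advance the counter."""
        if not stepped or self.warmup_steps_taken >= self.warmup_inner_steps:
            return 1.0                       # disabled, null round, or warmup finished
        scale = (self.warmup_steps_taken + 1) / self.warmup_inner_steps
        self.warmup_steps_taken += 1         # counter only advances during warmup
        return scale


counter = WarmupCounter(warmup_inner_steps=3)
print([round(counter.scale_for_step(stepped=True), 3) for _ in range(5)])
# [0.333, 0.667, 1.0, 1.0, 1.0]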

@joellidin merged commit 81e37e0 into dev on Nov 9, 2025 (7 of 8 checks passed)
@joellidin deleted the feat/warmup branch on November 9, 2025 at 13:07
@coderabbitai bot mentioned this pull request on Nov 9, 2025