
Conversation

@joellidin (Collaborator) commented Nov 8, 2025

Add warmup_inner_steps parameter to enable linear LR scaling over the
first N inner steps of training. This scales the learning rate from 1/N
to N/N without affecting the scheduler's step count.

  • Add warmup_inner_steps to adamw and muon scheduler configs
  • Initialize warmup_inner_steps from hparams in Trainer.__init__
  • Apply LR scaling before optimizer.step() during warmup period
  • Restore original LR after step to avoid affecting scheduler
  • Only apply warmup during non-null rounds when optimizer steps
  • Default to 0 (disabled) for backward compatibility

The warmup happens independently of the scheduler's existing warmup,
allowing gradual LR ramp-up at the start of training while preserving
the scheduler's baseline behavior.
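For concreteness, a minimal sketch of the scale-before-step / restore-after pattern described above follows. The attribute names (warmup_inner_steps, warmup_steps_taken, inner_optimizer, inner_scheduler) follow this summary; the function itself, its signature, and the null-round flag are assumptions for illustration, not code copied from neurons/trainer.py.

# Illustrative only: attribute names follow the PR summary above; the function,
# its signature, and the null-round flag are assumptions for this sketch, not
# code copied from neurons/trainer.py.
def inner_step(trainer, is_null_round: bool) -> None:
    """Apply linear inner warmup around a single inner optimizer step."""
    in_warmup = (
        not is_null_round
        and trainer.warmup_inner_steps > 0
        and trainer.warmup_steps_taken < trainer.warmup_inner_steps
    )

    if in_warmup:
        # Snapshot the scheduler-provided LRs so they can be restored exactly.
        original_lrs = [g["lr"] for g in trainer.inner_optimizer.param_groups]
        scale = (trainer.warmup_steps_taken + 1) / trainer.warmup_inner_steps
        for group in trainer.inner_optimizer.param_groups:
            group["lr"] *= scale

    trainer.inner_optimizer.step()

    if in_warmup:
        # Restore the originals so the scheduler never sees the scaled values.
        for group, lr in zip(trainer.inner_optimizer.param_groups, original_lrs):
            group["lr"] = lr
        trainer.warmup_steps_taken += 1

    # The scheduler keeps stepping normally; warmup does not change its count.
    trainer.inner_scheduler.step()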

Description

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

Summary by CodeRabbit

  • New Features

    • Added a new warmup_inner_steps configuration for AdamW and Muon schedulers.
    • Implemented inner-optimizer learning-rate warmup with automatic restoration after each inner step.
  • Tests

    • Added comprehensive unit tests validating initialization, scaling/restoration of LR, counter behavior, restart handling, multi-group scaling, and precision preservation.

@coderabbitai bot commented Nov 8, 2025

Walkthrough

Adds warmup_inner_steps to scheduler configs and implements runtime inner-optimizer warmup handling in the Trainer, scaling LRs during inner warmup steps and restoring them after each inner step. Adds unit tests covering initialization, scaling, restoration, counters, and restart behavior.

Changes

  • Configuration (hparams/hparams.json): Adds a new warmup_inner_steps field (set to 0) under optimizer.adamw.scheduler and optimizer.muon.scheduler.
  • Trainer logic (neurons/trainer.py): Reads warmup_inner_steps and tracks warmup_steps_taken; scales the inner-optimizer LR during warmup steps (when allowed), steps the optimizer, restores the original LRs, and updates the scheduler/step counters. Conditional checks respect flatten-window/null-round semantics.
  • Tests (tests/unit/test_warmup.py): New comprehensive unit tests validating initialization defaults, LR scaling/restoration across warmup steps, counter behavior, restart/reset semantics, multi-group scaling, optimizer variants (adamw/muon), and floating-point stability.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant Trainer as Trainer
    participant Scheduler as Inner Scheduler
    participant Optimizer as Inner Optimizer

    Trainer->>Config: read warmup_inner_steps
    loop each inner step
        Trainer->>Trainer: check flatten_window / null_round
        alt warmup allowed
            Trainer->>Scheduler: is warmup (warmup_steps_taken < warmup_inner_steps)?
            alt in warmup
                Trainer->>Optimizer: apply warmup scale (LR *= (step+1)/warmup_inner_steps)
                Trainer->>Optimizer: step()
                Trainer->>Optimizer: restore original LR
            else warmup complete
                Trainer->>Optimizer: step() (normal LR)
            end
        else not allowed
            Trainer->>Optimizer: step() (skip warmup scaling)
        end
        Trainer->>Scheduler: increment warmup_steps_taken / inner scheduler step
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Attention areas:
    • LR scale and restore correctness for multiple param groups and floating-point stability (see the sketch after this list)
    • Correct initialization and reset of warmup_steps_taken on restart/re-init
    • Interaction with existing scheduler stepping and flatten-window/null-round conditions
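A minimal illustration of the first attention area (the parameter groups and values here are hypothetical): restoring each group's LR from a snapshot taken before scaling, rather than dividing the scale back out, keeps the values bit-exact across groups.

# Hypothetical groups and values, only to illustrate the restore pattern.
import torch

group_a = [torch.nn.Parameter(torch.zeros(2))]
group_b = [torch.nn.Parameter(torch.zeros(2))]
optimizer = torch.optim.AdamW(
    [{"params": group_a, "lr": 3e-4}, {"params": group_b, "lr": 1e-3}]
)

saved = [g["lr"] for g in optimizer.param_groups]  # snapshot before scaling
scale = 7 / 100                                    # e.g. the 7th of 100 warmup steps
for g in optimizer.param_groups:
    g["lr"] *= scale

# ... optimizer.step() would run here ...

for g, lr in zip(optimizer.param_groups, saved):
    g["lr"] = lr                                   # exact restore, no FP drift

assert [g["lr"] for g in optimizer.param_groups] == saved

Dividing the scaled LR back by the scale factor instead can leave a small floating-point error, which is presumably why the tests assert exact equality after restoration.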

Possibly related PRs

  • feat/plateu lr #642: Modifies inner-scheduler stepping and skip logic; strongly related to warmup bookkeeping and inner_scheduler step control.

Poem

I’m a rabbit, tiny and spry, 🐇
I nudge the rates to help them try,
Scale up, step, then back in tune,
Warmup bright beneath the moon,
Hop on — models learn by noon!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description provides detailed information about what was added and why, but the template checkboxes are not completed, which is a template compliance issue. Resolution: complete the template by checking the relevant 'Type of Change' checkbox (Feature), confirming the branch naming convention, and marking completed checklist items (self-review, tests added, etc.).

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change, adding the warmup_inner_steps parameter for LR scaling, which is clearly reflected in the code changes across hparams.json, trainer.py, and tests.
  • Docstring Coverage (✅ Passed): Docstring coverage is 88.24%, which is sufficient; the required threshold is 80.00%.

@codecov bot commented Nov 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (57.91%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Impacted file tree graph

@@           Coverage Diff           @@
##              dev     #650   +/-   ##
=======================================
  Coverage   57.91%   57.91%           
=======================================
  Files          27       27           
  Lines        4890     4890           
=======================================
  Hits         2832     2832           
  Misses       2058     2058           

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
neurons/trainer.py (1)

136-142: Configuration reading looks correct.

The warmup_inner_steps retrieval correctly navigates the nested config structure and defaults to 0 when not specified.

Consider adding validation and logging for better debugging:

 # Get warmup_inner_steps from config
 optimizer_config = getattr(self.hparams, "optimizer", {})
 optimizer_type = optimizer_config.get("type", "adamw").lower()
 opt_config = optimizer_config.get(optimizer_type, {})
 scheduler_config = opt_config.get("scheduler", {})
 self.warmup_inner_steps = scheduler_config.get("warmup_inner_steps", 0)
+
+# Validate warmup_inner_steps
+if not isinstance(self.warmup_inner_steps, int) or self.warmup_inner_steps < 0:
+    raise ValueError(f"warmup_inner_steps must be a non-negative integer, got {self.warmup_inner_steps}")
+
+if self.warmup_inner_steps > 0:
+    tplr.logger.info(f"[Init] Inner warmup enabled: {self.warmup_inner_steps} steps")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 080905b and 25b7e78.

📒 Files selected for processing (2)
  • hparams/hparams.json (2 hunks)
  • neurons/trainer.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🔇 Additional comments (2)
hparams/hparams.json (1)

64-64: LGTM: Backward-compatible warmup configuration added.

The addition of warmup_inner_steps with a default value of 0 correctly disables the warmup feature by default, ensuring backward compatibility. The configuration is consistently applied to both optimizer types (adamw and muon).

Also applies to: 80-80

neurons/trainer.py (1)

874-900: Interaction is intentional by design; multiplicative compounding is expected.

The commit message confirms "The warmup happens independently of the scheduler's existing warmup, allowing gradual LR ramp-up at the start of training while preserving the scheduler's baseline behavior." "Independently" refers to step-count independence—the inner warmup does not affect scheduler step increments. The multiplicative LR effect during overlap is intentional: the scheduler sets the base LR for each step, and the inner warmup temporarily scales it (multiplying by (step+1)/N) for the actual optimizer step, then restores it before the scheduler advances.

Since warmup_inner_steps defaults to 0 across all configs, this feature is currently disabled and has no impact on training. No tests verify the behavior when warmup_inner_steps > 0, but the implementation correctly preserves scheduler independence while allowing gradual early-phase LR scaling.
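As a worked example of that multiplicative overlap (the numbers below are made up, not taken from hparams.json):

# Hypothetical values, purely to illustrate the multiplicative overlap.
scheduler_lr = 2e-4        # base LR the scheduler sets for this inner step
warmup_inner_steps = 100
warmup_steps_taken = 4     # i.e. the 5th inner step

scale = (warmup_steps_taken + 1) / warmup_inner_steps  # 0.05
effective_lr = scheduler_lr * scale                    # about 1e-5, used only for step()
# After step(), the LR is restored to scheduler_lr before the scheduler advances.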

Implement warmup_inner_steps parameter that applies linear LR scaling
over the first N inner steps after initialization. Scales from 1/N to
N/N while preserving scheduler progression.

- Initialize warmup_inner_steps and warmup_steps_taken counter
- Apply LR scaling before optimizer.step() during warmup period
- Store and restore original LRs to avoid floating point drift
- Increment warmup counter only during non-null rounds
- Warmup counter resets on restart/bootstrap for consistent behavior
- Scheduler continues stepping normally throughout warmup

Warmup applies multiplicatively to scheduler's current LR, allowing
gradual ramp-up at training start or after restart.

Add comprehensive test coverage for warmup LR scaling functionality.
Tests verify initialization, scaling behavior, and edge cases.

- Test warmup initialization from config with default fallback
- Test LR scaling at first, middle, last, and post-warmup steps
- Test warmup counter increments and stops after warmup period
- Test warmup resets on restart/bootstrap initialization
- Test warmup disabled when warmup_inner_steps=0
- Test warmup works with both adamw and muon optimizers
- Test LR precision (no floating point drift from store/restore)
- Test multiple param groups with different base LRs
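To give a flavor of those checks, here is a self-contained sketch in the same spirit; it uses a bare AdamW optimizer in place of the Trainer's inner optimizer and is not the project's actual tests/unit/test_warmup.py.

# Standalone sketch, not the project's test file: a bare AdamW optimizer
# stands in for the Trainer's inner optimizer.
import torch


def test_mid_warmup_scaling_and_exact_restore():
    param = torch.nn.Parameter(torch.zeros(4))
    optimizer = torch.optim.AdamW([param], lr=1e-3)

    warmup_inner_steps = 100
    warmup_steps_taken = 49                    # a mid-warmup step

    saved = [g["lr"] for g in optimizer.param_groups]
    scale = (warmup_steps_taken + 1) / warmup_inner_steps
    for g in optimizer.param_groups:
        g["lr"] *= scale
    assert optimizer.param_groups[0]["lr"] == 1e-3 * 0.5   # scaled to half

    param.grad = torch.ones_like(param)
    optimizer.step()

    for g, lr in zip(optimizer.param_groups, saved):
        g["lr"] = lr
    assert optimizer.param_groups[0]["lr"] == 1e-3         # restored exactly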
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/test_warmup.py (2)

54-56: Minor redundancy: warmup_inner_steps already in config.

The warmup_inner_steps value is already set in the hparams config at line 33. This explicit assignment is redundant but harmless.

Consider removing this line since the value is already configured:

-    # Set warmup_inner_steps
-    trainer.warmup_inner_steps = 100
-
     return trainer

123-149: Minor inconsistency: check both param groups.

Line 149 only checks param_groups[0], while other similar tests check both param groups for consistency. Consider checking both for completeness.

Add assertion for the second param group:

     # Should be restored exactly
     assert trainer.inner_optimizer.param_groups[0]["lr"] == base_lr
+    assert trainer.inner_optimizer.param_groups[1]["lr"] == base_lr
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25b7e78 and a293187.

📒 Files selected for processing (3)
  • hparams/hparams.json (2 hunks)
  • neurons/trainer.py (2 hunks)
  • tests/unit/test_warmup.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • neurons/trainer.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_warmup.py (2)
tests/test_trainer_meta_init.py (1)
  • trainer (17-27)
neurons/trainer.py (1)
  • Trainer (41-1034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🔇 Additional comments (5)
hparams/hparams.json (2)

64-64: LGTM! Clean config addition.

The addition of warmup_inner_steps with a default of 0 ensures backward compatibility while enabling the new warmup feature when explicitly configured.


80-80: LGTM! Consistent with adamw config.

The muon scheduler config correctly mirrors the warmup_inner_steps addition, maintaining consistency across optimizer types.

tests/unit/test_warmup.py (3)

1-14: LGTM! Clean test setup.

The imports and mock initialization approach are appropriate for unit testing the warmup functionality in isolation.


60-90: LGTM! Thorough initialization testing.

Both tests correctly verify warmup_inner_steps initialization from config and the default fallback behavior.


195-358: Excellent test coverage of edge cases and scenarios.

These tests thoroughly validate:

  • Counter increment logic and boundaries
  • Restart/re-initialization behavior
  • Disabled state (warmup_inner_steps=0)
  • Multi-optimizer support (muon)
  • Floating-point precision preservation
  • Multiple parameter groups with different LRs

The comprehensive coverage provides strong confidence in the warmup implementation.
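To make the counter and restart semantics above concrete, here is a tiny standalone model of that behavior; WarmupCounter is a hypothetical stand-in for the Trainer's bookkeeping, not a class from neurons/trainer.py.

# Hypothetical stand-in for the Trainer's warmup bookkeeping.
class WarmupCounter:
    def __init__(self, warmup_inner_steps: int = 0) -> None:
        # Re-initialization (restart/bootstrap) always starts the counter at zero.
        self.warmup_inner_steps = warmup_inner_steps
        self.warmup_steps_taken = 0

    def scale_for_step(self, stepped: bool) -> float:
        """Return the LR scale for one inner step and advance the counter."""
        if not stepped or self.warmup_steps_taken >= self.warmup_inner_steps:
            return 1.0                       # disabled, null round, or warmup finished
        scale = (self.warmup_steps_taken + 1) / self.warmup_inner_steps
        self.warmup_steps_taken += 1         # counter only advances during warmup
        return scale


counter = WarmupCounter(warmup_inner_steps=3)
print([round(counter.scale_for_step(stepped=True), 3) for _ in range(5)])
# [0.333, 0.667, 1.0, 1.0, 1.0]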

@joellidin merged commit 81e37e0 into dev on Nov 9, 2025 (7 of 8 checks passed)
@joellidin deleted the feat/warmup branch on November 9, 2025 at 13:07
@coderabbitai bot mentioned this pull request on Nov 9, 2025