
Conversation

@joellidin (Collaborator) commented Nov 3, 2025

  • (hparams) Update bootstrap version to 2.1.12
  • Bump run version

Description

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

Summary by CodeRabbit

  • Chores
    • Released version 2.1.13.
    • Updated default training hyperparameters (outer learning rate reduced) and checkpoint initialization parameters.
    • Adjusted optimizer scheduler behavior to enable a flattening phase earlier and for a limited duration.

coderabbitai bot commented Nov 3, 2025

Walkthrough

Updated hyperparameters in hparams/hparams.json (outer_learning_rate, checkpoint_init_version/window, and adamw scheduler flattening settings). Bumped package version in src/tplr/__init__.py from "2.1.12" to "2.1.13".

Changes

Hyperparameter Configuration (hparams/hparams.json)
outer_learning_rate changed 0.7 → 0.6; checkpoint_init_version changed "2.1.9" → "2.1.12"; checkpoint_init_window changed 58181 → 59057; in optimizer.scheduler.adamw: flatten_start_step set to 2650 and flatten_duration set to 1000.

Package Version (src/tplr/__init__.py)
Bumped __version__ from "2.1.12" to "2.1.13".
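
For reference, a sketch of how the touched fields plausibly sit in hparams.json after this change, following the key paths named above; surrounding keys are omitted, and the exact top-level placement of the first three keys is an assumption:

```json
{
  "outer_learning_rate": 0.6,
  "checkpoint_init_version": "2.1.12",
  "checkpoint_init_window": 59057,
  "optimizer": {
    "scheduler": {
      "adamw": {
        "flatten_start_step": 2650,
        "flatten_duration": 1000
      }
    }
  }
}
```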

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Trainer as Trainer
    participant HP as hparams/hparams.json
    participant Scheduler as AdamW Scheduler
    note right of HP #D3E4CD: Config values\n(outer_learning_rate,\ncheckpoint_init_*, flatten params)
    Trainer->>HP: load hyperparameters
    Trainer->>Scheduler: init with hyperparams
    Scheduler-->>Trainer: scheduler ready
    alt training step < flatten_start_step
        Trainer->>Scheduler: step(n) [normal lr schedule]
    else training step between flatten range
        Trainer->>Scheduler: step(n) [apply flattening: duration=1000]
    else after flatten range
        Trainer->>Scheduler: step(n) [resume normal schedule]
    end
    Note over Scheduler: checkpoint_init_version/window are\nused during checkpoint init logic (external to scheduler)
```
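
In code, the branching above amounts to holding the learning rate constant inside the flatten window. A minimal sketch, assuming flattening simply pins the underlying schedule's value at the start of the window; lr_at_step and base_lr_at are hypothetical names, not functions from this repo:

```python
# Hypothetical sketch (not code from this repo) of the flatten phase the
# diagram describes: run the base LR schedule, hold its value constant for
# flatten_duration steps starting at flatten_start_step, then resume.
from typing import Callable

def lr_at_step(
    step: int,
    base_lr_at: Callable[[int], float],
    flatten_start_step: int = 2650,
    flatten_duration: int = 1000,
) -> float:
    flatten_end = flatten_start_step + flatten_duration
    if step < flatten_start_step:
        return base_lr_at(step)                # normal schedule
    if step < flatten_end:
        return base_lr_at(flatten_start_step)  # hold the LR flat
    return base_lr_at(step)                    # resume normal schedule
```

Read this way, the adamw schedule would run normally through step 2649, stay flat from step 2650 to 3649, and then rejoin the base schedule; whether the real implementation rejoins or offsets the schedule afterward is not visible from this diff.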

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Pay attention to runtime impact of changing flatten_start_step/flatten_duration and outer_learning_rate.
  • Confirm checkpoint_init_version/checkpoint_init_window align with checkpoint loading code expectations.

Poem

🐰 A hop, a tweak, a learning rate low,
Versions nudged forward, checkpoints set to go.
Flattening starts where steps align,
Small changes stitched in tidy design —
I nibble bugs and bounce, all aglow. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Description check: ⚠️ Warning
Explanation: The pull request description is largely incomplete. While it provides two bullet points summarizing the changes, the author has not filled in the essential template sections such as a detailed description of what was changed and why, related issues, type of change classification, or any verification of checklist items. All checkboxes remain unchecked, and most template sections contain only placeholder comments with no actual content provided by the author.
Resolution: Complete the pull request description by adding: (1) a detailed description explaining what changes were made and why they are necessary; (2) classification under 'Type of Change' (appears to be 'Maintenance'); (3) verification of applicable checklist items like 'Branch Naming', 'Commit Messages', and 'Code Quality'; (4) confirmation that tests pass and documentation is updated if relevant. At minimum, provide substantive content for the Description and Type of Change sections rather than leaving them blank.

✅ Passed checks (1 passed)

Title check: ✅ Passed
Explanation: The pull request title 'hparams/update bootstrap version' clearly and concisely summarizes the primary changes in the changeset. It indicates that the main focus is updating the bootstrap version in the hyperparameters file, which aligns with the actual changes (checkpoint_init_version bump from 2.1.9 to 2.1.12) and the package version bump to 2.1.13. The title is specific enough to convey meaningful information about the changeset.

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Impacted file tree graph

```
@@           Coverage Diff           @@
##              dev     #645   +/-   ##
=======================================
  Coverage   57.91%   57.91%
=======================================
  Files          27       27
  Lines        4890     4890
=======================================
  Hits         2832     2832
  Misses       2058     2058
```

Files with missing lines | Coverage Δ
src/tplr/__init__.py | 100.00% <100.00%> (ø)

Update checkpoint initialization to version 2.1.12 and window 59057 to
bootstrap from the latest stable checkpoint. Configure LR flattening to
start at step 2650 with a 1000-step duration for improved training
stability.

- Update checkpoint_init_version from 2.1.9 to 2.1.12
- Update checkpoint_init_window from 58181 to 59057
- Set flatten_start_step to 2650
- Set flatten_duration to 1000
@joellidin force-pushed the hparams/update-bootstrap-version branch from f4b8c63 to e7b6d97 on November 3, 2025 14:10
@joellidin merged commit 06aff26 into dev on Nov 3, 2025 (5 of 6 checks passed)
@joellidin deleted the hparams/update-bootstrap-version branch on November 3, 2025 14:12

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hparams/hparams.json (1)

1-114: PR description incomplete—does not capture all hyperparameter changes.

The PR objectives state only "Update bootstrap version to 2.1.12 (hparams). Bump run version." but the actual changes include four additional hyperparameter modifications:

  • outer_learning_rate reduction
  • checkpoint_init_window advancement
  • Two new adamw scheduler flattening parameters

This discrepancy raises concerns about whether all changes were intentional or if the PR description was incomplete. Update the PR description to fully document the rationale for each change, particularly the learning rate adjustment and scheduler tuning.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4b8c63 and e7b6d97.

📒 Files selected for processing (2)
  • hparams/hparams.json (3 hunks)
  • src/tplr/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tplr/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🔇 Additional comments (2)
hparams/hparams.json (2)

46-47: Verify checkpoint initialization version and window alignment.

The checkpoint_init_version update from "2.1.9" to "2.1.12" aligns with the PR objective, but the simultaneous advancement of checkpoint_init_window from 58181 to 59057 is not documented. Confirm:

  • Whether this window advance is a necessary migration step alongside the version bump
  • Whether existing checkpoints at version 2.1.12 and window 59057 exist and are compatible with the current model state

66-67: Verify scheduler flattening parameters are tested and appropriate.

The adamw scheduler now includes flatten_start_step: 2650 and flatten_duration: 1000. These are new parameters that fundamentally alter optimizer behavior by flattening the learning rate schedule starting at step 2650 for 1000 steps. Confirm:

  • These values have been tuned/validated for this model configuration
  • The flattening behavior is appropriate for training stability and convergence
  • The muon scheduler is intentionally left unaffected (flatten_start_step: null, flatten_duration: 0)

"batch_size": 192,
"inner_steps": 30,
"outer_learning_rate": 0.7,
"outer_learning_rate": 0.6,

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Undocumented learning rate reduction—verify this is intentional.

The outer learning rate has been reduced by ~14% (0.7 → 0.6) but is not mentioned in the PR description. This is a significant hyperparameter change that affects training dynamics. Confirm this change was intentional and has been validated.


🏁 Script executed:

```bash
#!/bin/bash
# Check the current state and git diff for hparams/hparams.json
echo "=== Current hparams/hparams.json (lines 8-12) ==="
sed -n '8,12p' hparams/hparams.json

echo -e "\n=== Full git diff for hparams/hparams.json ==="
git diff HEAD^ hparams/hparams.json 2>/dev/null || git diff hparams/hparams.json 2>/dev/null || echo "Git diff unavailable"

echo -e "\n=== Commit message ==="
git log -1 --pretty=format:"%B" 2>/dev/null || echo "Commit message unavailable"

echo -e "\n=== Search for learning rate mentions in documentation ==="
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "README*" \) -exec grep -l "learning.rate\|outer_learning_rate" {} \; 2>/dev/null | head -5
```

Length of output: 378


🏁 Script executed:

```bash
#!/bin/bash
# Search for any documentation about hyperparameter changes or tuning rationale
echo "=== Search for learning rate documentation in codebase ==="
rg -i "outer_learning_rate|learning.rate.*0\.6|learning.rate.*0\.7" -A 2 -B 2 --type=md --type=txt 2>/dev/null | head -30

echo -e "\n=== Check if there's a CHANGELOG or similar ==="
find . -type f \( -name "CHANGELOG*" -o -name "HISTORY*" -o -name "CHANGES*" \) -exec head -50 {} \;
```

Length of output: 390


🏁 Script executed:

```bash
#!/bin/bash
# Check git history for outer_learning_rate value changes
echo "=== Git log for hparams/hparams.json (last 10 commits) ==="
git log --oneline -10 -- hparams/hparams.json 2>/dev/null || echo "No history available"

echo -e "\n=== Check outer_learning_rate value in previous commits ==="
git show HEAD~1:hparams/hparams.json 2>/dev/null | sed -n '8,12p' || echo "Previous version unavailable"

echo -e "\n=== Check if there are any staged or unstaged changes ==="
git status hparams/hparams.json 2>/dev/null || echo "Status unavailable"

echo -e "\n=== Show diff with specific context for lines around outer_learning_rate ==="
git diff --no-index /dev/null hparams/hparams.json 2>/dev/null | grep -A 3 -B 3 "outer_learning_rate" | head -20
```

Length of output: 1181


🏁 Script executed:

```bash
#!/bin/bash
# Find the merge base and compare
echo "=== Current commit info ==="
git log -1 --oneline HEAD

echo -e "\n=== Find merge base with main/master ==="
git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null || echo "Merge base not found"

echo -e "\n=== Diff between merge base and current HEAD for hparams.json ==="
MERGE_BASE=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null)
if [ ! -z "$MERGE_BASE" ]; then
  git diff $MERGE_BASE HEAD -- hparams/hparams.json | head -100
else
  echo "Cannot determine merge base"
fi

echo -e "\n=== Check all changes in current PR ==="
git log --oneline $(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null)..HEAD 2>/dev/null | head -20
```

Length of output: 2913


Learning rate reduction is intentional but undocumented—update commit message or PR description to explain all hyperparameter changes.

The reduction from 0.7 to 0.6 is confirmed and is part of an intentional feature (feat/plateu lr) with corresponding test coverage and scheduler updates (new flatten_start_step and flatten_duration parameters). However, the commit message "Bump run version" documents none of these changes. Update the commit message or PR description to explicitly list all hyperparameter modifications and their rationale, including the learning rate reduction, new scheduler flattening behavior, checkpoint version bump, and BMA threshold changes.

🤖 Prompt for AI Agents
In hparams/hparams.json around line 10, the commit message "Bump run version"
fails to document intentional hyperparameter changes; update the commit message
or the PR description to explicitly list each change and rationale: note the
outer_learning_rate reduction from 0.7 to 0.6, the addition of the scheduler
parameters flatten_start_step and flatten_duration with brief explanation of
their plateau/flattening behavior, the checkpoint version bump, and the BMA
threshold adjustments; keep the entry concise, reference affected files/lines,
and include why each change was made and any impact on tests or runtime.
