hparams/update bootstrap version #645
Conversation
Walkthrough
Updated hyperparameters in hparams/hparams.json.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant Trainer as Trainer
    participant HP as hparams/hparams.json
    participant Scheduler as AdamW Scheduler
    note right of HP #D3E4CD: Config values\n(outer_learning_rate,\ncheckpoint_init_*, flatten params)
    Trainer->>HP: load hyperparameters
    Trainer->>Scheduler: init with hyperparams
    Scheduler-->>Trainer: scheduler ready
    alt training step < flatten_start_step
        Trainer->>Scheduler: step(n) [normal lr schedule]
    else training step within flatten range
        Trainer->>Scheduler: step(n) [apply flattening: duration=1000]
    else after flatten range
        Trainer->>Scheduler: step(n) [resume normal schedule]
    end
    Note over Scheduler: checkpoint_init_version/window are\nused during checkpoint init logic (external to scheduler)
```
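To make the flattening behavior in the diagram concrete, here is a minimal sketch of what a plateau-aware LR schedule could look like. This is illustrative only, not the repository's scheduler: the cosine baseline, the function name `lr_at_step`, and the resume-with-shift behavior after the plateau are all assumptions.

```python
import math

def lr_at_step(step, base_lr, total_steps, flatten_start_step, flatten_duration):
    """Illustrative plateau-aware schedule (assumed cosine baseline).

    Before flatten_start_step the normal schedule applies; during the
    plateau the LR is frozen at its flatten_start_step value; afterwards
    the schedule resumes as if the plateau steps had not advanced it.
    """
    def cosine(s):
        return base_lr * 0.5 * (1 + math.cos(math.pi * min(s, total_steps) / total_steps))

    if flatten_start_step is None or flatten_duration <= 0:
        return cosine(step)  # flattening disabled (e.g. muon: null / 0)
    if step < flatten_start_step:
        return cosine(step)  # normal schedule before the plateau
    if step < flatten_start_step + flatten_duration:
        return cosine(flatten_start_step)  # hold LR flat during the plateau
    return cosine(step - flatten_duration)  # resume, shifted by the plateau length
```

With this PR's values (flatten_start_step=2650, flatten_duration=1000), such a schedule would hold the LR constant over steps 2650-3649 and resume decay at step 3650.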
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) | ✅ Passed checks (1 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##              dev     #645   +/-   ##
=======================================
  Coverage   57.91%   57.91%
=======================================
  Files          27       27
  Lines        4890     4890
=======================================
  Hits         2832     2832
  Misses       2058     2058
```
Update checkpoint initialization to version 2.1.12 and window 59057 to bootstrap from the latest stable checkpoint. Configure LR flattening to start at step 2650 with a 1000-step duration for improved training stability.

- Update checkpoint_init_version from 2.1.9 to 2.1.12
- Update checkpoint_init_window from 58181 to 59057
- Set flatten_start_step to 2650
- Set flatten_duration to 1000
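For reference, the changed values would look roughly like this side by side (shown as Python dicts for comparison; the nesting of the flatten keys under the adamw scheduler section is inferred from the review comments below, not verified against the file):

```python
# hparams.json values touched by this PR (before -> after); layout assumed.
before = {
    "outer_learning_rate": 0.7,
    "checkpoint_init_version": "2.1.9",
    "checkpoint_init_window": 58181,
}
after = {
    "outer_learning_rate": 0.6,
    "checkpoint_init_version": "2.1.12",
    "checkpoint_init_window": 59057,
    "adamw": {
        "flatten_start_step": 2650,  # step where the LR plateau begins
        "flatten_duration": 1000,    # plateau length in steps
    },
}
```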
Force-pushed from f4b8c63 to e7b6d97.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hparams/hparams.json (1)
1-114: PR description incomplete; it does not capture all hyperparameter changes.
The PR objectives state only "Update bootstrap version to 2.1.12 (hparams). Bump run version.", but the actual changes include four additional hyperparameter modifications:
- outer_learning_rate reduction
- checkpoint_init_window advancement
- Two new adamw scheduler flattening parameters
This discrepancy raises concerns about whether all changes were intentional or if the PR description was incomplete. Update the PR description to fully document the rationale for each change, particularly the learning rate adjustment and scheduler tuning.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- hparams/hparams.json (3 hunks)
- src/tplr/__init__.py (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/tplr/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
🔇 Additional comments (2)
hparams/hparams.json (2)
46-47: Verify checkpoint initialization version and window alignment.
The `checkpoint_init_version` update from "2.1.9" to "2.1.12" aligns with the PR objective, but the simultaneous advancement of `checkpoint_init_window` from 58181 to 59057 is not documented. Confirm (a hypothetical key-resolution sketch follows the list):
- Whether this window advance is a necessary migration step alongside the version bump
- Whether existing checkpoints at version 2.1.12 and window 59057 exist and are compatible with the current model state
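As context for this check, bootstrap loaders of this kind usually resolve a storage key from the version and window pair. A hypothetical sketch (the key layout and function name are assumptions, not this repository's checkpoint code):

```python
def resolve_checkpoint_key(version: str, window: int) -> str:
    # Hypothetical key layout; the real loader may organize paths differently.
    return f"checkpoints/v{version}/window-{window}/model.pt"

# With this PR's values:
print(resolve_checkpoint_key("2.1.12", 59057))
# checkpoints/v2.1.12/window-59057/model.pt
```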
66-67: Verify scheduler flattening parameters are tested and appropriate.
The adamw scheduler now includes `flatten_start_step: 2650` and `flatten_duration: 1000`. These are new parameters that fundamentally alter optimizer behavior by flattening the learning rate schedule starting at step 2650 for 1000 steps. Confirm (a test sketch follows the list):
- These values have been tuned/validated for this model configuration
- The flattening behavior is appropriate for training stability and convergence
- muon scheduler remains unaffected (flatten_start_step: null, flatten_duration: 0) for a reason
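One way to address both points is a pair of unit tests: one pinning the plateau semantics, one asserting that the muon-style `null`/`0` config is a no-op. A sketch against the illustrative `lr_at_step` helper from earlier (test names and values are assumptions, not the repo's suite):

```python
import math

def test_lr_plateau_holds_then_resumes():
    kw = dict(base_lr=0.6, total_steps=10_000,
              flatten_start_step=2650, flatten_duration=1000)
    at_start = lr_at_step(2650, **kw)
    # LR stays flat for the full 1000-step plateau (steps 2650..3649)...
    assert all(math.isclose(lr_at_step(s, **kw), at_start)
               for s in range(2650, 3650))
    # ...and the first post-plateau step picks up where the plateau began.
    assert math.isclose(lr_at_step(3650, **kw), at_start)

def test_null_start_disables_flattening():
    # muon-style config: flatten_start_step=null should disable flattening.
    for step in (0, 2650, 3000, 5000):
        disabled = lr_at_step(step, 0.6, 10_000, None, 0)
        baseline = lr_at_step(step, 0.6, 10_000, 10_001, 1000)  # plateau beyond tested steps
        assert math.isclose(disabled, baseline)
```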
| "batch_size": 192, | ||
| "inner_steps": 30, | ||
| "outer_learning_rate": 0.7, | ||
| "outer_learning_rate": 0.6, |
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Undocumented learning rate reduction—verify this is intentional.
The outer learning rate has been reduced by ~14% (0.7 → 0.6) but is not mentioned in the PR description. This is a significant hyperparameter change that affects training dynamics. Confirm this change was intentional and has been validated.
🏁 Script executed:

```bash
#!/bin/bash
# Check the current state and git diff for hparams/hparams.json
echo "=== Current hparams/hparams.json (lines 8-12) ==="
sed -n '8,12p' hparams/hparams.json
echo -e "\n=== Full git diff for hparams/hparams.json ==="
git diff HEAD^ hparams/hparams.json 2>/dev/null || git diff hparams/hparams.json 2>/dev/null || echo "Git diff unavailable"
echo -e "\n=== Commit message ==="
git log -1 --pretty=format:"%B" 2>/dev/null || echo "Commit message unavailable"
echo -e "\n=== Search for learning rate mentions in documentation ==="
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "README*" \) -exec grep -l "learning.rate\|outer_learning_rate" {} \; 2>/dev/null | head -5
```

Length of output: 378
🏁 Script executed:

```bash
#!/bin/bash
# Search for any documentation about hyperparameter changes or tuning rationale
echo "=== Search for learning rate documentation in codebase ==="
rg -i "outer_learning_rate|learning.rate.*0\.6|learning.rate.*0\.7" -A 2 -B 2 --type=md --type=txt 2>/dev/null | head -30
echo -e "\n=== Check if there's a CHANGELOG or similar ==="
find . -type f \( -name "CHANGELOG*" -o -name "HISTORY*" -o -name "CHANGES*" \) -exec head -50 {} \;
```

Length of output: 390
🏁 Script executed:

```bash
#!/bin/bash
# Check git history for outer_learning_rate value changes
echo "=== Git log for hparams/hparams.json (last 10 commits) ==="
git log --oneline -10 -- hparams/hparams.json 2>/dev/null || echo "No history available"
echo -e "\n=== Check outer_learning_rate value in previous commits ==="
git show HEAD~1:hparams/hparams.json 2>/dev/null | sed -n '8,12p' || echo "Previous version unavailable"
echo -e "\n=== Check if there are any staged or unstaged changes ==="
git status hparams/hparams.json 2>/dev/null || echo "Status unavailable"
echo -e "\n=== Show diff with specific context for lines around outer_learning_rate ==="
git diff --no-index /dev/null hparams/hparams.json 2>/dev/null | grep -A 3 -B 3 "outer_learning_rate" | head -20
```

Length of output: 1181
🏁 Script executed:

```bash
#!/bin/bash
# Find the merge base and compare
echo "=== Current commit info ==="
git log -1 --oneline HEAD
echo -e "\n=== Find merge base with main/master ==="
git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null || echo "Merge base not found"
echo -e "\n=== Diff between merge base and current HEAD for hparams.json ==="
MERGE_BASE=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null)
if [ ! -z "$MERGE_BASE" ]; then
  git diff $MERGE_BASE HEAD -- hparams/hparams.json | head -100
else
  echo "Cannot determine merge base"
fi
echo -e "\n=== Check all changes in current PR ==="
git log --oneline $(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD origin/master 2>/dev/null)..HEAD 2>/dev/null | head -20
```

Length of output: 2913
Learning rate reduction is intentional but undocumented; update the commit message or PR description to explain all hyperparameter changes.
The reduction from 0.7 to 0.6 is confirmed and is part of an intentional feature (feat/plateu lr) with corresponding test coverage and scheduler updates (the new `flatten_start_step` and `flatten_duration` parameters). However, the commit message "Bump run version" documents none of these changes. Update the commit message or PR description to explicitly list all hyperparameter modifications and their rationale, including the learning rate reduction, the new scheduler flattening behavior, the checkpoint version bump, and the BMA threshold changes.
🤖 Prompt for AI Agents

```
In hparams/hparams.json around line 10, the commit message "Bump run version"
fails to document intentional hyperparameter changes; update the commit message
or the PR description to explicitly list each change and rationale: note the
outer_learning_rate reduction from 0.7 to 0.6, the addition of the scheduler
parameters flatten_start_step and flatten_duration with brief explanation of
their plateau/flattening behavior, the checkpoint version bump, and the BMA
threshold adjustments; keep the entry concise, reference affected files/lines,
and include why each change was made and any impact on tests or runtime.
```