
Conversation

@joellidin
Collaborator

  • (validator) Tighten sync score penalty curve
  • (validator) Defer penalty for negative gradient
  • (tests) Add 5th-ranked penalty skip logic tests
  • (neurons) Remove duplicate LR scaling in gradient
  • (neurons) Add inner LR flatten window feature
  • (tests) Add LR flatten support to mock fixtures
  • (tests) Add comprehensive LR flattening tests
  • (tests) Fix tests after LR scaling removal
  • (comms) Add timeouts to large file downloads
  • (validator) Add BMA threshold with warmup period
  • (hparams) Update bootstrap version to 2.1.12
  • Bump run version

Description

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

Commit Details

(validator) Tighten sync score penalty curve

Reduce the sync score tolerance from 5 steps to 3 steps behind, creating
a steeper penalty curve that encourages better miner synchronization
across the network.

- Update sync_score formula cap from 5.0 to 3.0
- Adjust sync_max_steps_behind threshold from 3 to 2 in hparams
- Update formula comment to reflect new calculation
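
For illustration, a minimal sketch of what the capped, steeper curve could look like (the function name and exact shape here are assumptions; the real formula lives in the validator):

```python
# Hypothetical sketch only -- the actual sync_score computation may differ.
def sync_score(steps_behind: float) -> float:
    """1.0 for fully synced peers, decaying to 0.0 at the (now lower) cap."""
    capped = min(steps_behind, 3.0)      # cap reduced from 5.0 to 3.0
    return max(0.0, 1.0 - capped / 3.0)  # steeper penalty per step behind
```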
(validator) Defer penalty for negative gradient

Refactor negative evaluation penalty logic to apply slashing and
exclusion AFTER all evaluations complete, rather than inline during the
evaluation loop. This ensures consistent treatment based on the full
window of evaluated UIDs.

Add should_skip_negative_penalty() to skip penalties when the 5th-ranked
UID in the current window has a negative gradient score, indicating
overall poor performance across the network.

- Add should_skip_negative_penalty() method to check 5th-ranked UID

- Refactor track_negative_evaluation() to only track history and
  consecutive counts, removing inline penalty application

- Add apply_negative_evaluation_penalties() to apply all penalties after
  evaluations complete with consistent skip logic

- Update main evaluation loop to call penalty application after all
  evaluations finish

Previously, penalties were applied as each UID was evaluated, causing
inconsistent behavior where early UIDs saw incomplete window data. Now
penalties are applied consistently when the full picture of gradient
scores is available.
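
A minimal sketch of the two-step flow described above; the method names mirror the commit message, but the signatures, the ranking details, and the slash_and_exclude helper are illustrative assumptions:

```python
def should_skip_negative_penalty(window_scores: dict[int, float]) -> bool:
    """Skip penalties when the 5th-ranked UID in the window has a negative score."""
    if len(window_scores) < 5:
        return False
    ranked = sorted(window_scores.values(), reverse=True)
    return ranked[4] < 0  # 5th-ranked negative => network-wide degradation


def slash_and_exclude(uid: int) -> None:
    """Hypothetical stand-in for the real slashing/exclusion side effects."""
    ...


def apply_negative_evaluation_penalties(window_scores, negative_uids) -> None:
    """Apply slashing/exclusion only after every UID in the window is scored."""
    if should_skip_negative_penalty(window_scores):
        return
    for uid in negative_uids:
        slash_and_exclude(uid)
```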
(tests) Add 5th-ranked penalty skip logic tests

Add comprehensive test coverage for new 5th-ranked UID penalty skipping
logic. Tests verify that slashing and exclusion penalties are correctly
skipped when the 5th-ranked UID has a negative score (indicating overall
poor network performance).

- Add 12 new test cases for should_skip_negative_penalty logic
- Test various scenarios: <5 UIDs, exactly 5, >5 UIDs
- Test edge cases: no window scores, all positive scores
- Update existing test to use two-step penalty architecture
- Verify slashing skipped when 5th-ranked is negative
- Verify exclusion skipped when 5th-ranked is negative
- All 21 tests pass
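
Illustrative pytest cases in the spirit of the new suite (test names and expected outcomes are assumptions, and should_skip_negative_penalty is assumed importable from the validator module):

```python
def test_skip_when_fifth_ranked_uid_is_negative():
    scores = {1: 0.9, 2: 0.5, 3: 0.2, 4: 0.1, 5: -0.3, 6: -0.8}
    assert should_skip_negative_penalty(scores) is True


def test_penalties_apply_with_fewer_than_five_uids():
    scores = {1: 0.9, 2: -0.5, 3: -0.2}
    assert should_skip_negative_penalty(scores) is False
```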
(neurons) Remove duplicate LR scaling in gradient

Remove learning rate scaling from error_feedback.add_() operation to
prevent applying LR twice. The learning rate is already applied in the
outer step, so scaling grad_full again here was incorrect.

- Remove unused lr variable from prepare_gradient_dict
- Change error_feedback.add_(grad_full, alpha=lr) to use default
  alpha=1.0 by removing the alpha parameter
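
In essence (simplified; the surrounding prepare_gradient_dict code is omitted):

```python
# Before: the inner LR was effectively applied twice -- here and in the outer step.
# error_feedback.add_(grad_full, alpha=lr)

# After: accumulate the raw gradient; the outer step applies the LR exactly once.
error_feedback.add_(grad_full)
```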
(neurons) Add inner LR flatten window feature

Add ability to freeze inner learning rate at its current value for a
specified number of outer steps (windows). The flatten window is
configured in hparams using flatten_start_step and flatten_duration
(both in outer steps/windows).

- Add flatten_start_step and flatten_duration to hparams config
- Add should_skip_scheduler_step() method to check flatten window
- Skip scheduler.step() during flatten window (LR stays constant)
- Track inner_scheduler_step_count to maintain position
- Apply flatten logic in all scheduler step locations:
  - Main training loop
  - Window catch-up loop
  - Validator gather loop (simulates miner inner loop)
  - Initial checkpoint catch-up replay
  - Per-window catch-up replay

The flatten window is also respected during scheduler replay, so catch-up
scenarios are handled correctly.
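
A sketch of the check and its use at each scheduler-step site; the inner_steps_per_window attribute and the exact outer-to-inner conversion are assumptions based on the commit message:

```python
class TrainerSketch:
    """Hypothetical container for the attributes named in the commit message."""

    def should_skip_scheduler_step(self) -> bool:
        """True while the current outer step falls inside the flatten window."""
        start = self.hparams.flatten_start_step
        duration = self.hparams.flatten_duration
        if start is None or not duration:  # feature disabled (None or zero duration)
            return False
        outer_step = self.inner_scheduler_step_count // self.inner_steps_per_window
        return start <= outer_step < start + duration

    def step_inner_scheduler(self) -> None:
        """Wrap every scheduler.step() site: training loop, catch-up, and replay."""
        if not self.should_skip_scheduler_step():
            self.inner_scheduler.step()       # LR follows its normal schedule
        self.inner_scheduler_step_count += 1  # position is tracked either way
```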
(tests) Add LR flatten support to mock fixtures

Update test fixtures to include the new inner_scheduler_step_count
attribute and should_skip_scheduler_step method required by the LR
flattening feature.

- Add inner_scheduler_step_count = 0 to mock_instance fixture
- Add should_skip_scheduler_step mock returning False by default
- Add inner_scheduler_step_count to test validator creation
- Ensure checkpoint save/load tests work with new state fields
(tests) Add comprehensive LR flattening tests

Add 12 test cases covering all aspects of the LR flattening feature
introduced in previous commits.

Basic functionality tests (7):
- Test disabled states (None, zero duration)
- Test window boundaries (before, during, after)
- Test optimizer compatibility (AdamW, Muon)
- Test outer-to-inner step conversion accuracy

Scheduler behavior tests (5):
- Verify scheduler.step() not called during flatten
- Verify scheduler.step() called before/after flatten
- Test partial window overlaps at flatten boundary
- Test step count persistence across windows

All tests validate the feature works correctly with the existing trainer
infrastructure.
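
For example, a boundary check in the style of these tests might look like this (the mock_trainer fixture and the inner_steps_per_window attribute are assumptions):

```python
def test_scheduler_skipped_inside_flatten_window(mock_trainer):
    mock_trainer.hparams.flatten_start_step = 2650
    mock_trainer.hparams.flatten_duration = 1000
    mock_trainer.inner_scheduler_step_count = 2700 * mock_trainer.inner_steps_per_window
    assert mock_trainer.should_skip_scheduler_step() is True


def test_scheduler_resumes_after_flatten_window(mock_trainer):
    mock_trainer.hparams.flatten_start_step = 2650
    mock_trainer.hparams.flatten_duration = 1000
    mock_trainer.inner_scheduler_step_count = 3700 * mock_trainer.inner_steps_per_window
    assert mock_trainer.should_skip_scheduler_step() is False
```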
(tests) Fix tests after LR scaling removal

Update two test cases in test_prepare_gradient_dict.py to reflect the
removal of duplicate LR scaling in commit 5a74fa3.
(comms) Add timeouts to large file downloads

Prevent indefinite hangs during large file downloads by adding
asyncio.wait_for timeouts at three critical levels:

- Overall download_large_file call uses configurable timeout parameter
  from function argument
- S3 get_object requests have 15 second timeout
- Individual stream reads have 15 second timeout

These timeouts ensure the download process will fail gracefully rather
than hanging indefinitely when network issues occur.
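
A minimal sketch of the layered timeouts, assuming an aiobotocore-style async S3 client; the real comms.py code differs in structure, and everything other than asyncio.wait_for is illustrative:

```python
import asyncio


async def download_large_file(s3_client, bucket: str, key: str, timeout: float) -> bytes:
    async def _download() -> bytes:
        # 15 s cap on the get_object request itself
        response = await asyncio.wait_for(
            s3_client.get_object(Bucket=bucket, Key=key), timeout=15
        )
        chunks = []
        while True:
            # 15 s cap on each individual stream read
            chunk = await asyncio.wait_for(
                response["Body"].read(1 << 20), timeout=15
            )
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)

    # Overall call bounded by the caller-supplied timeout argument
    return await asyncio.wait_for(_download(), timeout=timeout)
```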
(validator) Add BMA threshold with warmup period

Add a binary moving average (BMA) threshold to filter low-performing peers from
final score calculations. Peers with BMA below the threshold receive a
final score of 0.

- Extract binary_moving_average into bma variable
- Apply threshold only after warmup period completes
- Track warmup using windows_since_start calculation
- Add configurable hparams: bma_threshold (0.10) and bma_warmup_windows
  (10)

The warmup period allows new validator runs to stabilize before applying
the threshold, preventing premature peer filtering.
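
Condensed into a standalone helper for illustration (the actual change is inline in the validator's scoring code; the function and the defaults shown are assumptions drawn from the commit message):

```python
def apply_bma_threshold(
    final_score: float,
    bma: float,
    windows_since_start: int,
    bma_threshold: float = 0.10,
    bma_warmup_windows: int = 10,
) -> float:
    """Zero a peer's final score once warmup has passed and its BMA is below threshold."""
    past_warmup = windows_since_start >= bma_warmup_windows
    if past_warmup and bma < bma_threshold:
        return 0.0
    return final_score
```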
(hparams) Update bootstrap version to 2.1.12

Update checkpoint initialization to version 2.1.12 and window 59057 to
bootstrap from the latest stable checkpoint. Configure LR flattening to
start at step 2650 with a 1000-step duration for improved training
stability.

- Update checkpoint_init_version from 2.1.9 to 2.1.12
- Update checkpoint_init_window from 58181 to 59057
- Set flatten_start_step to 2650
- Set flatten_duration to 1000
@coderabbitai

coderabbitai bot commented Nov 3, 2025

Warning

Rate limit exceeded

@joellidin has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 7641300 and 06aff26.

📒 Files selected for processing (11)
  • hparams/hparams.json (5 hunks)
  • neurons/trainer.py (5 hunks)
  • neurons/validator.py (8 hunks)
  • src/tplr/__init__.py (1 hunks)
  • src/tplr/comms.py (3 hunks)
  • src/tplr/neurons.py (3 hunks)
  • tests/test_checkpoint_fallback.py (2 hunks)
  • tests/test_prepare_gradient_dict.py (3 hunks)
  • tests/test_state_loading.py (1 hunks)
  • tests/unit/test_lr_flattening.py (1 hunks)
  • tests/unit/test_slashing.py (2 hunks)

@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines   Patch %   Lines
src/tplr/comms.py          50.00%    2 Missing ⚠️
src/tplr/neurons.py        85.71%    1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (75.00%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (57.91%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
- Coverage   57.92%   57.91%   -0.01%     
==========================================
  Files          27       27              
  Lines        4886     4890       +4     
==========================================
+ Hits         2830     2832       +2     
- Misses       2056     2058       +2     
Files with missing lines   Coverage Δ
src/tplr/__init__.py       100.00% <100.00%> (ø)
src/tplr/neurons.py        77.20% <85.71%> (-0.06%) ⬇️
src/tplr/comms.py          65.08% <50.00%> (-0.06%) ⬇️

@joellidin merged commit eb311a8 into main Nov 3, 2025
6 of 8 checks passed
