v2.1.13 #646
Conversation
Reduce the sync score tolerance from 5 steps to 3 steps behind, creating a steeper penalty curve that encourages better miner synchronization across the network.

- Update sync_score formula cap from 5.0 to 3.0
- Adjust sync_max_steps_behind threshold from 3 to 2 in hparams
- Update formula comment to reflect new calculation
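A minimal sketch of what the steeper penalty curve implies, assuming the sync score is a linear penalty on steps behind that is capped at the new maximum; the function and variable names below are illustrative, not the repository's exact code.

```python
# Illustrative only: the real formula lives in the validator code; the shape
# shown here (linear penalty capped at the maximum steps behind) is an
# assumption based on the commit message.
SYNC_SCORE_CAP = 3.0  # previously 5.0


def sync_score(steps_behind: float, cap: float = SYNC_SCORE_CAP) -> float:
    """Map how far a miner lags behind the current window to a [0, 1] score.

    A fully caught-up miner scores 1.0; a miner that is `cap` or more steps
    behind scores 0.0. Lowering the cap from 5.0 to 3.0 steepens the curve.
    """
    clipped = min(max(steps_behind, 0.0), cap)
    return 1.0 - clipped / cap
```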
Refactor negative evaluation penalty logic to apply slashing and exclusion AFTER all evaluations complete, rather than inline during the evaluation loop. This ensures consistent treatment based on the full window of evaluated UIDs. Add should_skip_negative_penalty() to skip penalties when the 5th-ranked UID in the current window has a negative gradient score, indicating overall poor performance across the network.

- Add should_skip_negative_penalty() method to check the 5th-ranked UID
- Refactor track_negative_evaluation() to only track history and consecutive counts, removing inline penalty application
- Add apply_negative_evaluation_penalties() to apply all penalties after evaluations complete, with consistent skip logic
- Update the main evaluation loop to call penalty application after all evaluations finish

Previously, penalties were applied as each UID was evaluated, causing inconsistent behavior where early UIDs saw incomplete window data. Now penalties are applied consistently once the full picture of gradient scores is available.
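A hedged sketch of the 5th-ranked-UID check described above. The method name matches the commit message, but the input shape (a plain dict of window scores) and the behavior for fewer than five evaluated UIDs are assumptions for illustration.

```python
def should_skip_negative_penalty(window_scores: dict[int, float]) -> bool:
    """Skip slashing/exclusion when even the 5th-best UID scored negatively.

    If fewer than five UIDs were evaluated this window, penalties are not
    skipped (assumed conservative default).
    """
    if len(window_scores) < 5:
        return False
    ranked = sorted(window_scores.values(), reverse=True)
    fifth_best = ranked[4]
    # A negative 5th-ranked score suggests network-wide poor performance,
    # so individual negative scores should not trigger penalties.
    return fifth_best < 0
```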
Add comprehensive test coverage for the new 5th-ranked UID penalty skipping logic. Tests verify that slashing and exclusion penalties are correctly skipped when the 5th-ranked UID has a negative score (indicating overall poor network performance).

- Add 12 new test cases for the should_skip_negative_penalty logic
- Test various scenarios: fewer than 5 UIDs, exactly 5 UIDs, more than 5 UIDs
- Test edge cases: no window scores, all positive scores
- Update existing test to use the two-step penalty architecture
- Verify slashing is skipped when the 5th-ranked score is negative
- Verify exclusion is skipped when the 5th-ranked score is negative
- All 21 tests pass
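Illustrative pytest-style cases for a few of the scenarios listed above, exercising the should_skip_negative_penalty sketch shown after the previous commit; the real test suite targets the validator class, so treat these as assumptions rather than the actual tests.

```python
def test_skip_when_fifth_ranked_negative():
    # 5th-best score is negative -> penalties are skipped.
    scores = {1: 0.9, 2: 0.5, 3: 0.2, 4: 0.1, 5: -0.3, 6: -0.8}
    assert should_skip_negative_penalty(scores) is True


def test_no_skip_when_all_positive():
    # All five scores are positive -> penalties still apply.
    scores = {uid: 0.1 * uid for uid in range(1, 6)}
    assert should_skip_negative_penalty(scores) is False


def test_no_skip_with_fewer_than_five_uids():
    # Too few evaluated UIDs -> conservative default, no skip.
    assert should_skip_negative_penalty({1: -1.0, 2: -1.0}) is False
```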
Remove learning rate scaling from the error_feedback.add_() operation to prevent applying the LR twice. The learning rate is already applied in the outer step, so scaling grad_full again here was incorrect.

- Remove the unused lr variable from prepare_gradient_dict
- Change error_feedback.add_(grad_full, alpha=lr) to use the default alpha=1.0 by removing the alpha parameter
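A before/after sketch of the fix inside prepare_gradient_dict. Only the add_() call is taken from the commit message; the tensors and surrounding context are simplified stand-ins.

```python
import torch

error_feedback = torch.zeros(10)
grad_full = torch.randn(10)
lr = 1e-3

# Before: the learning rate was applied here AND in the outer optimizer step,
# effectively scaling the gradient by lr twice.
# error_feedback.add_(grad_full, alpha=lr)

# After: accumulate the raw gradient (default alpha=1.0); the outer step is
# solely responsible for applying the learning rate.
error_feedback.add_(grad_full)
```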
Add the ability to freeze the inner learning rate at its current value for a specified number of outer steps (windows). The flatten window is configured in hparams using flatten_start_step and flatten_duration (both in outer steps/windows).

- Add flatten_start_step and flatten_duration to the hparams config
- Add should_skip_scheduler_step() method to check the flatten window
- Skip scheduler.step() during the flatten window (LR stays constant)
- Track inner_scheduler_step_count to maintain position
- Apply flatten logic in all scheduler step locations:
  - Main training loop
  - Window catch-up loop
  - Validator gather loop (simulates the miner inner loop)
  - Initial checkpoint catch-up replay
  - Per-window catch-up replay

The flatten window correctly handles catch-up scenarios by respecting the window during scheduler replay.
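A hedged sketch of the flatten-window check and the step pattern applied at each scheduler step location. The method and hparam names follow the commit message, but the standalone signature and the outer-to-inner step handling are assumptions for illustration.

```python
def should_skip_scheduler_step(
    current_outer_step: int,
    flatten_start_step: int | None,
    flatten_duration: int,
) -> bool:
    """Return True while the inner LR should stay frozen."""
    if flatten_start_step is None or flatten_duration <= 0:
        return False  # feature disabled
    return flatten_start_step <= current_outer_step < flatten_start_step + flatten_duration


# Assumed usage pattern at every scheduler step location: skip scheduler.step()
# during the flatten window but keep counting steps so the schedule position
# can be maintained and replayed during catch-up.
#
# if not should_skip_scheduler_step(window, hp.flatten_start_step, hp.flatten_duration):
#     inner_scheduler.step()
# inner_scheduler_step_count += 1
```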
Update test fixtures to include the new inner_scheduler_step_count attribute and should_skip_scheduler_step method required by the LR flattening feature.

- Add inner_scheduler_step_count = 0 to mock_instance fixture
- Add should_skip_scheduler_step mock returning False by default
- Add inner_scheduler_step_count to test validator creation
- Ensure checkpoint save/load tests work with new state fields
Add 12 test cases covering all aspects of the LR flattening feature introduced in previous commits.

Basic functionality tests (7):
- Test disabled states (None, zero duration)
- Test window boundaries (before, during, after)
- Test optimizer compatibility (AdamW, Muon)
- Test outer-to-inner step conversion accuracy

Scheduler behavior tests (5):
- Verify scheduler.step() not called during flatten
- Verify scheduler.step() called before/after flatten
- Test partial window overlaps at flatten boundary
- Test step count persistence across windows

All tests validate the feature works correctly with the existing trainer infrastructure.
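An illustrative pytest-style version of the "scheduler.step() not called during flatten" check, reusing the should_skip_scheduler_step sketch from the flattening commit above; the actual tests run against the trainer infrastructure, so this is a simplified assumption.

```python
from unittest.mock import MagicMock


def test_scheduler_step_skipped_inside_flatten_window():
    scheduler = MagicMock()
    flatten_start, flatten_duration = 2650, 1000

    for outer_step in (2600, 2650, 3000, 3649, 3650):
        if not should_skip_scheduler_step(outer_step, flatten_start, flatten_duration):
            scheduler.step()

    # Only the two steps outside [2650, 3650) should advance the scheduler.
    assert scheduler.step.call_count == 2
```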
Update two test cases in test_prepare_gradient_dict.py to reflect the removal of duplicate LR scaling in commit 5a74fa3
Prevent indefinite hangs during large file downloads by adding asyncio.wait_for timeouts at three critical levels:

- The overall download_large_file call uses the configurable timeout parameter from the function argument
- S3 get_object requests have a 15 second timeout
- Individual stream reads have a 15 second timeout

These timeouts ensure the download process fails gracefully rather than hanging indefinitely when network issues occur.
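A hedged sketch of the three timeout levels, written against a generic aiobotocore-style async S3 client. The helper names and parameters (download_large_file, s3_client, bucket, key, chunk size) are illustrative, not the project's exact signatures.

```python
import asyncio

CHUNK_TIMEOUT = 15  # seconds, for each S3 request and each stream read


async def _download(s3_client, bucket: str, key: str) -> bytes:
    # Level 2: the S3 get_object request itself must respond within 15 s.
    response = await asyncio.wait_for(
        s3_client.get_object(Bucket=bucket, Key=key), timeout=CHUNK_TIMEOUT
    )
    chunks = []
    while True:
        # Level 3: each individual stream read must complete within 15 s.
        chunk = await asyncio.wait_for(
            response["Body"].read(1024 * 1024), timeout=CHUNK_TIMEOUT
        )
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


async def download_large_file(s3_client, bucket: str, key: str, timeout: float) -> bytes:
    # Level 1: the whole download is bounded by the caller-supplied timeout.
    return await asyncio.wait_for(_download(s3_client, bucket, key), timeout=timeout)
```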
Add a binary moving average (BMA) threshold to filter low-performing peers from final score calculations. Peers with a BMA below the threshold receive a final score of 0.

- Extract binary_moving_average into a bma variable
- Apply the threshold only after the warmup period completes
- Track warmup using a windows_since_start calculation
- Add configurable hparams: bma_threshold (0.10) and bma_warmup_windows (10)

The warmup period allows new validator runs to stabilize before applying the threshold, preventing premature peer filtering.
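A minimal sketch of how the threshold and warmup interact, assuming the scoring code reduces to a per-peer function; variable names follow the commit message, the surrounding structure is an assumption.

```python
BMA_THRESHOLD = 0.10       # hparams: bma_threshold
BMA_WARMUP_WINDOWS = 10    # hparams: bma_warmup_windows


def apply_bma_threshold(
    final_score: float,
    binary_moving_average: float,
    windows_since_start: int,
) -> float:
    """Zero out a peer's final score if its BMA is below threshold after warmup."""
    bma = binary_moving_average
    warmed_up = windows_since_start >= BMA_WARMUP_WINDOWS
    if warmed_up and bma < BMA_THRESHOLD:
        return 0.0  # low-performing peer filtered from final scores
    return final_score
```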
Update checkpoint initialization to version 2.1.12 and window 59057 to bootstrap from the latest stable checkpoint. Configure LR flattening to start at step 2650 with a 1000-step duration for improved training stability.

- Update checkpoint_init_version from 2.1.9 to 2.1.12
- Update checkpoint_init_window from 58181 to 59057
- Set flatten_start_step to 2650
- Set flatten_duration to 1000
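The relevant hparams values after this change, shown as a plain Python mapping for reference; the key names come from the commit messages, while the container and file layout are assumptions.

```python
# Assumed shape of the updated hparams entries (illustrative, not the actual
# config file format used by the project).
checkpoint_hparams = {
    "checkpoint_init_version": "2.1.12",  # was 2.1.9
    "checkpoint_init_window": 59057,      # was 58181
    "flatten_start_step": 2650,           # inner LR freeze begins here
    "flatten_duration": 1000,             # freeze lasts 1000 outer steps
}
```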
Codecov Report

❌ Patch coverage is 75.00%. Your patch status has failed because the patch coverage (75.00%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
- Coverage   57.92%   57.91%   -0.01%
==========================================
  Files          27       27
  Lines        4886     4890       +4
==========================================
+ Hits         2830     2832       +2
- Misses       2056     2058       +2
Description
Related Issue(s)
Type of Change
Branch Naming
Commit Messages
Code Quality
Testing
Documentation
If this is a breaking change
Screenshots/Examples
Additional Notes