
feat: support pytorch-optimizer training optimizers #1006

Open

mfazrinizar wants to merge 15 commits into roboflow:develop from mfazrinizar:feat/pytorch-optimizer-train

Conversation

@mfazrinizar (Contributor)

What does this PR do?

This PR adds configurable optimizer support to RF-DETR training while keeping the existing AdamW behavior as the default. optimizer="adamw" continues to use RF-DETR's built-in fused torch.optim.AdamW path, and non-default optimizer names are resolved through pytorch-optimizer, for example optimizer="lion" or optimizer="pytorch_optimizer:adamw".

The implementation preserves RF-DETR parameter groups and layer-wise learning rates by building optimizers from the existing get_param_dict() output. It also adds optimizer_kwargs so users can pass optimizer-specific arguments such as AdamW betas or Lion weight_decouple without overriding RF-DETR-managed values like params, lr, weight_decay, or fused.
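As a usage sketch (the RFDETRBase entry point, dataset path, and epoch count below are assumptions for illustration, not taken from this PR), selecting an external optimizer from the public training API could look like:

from rfdetr import RFDETRBase  # assumed entry point; any RF-DETR model class would work the same way

model = RFDETRBase()

# Default: optimizer="adamw" keeps RF-DETR's built-in fused torch.optim.AdamW path.
model.train(dataset_dir="path/to/dataset", epochs=10)

# Non-default names resolve through pytorch-optimizer; optimizer_kwargs passes
# optimizer-specific arguments while RF-DETR keeps control of params, lr,
# weight_decay, and fused.
model.train(
    dataset_dir="path/to/dataset",
    epochs=10,
    optimizer="lion",
    optimizer_kwargs={"weight_decouple": True},
)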

It also wires the options through the public training API, keeps PTL-only optimizer config out of the legacy namespace, adds focused tests, documents the new parameters, and includes pytorch-optimizer in the training extra.

Related Issue(s): Closes #89

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Added and updated tests cover:

  • TrainConfig defaults, accepted values, empty-name rejection, and reserved optimizer_kwargs rejection
  • default AdamW behavior remaining backward compatible
  • optimizer_kwargs forwarding to RF-DETR's default AdamW optimizer
  • custom optimizer loading through pytorch-optimizer
  • RF-DETR parameter groups and layer-wise learning rates being preserved for custom optimizers
  • custom optimizer kwargs forwarding
  • explicit pytorch_optimizer: prefix support, including opting into the external AdamW implementation
  • missing dependency and invalid optimizer-name error handling
  • real pytorch-optimizer Lion construction smoke test
  • RFDETR.train(optimizer=..., optimizer_kwargs=...) forwarding into get_train_config()
  • optimizer-only config staying out of the legacy namespace

Local validation was run with PYTHONPATH=src in the rfdetr conda environment, using that environment's Python executable.

Real optimizer smoke tests passed for:

  • direct pytorch-optimizer Lion construction and one optimizer step with a PyTorch parameter group
  • RF-DETR's _build_pytorch_optimizer() constructing Lion with RF-DETR-style parameter groups and running one optimizer step with optimizer_kwargs={"weight_decouple": True} (a standalone sketch of this follows below)
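A minimal standalone sketch of that Lion smoke test, assuming pytorch-optimizer's load_optimizer() and Lion's weight_decouple argument; the parameter shapes and learning rates are illustrative, not RF-DETR's defaults:

import torch
from pytorch_optimizer import load_optimizer

# RF-DETR-style parameter groups: separate groups with their own learning rates.
backbone_param = torch.nn.Parameter(torch.randn(4, 4))
head_param = torch.nn.Parameter(torch.randn(4, 4))
param_groups = [
    {"params": [backbone_param], "lr": 1e-5},  # e.g. backbone group
    {"params": [head_param], "lr": 1e-4},      # e.g. detection head group
]

lion_cls = load_optimizer("lion")
optimizer = lion_cls(param_groups, weight_decay=1e-4, weight_decouple=True)

# One optimizer step to confirm construction and parameter updates work.
loss = (backbone_param.sum() + head_param.sum()) ** 2
loss.backward()
optimizer.step()
optimizer.zero_grad()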

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

The implementation intentionally uses pytorch_optimizer.load_optimizer() instead of create_optimizer() so RF-DETR keeps its existing parameter grouping, backbone learning rates, and scheduler behavior. Some specialized optimizers may still require optimizer-specific kwargs; initialization errors include a hint to check the selected optimizer's supported arguments.
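As a sketch of that error-handling shape (the helper name _instantiate_optimizer appears later in this thread, but its signature here is an assumption; only the idea of wrapping TypeError with a kwargs hint is taken from the PR):

from pytorch_optimizer import load_optimizer

def _instantiate_optimizer(optimizer_cls, param_groups, **optimizer_kwargs):
    # Wrap construction so unsupported optimizer-specific kwargs surface with an
    # RF-DETR hint rather than a bare TypeError.
    try:
        return optimizer_cls(param_groups, **optimizer_kwargs)
    except TypeError as err:
        raise TypeError(
            f"Failed to construct {optimizer_cls.__name__} with "
            f"optimizer_kwargs={optimizer_kwargs!r}; check the selected "
            "optimizer's supported arguments."
        ) from err

# e.g. the class is resolved by name via load_optimizer(), and RF-DETR's own
# param groups are passed through unchanged:
# optimizer = _instantiate_optimizer(load_optimizer("lion"), param_groups, weight_decouple=True)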

@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 84.68468% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 80%. Comparing base (2f81ac0) to head (1c4c735).

❌ Your patch check has failed because the patch coverage (85%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (80%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #1006    +/-   ##
========================================
  Coverage       80%     80%            
========================================
  Files          100     100            
  Lines         8457    8564   +107     
========================================
+ Hits          6784    6875    +91     
- Misses        1673    1689    +16     

@Alarmod (Contributor)

Alarmod commented Apr 28, 2026

I tested YOLO26 with AdamW and MuSGD:
ultralytics/ultralytics#23789

from ultralytics.optim.muon import MuSGD
# Initialize optimizer with specific parameter groups
# Use 'use_muon=True' only for 2D+ tensors for the hybrid effect
optimizer = MuSGD(model.parameters(), lr=0.01, momentum=0.9)

Will it be compatible as an external optimizer?

@mfazrinizar (Contributor, Author)

@Alarmod it's compatible with external optimizers provided by pytorch-optimizer that accept normal PyTorch param groups. However, it's not automatically compatible with arbitrary imported optimizer classes like Ultralytics MuSGD yet. That could be a follow-up with a small adapter/registry and dedicated tests; it would be interesting to support, and I'm already working on it.
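A rough illustration of what such an adapter/registry could look like (entirely hypothetical and not part of this PR; EXTERNAL_OPTIMIZERS, register_external_optimizer, and build_external_optimizer are invented names for the example):

import torch

# Hypothetical registry mapping names to externally imported optimizer classes
# that accept standard PyTorch param groups (e.g. Ultralytics MuSGD).
EXTERNAL_OPTIMIZERS: dict[str, type[torch.optim.Optimizer]] = {}

def register_external_optimizer(name: str, cls: type[torch.optim.Optimizer]) -> None:
    EXTERNAL_OPTIMIZERS[name.lower()] = cls

def build_external_optimizer(name: str, param_groups, **kwargs) -> torch.optim.Optimizer:
    # Any optimizer taking (param_groups, **kwargs) slots in without extra glue.
    return EXTERNAL_OPTIMIZERS[name.lower()](param_groups, **kwargs)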

@Borda requested a review from Copilot April 28, 2026 20:42
@Borda added the enhancement (New feature or request) label Apr 28, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Adds configurable optimizer selection to RF-DETR training while preserving the existing fused torch.optim.AdamW behavior by default, and enabling external optimizers via pytorch-optimizer or importable Python optimizer classes.

Changes:

  • Extend TrainConfig with optimizer, optimizer_kwargs, and rank-based optimizer_param_group_overrides validation.
  • Update RFDETRModelModule.configure_optimizers() to build either the default fused AdamW, a pytorch-optimizer optimizer, or a python: imported optimizer while preserving RF-DETR param groups / LRs. (A rough dispatch sketch follows this list.)
  • Add tests and documentation covering new optimizer configuration and ensuring optimizer-only fields don’t leak into the legacy namespace.
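A rough sketch of that dispatch, assuming helper names mentioned elsewhere in this thread (_split_optimizer_name, _build_pytorch_optimizer) and config attributes that may not match the actual code; later commits drop the python: branch, so only the two-branch form is sketched:

import torch

def configure_optimizers(self):
    # Sketch only: real signatures, config attributes, and return shape may differ.
    provider, name = self._split_optimizer_name(self.config.optimizer)
    param_groups = self.model.get_param_dict()  # RF-DETR param groups with layer-wise LRs

    if provider == "adamw":
        # Default path: RF-DETR's built-in fused torch.optim.AdamW.
        return torch.optim.AdamW(
            param_groups,
            lr=self.config.lr,
            weight_decay=self.config.weight_decay,
            fused=self.config.fused_optimizer,
            **self.config.optimizer_kwargs,
        )

    # pytorch_optimizer:<name> (or a bare non-default name) resolves through
    # pytorch-optimizer; the fused flag only applies to the AdamW path.
    return self._build_pytorch_optimizer(name, param_groups, **self.config.optimizer_kwargs)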

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/rfdetr/training/module_model.py: Implements provider parsing, optimizer loading/instantiation, param-group overrides, and updated optimizer construction in configure_optimizers().
  • src/rfdetr/config.py: Adds OptimizerParamGroupOverride model and new TrainConfig fields + validators for optimizer configuration.
  • src/rfdetr/_namespace.py: Ensures optimizer-only config stays out of the legacy namespace mapping.
  • pyproject.toml: Adds pytorch-optimizer to the train extra.
  • tests/training/test_module_model.py: Adds unit tests for optimizer selection, kwargs forwarding, param-group preservation, overrides, and error handling.
  • tests/training/test_detr_shim.py: Verifies RFDETR.train(...) forwards optimizer config through to get_train_config().
  • tests/training/test_args.py: Verifies optimizer fields are not forwarded into the legacy namespace.
  • tests/models/test_config.py: Adds config default/validation tests for new optimizer-related fields.
  • docs/learn/train/training-parameters.md: Documents new optimizer parameters and provides usage examples.
  • docs/learn/train/customization.md: Updates lifecycle hook documentation to reflect configurable optimizer + overrides.

Three review comment threads on src/rfdetr/training/module_model.py (outdated)
Borda and others added 11 commits April 28, 2026 23:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (3 commits)
The python:/import: provider called importlib.import_module() on a
dotted path originating from TrainConfig.optimizer — an unconstrained
import-time code-execution surface reachable from LightningCLI YAML
configs. Removed _load_python_optimizer and _build_python_optimizer;
collapsed configure_optimizers to the two-branch dispatch (built-in
fused AdamW / pytorch-optimizer). The pytorch_optimizer: provider and
OptimizerParamGroupOverride are unaffected.

- Remove import importlib (now unused)
- Remove _load_python_optimizer(), _build_python_optimizer()
- Update _split_optimizer_name() — only adamw and pytorch_optimizer: valid
- Retarget rank-aware override tests to mock _load_pytorch_optimizer

[resolve roboflow#4] /review finding by foundry:sw-engineer + Codex co-review (report: .temp/output-review-develop-2026-04-28.md):
"C1: _load_python_optimizer performs unbounded importlib.import_module() on config-string path"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
The API call site uses load_optimizer() and get_supported_optimizers()
from the 3.x series. An unpinned dep risks a silent breaking change on
the next major release.

[resolve roboflow#5] /review finding by foundry:linting-expert (report: .temp/output-review-develop-2026-04-28.md):
"S1: pytorch-optimizer unpinned — fast release cadence risks install breakage"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
The fused CUDA kernel path only applies to optimizer='adamw'. Users who
set model_config.fused_optimizer=True and switch to a pytorch-optimizer
optimizer would previously get no feedback. Add a logger.info message
in the non-default branch so the configuration mismatch is visible.

[resolve roboflow#7] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md):
"H2: fused=True silently dropped for non-AdamW — add logger.info"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Add a '### Custom optimizer' section to customization.md documenting
the optimizer=/optimizer_kwargs= TrainConfig fields and linking forward
to training-parameters.md. Add a !!! warning admonition listing SAM,
Lookahead, Ranger, PCGrad, and GradientCentralization as incompatible
with PTL automatic_optimization=True and explaining why.

[resolve roboflow#9] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md):
"H4: Wrapping optimizers (SAM/Lookahead) incompatible with PTL; add docs warning"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
The pytorch-optimizer path already wraps TypeError from _instantiate_optimizer
with a hint about optimizer_kwargs. The built-in AdamW branch was unguarded,
so unknown kwargs (e.g. weight_decouple passed to torch AdamW) surfaced as a
bare TypeError with no RF-DETR context. Wrap it consistently.

[resolve roboflow#21] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md):
"M5: Built-in AdamW path lacks the _instantiate_optimizer-style TypeError context"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- test_detr_shim.py: replace python:external_optimizers.HybridOptimizer
  with pytorch_optimizer:lion (valid provider after security removal)
- test_module_model.py: rename unused result/_call_kwargs variables (lint)
- module_model.py: getattr(model, 'num_classes') → direct attr access; fix
  unicode × in comment (lint)

---
Co-authored-by: Claude Code <noreply@anthropic.com>

Labels

enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

Support arbitrary optimizer from pytorch-optimizer
