feat: support pytorch-optimizer training optimizers #1006
mfazrinizar wants to merge 15 commits into roboflow:develop from mfazrinizar:feat/pytorch-optimizer-train
Conversation
Codecov Report

❌ Patch coverage is 85%. Your patch check has failed because the patch coverage (85%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #1006   +/-   ##
========================================
  Coverage       80%      80%
========================================
  Files          100      100
  Lines         8457     8564   +107
========================================
+ Hits          6784     6875    +91
- Misses        1673     1689    +16
I tested Yolo26 with AdamW and MuSGD. Will it be compatible with external optimizers?
@Alarmod it's compatible with external optimizers provided by pytorch-optimizer that accept normal PyTorch param groups. However, it's not automatically compatible with arbitrary imported optimizer classes like Ultralytics MuSGD yet. That could be a follow-up with a small adapter/registry and dedicated tests; it would be interesting to support. Already working on it.
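As a small illustration of the param-groups point, a minimal sketch (the specific optimizer here is just an example, not something prescribed by this PR):

```python
import torch.nn as nn
from pytorch_optimizer import Lion

layer = nn.Linear(8, 2)

# Any optimizer class that accepts a standard PyTorch param-group list
# slots into the new path without RF-DETR-specific changes.
optimizer = Lion([{"params": layer.parameters(), "lr": 1e-4}])
```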
Pull request overview
Adds configurable optimizer selection to RF-DETR training while preserving the existing fused torch.optim.AdamW behavior by default, and enabling external optimizers via pytorch-optimizer or importable Python optimizer classes.
Changes:
- Extend `TrainConfig` with `optimizer`, `optimizer_kwargs`, and rank-based `optimizer_param_group_overrides` validation.
- Update `RFDETRModelModule.configure_optimizers()` to build either the default fused AdamW, a `pytorch-optimizer` optimizer, or a `python:` imported optimizer while preserving RF-DETR param groups / LRs.
- Add tests and documentation covering new optimizer configuration and ensuring optimizer-only fields don't leak into the legacy namespace.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/rfdetr/training/module_model.py` | Implements provider parsing, optimizer loading/instantiation, param-group overrides, and updated optimizer construction in `configure_optimizers()`. |
| `src/rfdetr/config.py` | Adds `OptimizerParamGroupOverride` model and new `TrainConfig` fields + validators for optimizer configuration. |
| `src/rfdetr/_namespace.py` | Ensures optimizer-only config stays out of the legacy namespace mapping. |
| `pyproject.toml` | Adds `pytorch-optimizer` to the train extra. |
| `tests/training/test_module_model.py` | Adds unit tests for optimizer selection, kwargs forwarding, param-group preservation, overrides, and error handling. |
| `tests/training/test_detr_shim.py` | Verifies `RFDETR.train(...)` forwards optimizer config through to `get_train_config()`. |
| `tests/training/test_args.py` | Verifies optimizer fields are not forwarded into the legacy namespace. |
| `tests/models/test_config.py` | Adds config default/validation tests for new optimizer-related fields. |
| `docs/learn/train/training-parameters.md` | Documents new optimizer parameters and provides usage examples. |
| `docs/learn/train/customization.md` | Updates lifecycle hook documentation to reflect configurable optimizer + overrides. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…zrinizar/rf-detr into feat/pytorch-optimizer-train
The python:/import: provider called importlib.import_module() on a dotted path originating from TrainConfig.optimizer, an unconstrained import-time code-execution surface reachable from LightningCLI YAML configs. Removed _load_python_optimizer and _build_python_optimizer; collapsed configure_optimizers to the two-branch dispatch (built-in fused AdamW / pytorch-optimizer). The pytorch_optimizer: provider and OptimizerParamGroupOverride are unaffected.

- Remove import importlib (now unused)
- Remove _load_python_optimizer(), _build_python_optimizer()
- Update _split_optimizer_name(): only adamw and pytorch_optimizer: remain valid
- Retarget rank-aware override tests to mock _load_pytorch_optimizer

[resolve roboflow#4] /review finding by foundry:sw-engineer + Codex co-review (report: .temp/output-review-develop-2026-04-28.md): "C1: _load_python_optimizer performs unbounded importlib.import_module() on config-string path"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
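A rough sketch of what the reduced provider parsing could look like after this change; the function name comes from the commit message, while the exact validation, error text, and the handling of bare names (assumed here to keep resolving through pytorch-optimizer, as the PR description indicates) are guesses:

```python
def _split_optimizer_name(optimizer: str) -> tuple[str, str]:
    """Return (provider, name); only 'adamw' and pytorch-optimizer remain valid providers."""
    if optimizer == "adamw":
        return "builtin", "adamw"
    if optimizer.startswith("pytorch_optimizer:"):
        name = optimizer.split(":", 1)[1]
        if not name:
            raise ValueError("Empty optimizer name after the 'pytorch_optimizer:' prefix")
        return "pytorch_optimizer", name
    # Bare non-default names (e.g. "lion") also resolve through pytorch-optimizer.
    return "pytorch_optimizer", optimizer
```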
The API call site uses load_optimizer() and get_supported_optimizers() from the 3.x series. An unpinned dependency risks a silent breaking change on the next major release.

[resolve roboflow#5] /review finding by foundry:linting-expert (report: .temp/output-review-develop-2026-04-28.md): "S1: pytorch-optimizer unpinned, fast release cadence risks install breakage"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
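For reference, a minimal sketch of the two 3.x-series calls the commit refers to; what get_supported_optimizers() returns in detail depends on the installed release:

```python
from pytorch_optimizer import get_supported_optimizers, load_optimizer

# Enumerate what the installed pytorch-optimizer release exposes, then
# resolve one optimizer class by name: the two call-site functions above.
print(get_supported_optimizers())
print(load_optimizer("lion"))
```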
The fused CUDA kernel path only applies to optimizer='adamw'. Users who set model_config.fused_optimizer=True and switch to a pytorch-optimizer optimizer would previously get no feedback. Add a logger.info message in the non-default branch so the configuration mismatch is visible.

[resolve roboflow#7] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md): "H2: fused=True silently dropped for non-AdamW; add logger.info"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
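A minimal sketch of the feedback branch described here; the helper name, logger setup, and message wording are assumptions, not the PR's exact code:

```python
import logging

logger = logging.getLogger(__name__)

def _warn_fused_ignored(optimizer_name: str, fused_requested: bool) -> None:
    # The fused CUDA kernel path only exists for the built-in AdamW; make the
    # dropped setting visible instead of silently ignoring it.
    if fused_requested and optimizer_name != "adamw":
        logger.info(
            "fused_optimizer=True has no effect for optimizer=%r; "
            "the fused kernel path only applies to the built-in AdamW.",
            optimizer_name,
        )
```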
Add a '### Custom optimizer' section to customization.md documenting the optimizer=/optimizer_kwargs= TrainConfig fields and linking forward to training-parameters.md. Add a !!! warning admonition listing SAM, Lookahead, Ranger, PCGrad, and GradientCentralization as incompatible with PTL automatic_optimization=True and explaining why.

[resolve roboflow#9] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md): "H4: Wrapping optimizers (SAM/Lookahead) incompatible with PTL; add docs warning"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
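To see why such wrappers clash with automatic_optimization=True: SAM, for instance, needs two forward/backward passes per update, which Lightning's single optimizer.step() call cannot provide. A sketch of the usual manual loop, assuming pytorch-optimizer's SAM exposes the common first_step()/second_step() interface and forwards extra kwargs (like lr) to the base optimizer:

```python
import torch
import torch.nn as nn
from pytorch_optimizer import SAM

model = nn.Linear(8, 2)
criterion = nn.MSELoss()
# SAM wraps a base optimizer and perturbs the weights before the second pass.
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=0.1, rho=0.05)

x, y = torch.randn(4, 8), torch.randn(4, 2)

# Two forward/backward passes per update; Lightning's automatic optimization
# (one loss.backward(), one optimizer.step()) cannot drive this pattern.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)

criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)
```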
The pytorch-optimizer path already wraps TypeError from _instantiate_optimizer with a hint about optimizer_kwargs. The built-in AdamW branch was unguarded, so unknown kwargs (e.g. weight_decouple passed to torch AdamW) surfaced as a bare TypeError with no RF-DETR context. Wrap it consistently.

[resolve roboflow#21] /review finding by foundry:sw-engineer (report: .temp/output-review-develop-2026-04-28.md): "M5: Built-in AdamW path lacks the _instantiate_optimizer-style TypeError context"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
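A sketch of the consistent wrapping this commit describes; the helper name and message wording are assumptions:

```python
import torch

def _build_default_adamw(param_groups, lr, weight_decay, fused, optimizer_kwargs):
    try:
        return torch.optim.AdamW(
            param_groups, lr=lr, weight_decay=weight_decay, fused=fused,
            **optimizer_kwargs,
        )
    except TypeError as exc:
        # Same treatment as the pytorch-optimizer branch: point at the rejected
        # kwargs instead of surfacing a bare TypeError with no RF-DETR context.
        raise TypeError(
            f"optimizer_kwargs {sorted(optimizer_kwargs)} were rejected by "
            "torch.optim.AdamW; check the selected optimizer's supported arguments."
        ) from exc
```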
- test_detr_shim.py: replace python:external_optimizers.HybridOptimizer with pytorch_optimizer:lion (valid provider after security removal)
- test_module_model.py: rename unused result/_call_kwargs variables (lint)
- module_model.py: getattr(model, 'num_classes') → direct attr access; fix unicode × in comment (lint)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
What does this PR do?
This PR adds configurable optimizer support to RF-DETR training while keeping the existing AdamW behavior as the default.
`optimizer="adamw"` continues to use RF-DETR's built-in fused `torch.optim.AdamW` path, and non-default optimizer names are resolved through `pytorch-optimizer`, for example `optimizer="lion"` or `optimizer="pytorch_optimizer:adamw"`.

The implementation preserves RF-DETR parameter groups and layer-wise learning rates by building optimizers from the existing `get_param_dict()` output. It also adds `optimizer_kwargs` so users can pass optimizer-specific arguments such as AdamW `betas` or Lion `weight_decouple` without overriding RF-DETR-managed values like `params`, `lr`, `weight_decay`, or `fused`.

It also wires the options through the public training API, keeps PTL-only optimizer config out of the legacy namespace, adds focused tests, documents the new parameters, and includes `pytorch-optimizer` in the training extra.

Related Issue(s): Closes #89
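A usage sketch of the options as wired through the public API; `optimizer` and `optimizer_kwargs` come from this PR, while the class name, constructor, and the other `train()` arguments shown are assumptions for illustration:

```python
from rfdetr import RFDETR  # import path assumed

model = RFDETR()

# Default behavior is unchanged: optimizer="adamw" keeps the fused AdamW path.
# A non-default name resolves through pytorch-optimizer; extra kwargs go to it.
model.train(
    dataset_dir="path/to/dataset",               # illustrative only
    optimizer="lion",                            # or "pytorch_optimizer:adamw"
    optimizer_kwargs={"weight_decouple": True},  # optimizer-specific extras
)
```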
Type of Change
Testing
Test details:
Added and updated tests cover:
- `TrainConfig` defaults, accepted values, empty-name rejection, and reserved `optimizer_kwargs` rejection
- `optimizer_kwargs` forwarding to RF-DETR's default AdamW optimizer
- `pytorch-optimizer` / `pytorch_optimizer:` prefix support, including opting into the external AdamW implementation
- `pytorch-optimizer` Lion construction smoke test
- `RFDETR.train(optimizer=..., optimizer_kwargs=...)` forwarding into `get_train_config()`

Local validation was run with `PYTHONPATH=src` in the `rfdetr` conda environment via the environment Python executable.

Real optimizer smoke tests passed for:

- `pytorch-optimizer` Lion construction and one optimizer step with a PyTorch parameter group
- `_build_pytorch_optimizer()` constructing Lion with RF-DETR-style parameter groups and running one optimizer step with `optimizer_kwargs={"weight_decouple": True}`

Checklist
Additional Context
The implementation intentionally uses `pytorch_optimizer.load_optimizer()` instead of `create_optimizer()` so RF-DETR keeps its existing parameter grouping, backbone learning rates, and scheduler behavior. Some specialized optimizers may still require optimizer-specific kwargs; initialization errors include a hint to check the selected optimizer's supported arguments.
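A sketch in the spirit of the smoke test described under Testing, assuming `load_optimizer("lion")` resolves pytorch-optimizer's Lion and that Lion accepts `weight_decouple`; the toy model and group values are illustrative:

```python
import torch
import torch.nn as nn
from pytorch_optimizer import load_optimizer

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# RF-DETR-style grouping: a lower LR for the "backbone" block.
param_groups = [
    {"params": model[0].parameters(), "lr": 1e-5, "weight_decay": 1e-4},
    {"params": model[2].parameters(), "lr": 1e-4, "weight_decay": 1e-4},
]

# load_optimizer() only returns the class, so the existing groups (and hence
# the layer-wise LRs) are handed over untouched; create_optimizer() would
# build its own grouping instead.
lion_cls = load_optimizer("lion")
optimizer = lion_cls(param_groups, weight_decouple=True)

# One optimizer step, mirroring the smoke test described above.
x, y = torch.randn(4, 8), torch.randn(4, 2)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```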