Bug: RF-DETR v1.6.4: 2x Slower Training, Logging Issues, and Missing GPU Memory Bar
Hi,
I'm currently training an instance segmentation model using RF-DETR v1.6.4 and I'm experiencing several issues compared to version 1.5.0.
Dataset
- ~18,000 images
- Resolution: 512x512
Issues
1. GPU Memory Bar Not Showing
The GPU memory usage bar that was visible in previous versions is no longer displayed during training.
2. Logging Issues
- A log.txt file is generated, but it only contains information for a single epoch.
- hparams.yaml is created but remains empty.
3. Significant Performance Drop (Critical)
Training with v1.6.4 is approximately 2x slower than v1.5.0 under the same conditions:
- Same dataset
- Same hardware
- Similar training configuration
This is the most concerning issue.
Additional Notes
- No major dataset or hardware changes were introduced between versions.
- The issues appeared after upgrading from v1.5.0 to v1.6.4.
Possible Hypotheses
I am not sure whether the slowdown is caused by one of the following:
- A change in how num_workers is determined internally
- Changes in the augmentation pipeline, especially when using AUG_INDUSTRIAL (see the GPU-utilization probe after this list)
- A change in how the effective batch size (here batch_size × grad_accum_steps = 9 × 8 = 72) is computed or handled internally
- Possible changes related to PyTorch Lightning (e.g., trainer behavior, logging, or performance overhead)
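To narrow down whether the slowdown is an input-pipeline stall (num_workers / augmentations) or slower GPU compute, I can poll GPU utilization while training runs. Below is a rough probe I wrote for this; the nvidia-smi query flags are standard, but the script itself is my own sketch, not part of RF-DETR, and it is no substitute for torch.profiler. Low average utilization would point at data loading rather than kernels.

```python
#!/usr/bin/env python3
"""Rough GPU-utilization probe: run alongside training in a second shell."""
import subprocess
import time

SAMPLES = 120      # number of samples to take
INTERVAL_S = 1.0   # seconds between samples


def gpu_utilization_percent() -> int:
    # nvidia-smi prints one bare integer per GPU with these flags
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0])  # first GPU only


def main():
    readings = [gpu_utilization_percent() or time.sleep(INTERVAL_S) or readings[-1]
                for _ in range(0)]  # placeholder removed below
    readings = []
    for _ in range(SAMPLES):
        readings.append(gpu_utilization_percent())
        time.sleep(INTERVAL_S)
    print(f"mean GPU util over {len(readings)} samples: "
          f"{sum(readings) / len(readings):.1f}%")


if __name__ == "__main__":
    main()
```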
Questions
- Has anyone experienced similar issues with v1.6.4?
- Are there known regressions affecting performance or logging?
- Has anything changed internally regarding num_workers, augmentations, batch size handling, or PyTorch Lightning integration?
- Could this be related to PyTorch 2.8 or CUDA 12.9 compatibility? (A quick environment check is sketched below.)
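To make the compatibility question concrete, here is the environment dump I would run under both installs and attach to any report. These are all standard torch APIs, nothing RF-DETR-specific; a difference in the TF32/matmul-precision defaults between setups could plausibly affect step time.

```python
import torch

# Version and device info, for comparing the two environments
print("torch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))

# Matmul/conv precision knobs that influence kernel selection and speed
print("cudnn.benchmark:", torch.backends.cudnn.benchmark)
print("allow_tf32 (matmul):", torch.backends.cuda.matmul.allow_tf32)
print("float32_matmul_precision:", torch.get_float32_matmul_precision())
```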
Thanks in advance for any help!
Environment
- GPU: NVIDIA RTX 5090
- OS: Linux
- Python: 3.10.19
- PyTorch: 2.8.0+cu129
- CUDA: 12.9
Minimal Reproducible Example
```python
#!/usr/bin/env python3
from rfdetr import RFDETRSegLarge
from rfdetr.datasets.aug_config import AUG_INDUSTRIAL

DATASET_DIR = "/path/to/dataset"
OUTPUT_DIR = "/path/to/output"
EPOCHS = 35
BATCH_SIZE = 9
GRAD_ACCUM_STEPS = 8
LEARNING_RATE = 1e-4
EARLY_STOPPING = True
EARLY_STOPPING_PATIENCE = 10
PROJECT_NAME = "rfdetr"


def main():
    model = RFDETRSegLarge()
    print(f"Model input resolution: {model.model.resolution}")
    model.train(
        dataset_dir=DATASET_DIR,
        aug_config=AUG_INDUSTRIAL,
        run_test=False,
        checkpoint_interval=2,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        grad_accum_steps=GRAD_ACCUM_STEPS,
        lr=LEARNING_RATE,
        output_dir=OUTPUT_DIR,
        project=PROJECT_NAME,
        early_stopping=EARLY_STOPPING,
        early_stopping_patience=EARLY_STOPPING_PATIENCE,
        progress_bar=True,
    )


if __name__ == "__main__":
    main()
```
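To quantify the 2x claim, here is a minimal timing wrapper around the same train() call, meant to be run once per installed version (e.g., in two virtualenvs against the same dataset). All arguments mirror the MRE above; PROBE_EPOCHS is an arbitrary short run of my choosing, long enough to average out first-epoch warm-up.

```python
import time

from rfdetr import RFDETRSegLarge
from rfdetr.datasets.aug_config import AUG_INDUSTRIAL

PROBE_EPOCHS = 3  # short run; first epoch includes warm-up overhead


def timed_train():
    model = RFDETRSegLarge()
    start = time.perf_counter()
    model.train(
        dataset_dir="/path/to/dataset",
        aug_config=AUG_INDUSTRIAL,
        run_test=False,
        epochs=PROBE_EPOCHS,
        batch_size=9,
        grad_accum_steps=8,
        lr=1e-4,
        output_dir="/path/to/output",
    )
    elapsed = time.perf_counter() - start
    print(f"{PROBE_EPOCHS} epochs in {elapsed:.1f}s "
          f"({elapsed / PROBE_EPOCHS:.1f}s/epoch)")


if __name__ == "__main__":
    timed_train()
```

Comparing the s/epoch figure between a v1.5.0 and a v1.6.4 install on identical hardware should reproduce (or refute) the slowdown without a full 35-epoch run.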