Search before asking
Bug
After we switched from v1.5.x (and earlier) to the PTL trainer, I noticed something abnormal: the number of training iterations shown in the progress bar during training is independent of `grad_accum_steps`. For example, if my train folder contains 8000 images, with `batch_size=2` and `grad_accum_steps=4`:
- in v1.5.x and earlier: n_iterations = 8000 / (batch_size * grad_accum_steps) = 1000
- from v1.6.x: n_iterations = 8000 / batch_size = 4000; `grad_accum_steps` no longer has any effect
I used to use this parameter to adjust the GPU VRAM workload during training. I wonder if something was missed when switching to PTL.
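The two progress-bar counts described above can be sketched as a quick arithmetic check (the numbers are taken from the example in this report; the variable names are illustrative, not from the rfdetr codebase):

```python
# Iterations per epoch shown in the progress bar, for the example above.
n_images = 8000
batch_size = 2
grad_accum_steps = 4

# v1.5.x and earlier: the progress bar counted optimizer updates,
# so gradient accumulation divided the iteration count.
iters_old = n_images // (batch_size * grad_accum_steps)  # 1000

# v1.6.x (PTL trainer): the progress bar counts forward/backward passes,
# so grad_accum_steps no longer affects the displayed total.
iters_new = n_images // batch_size  # 4000

print(iters_old, iters_new)  # 1000 4000
```

Note that gradient accumulation should only change how often the optimizer steps, not how much memory a single forward/backward pass uses, so the displayed count alone does not prove accumulation is broken; it may just be a reporting change.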
Environment
- OS: Windows 11
- GPU: RTX 5090
- CUDA: 12.8
- rfdetr: 1.6.x
- Python: 3.10
Minimal Reproducible Example
from rfdetr import RFDETRLarge

model = RFDETRLarge(resolution=cfg.resolution, device=cfg.model_device, num_classes=num_classes)
model.train(dataset_dir=cfg.dataset_dir, epochs=100, batch_size=2, grad_accum_steps=4)
Additional
No response
Are you willing to submit a PR?