
Number of training iterations is no longer impacted by 'grad_accum_steps' #1012

@LeMinhNgan

Description


Search before asking

  • I have searched the RF-DETR issues and found no similar bug report.

Bug

After we switched from v1.5.x (and earlier) to the PTL trainer, I noticed that the number of training iterations shown in the progress bar during training is independent of grad_accum_steps. For example, with a train folder containing 8000 images, batch_size=2, and grad_accum_steps=4:

  • in v1.5.x and earlier: n_iterations = 8000 / (batch_size * grad_accum_steps) = 1000
  • from v1.6.x: n_iterations = 8000 / batch_size = 4000; grad_accum_steps no longer has any effect

I used this parameter to manage the GPU VRAM load during training. I wonder whether something was missed in the switch to PTL.
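For clarity, here is a minimal sketch of the two counting behaviours described above (the helper name and the `count_optimizer_steps` flag are illustrative only, not part of RF-DETR or PTL):

```python
def iterations_per_epoch(num_images, batch_size, grad_accum_steps=1,
                         count_optimizer_steps=True):
    """Return the number of progress-bar iterations for one epoch.

    With gradient accumulation, each optimizer step consumes
    batch_size * grad_accum_steps images, so counting optimizer
    steps yields fewer iterations than counting forward passes.
    """
    if count_optimizer_steps:
        # v1.5.x behaviour: progress bar counts optimizer steps
        return num_images // (batch_size * grad_accum_steps)
    # v1.6.x behaviour: progress bar counts batches (forward passes)
    return num_images // batch_size

# 8000 images, batch_size=2, grad_accum_steps=4
print(iterations_per_epoch(8000, 2, 4, count_optimizer_steps=True))   # 1000
print(iterations_per_epoch(8000, 2, 4, count_optimizer_steps=False))  # 4000
```

Both counts process the same 8000 images per epoch; only the unit the progress bar counts differs.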

Environment

  • OS: Windows 11
  • GPU: RTX 5090
  • CUDA 12.8
  • rfdetr: 1.6.x
  • Python 3.10

Minimal Reproducible Example

model = RFDETRLarge(resolution=cfg.resolution, device=cfg.model_device, num_classes=num_classes)
model.train(dataset_dir=cfg.dataset_dir, epochs=100, batch_size=2, grad_accum_steps=4)

Additional

No response

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!

Metadata

Labels: question (Further information is requested)
