
Question about training time on LIBERO-10 (e.g., 8 GPUs, grad accumulation = 40) #78

@Jade0716

Description


Hi, thanks for your great work!

I have a question regarding the training efficiency of this project on LIBERO-10.
Could you share an approximate training time for the following setup?

Number of GPUs: 8
Gradient accumulation steps: 40
Total training steps: 4k

Additionally, I only have access to 2×96GB H20 GPUs. I’d like to understand:

Approximately how many training steps are needed to reach a reasonable/acceptable loss
What final loss values you observed for both action loss and video (latent) loss at convergence

For reference, I will attach my training curve: I trained for 800 steps with gradient accumulation steps = 32 on 2 GPUs (the model had already been pretrained for 400 steps before that), which took 64 hours.

[Image: training curve]

Thanks in advance!
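For scale, the numbers above can be extrapolated with a rough sketch. This assumes time per optimizer step stays constant and that per-GPU batch size is the same in both setups, which real throughput may not satisfy (data loading, sequence length, and interconnect all matter):

```python
# Rough wall-clock extrapolation from the numbers in the question.
# Assumptions (not from the repo): constant time per optimizer step,
# equal per-GPU batch size across setups.

observed_steps = 800      # optimizer steps already run
observed_hours = 64.0     # wall-clock time for those steps
target_steps = 4000       # total steps in the reference setup

hours_per_step = observed_hours / observed_steps              # 0.08 h = 4.8 min
remaining_hours = (target_steps - observed_steps) * hours_per_step

# Effective batch size ratio: reference (8 GPUs x accum 40) vs mine (2 x 32)
batch_ratio = (8 * 40) / (2 * 32)                             # 5.0

print(f"{hours_per_step * 60:.1f} min per optimizer step")
print(f"~{remaining_hours:.0f} h more to reach {target_steps} steps")
print(f"effective batch is {batch_ratio:.0f}x smaller than the reference setup")
```

So at the observed throughput, reaching 4k steps would take on the order of 250+ more hours, and with a 5x smaller effective batch the loss at a given step count may not be directly comparable.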
