Hi, thanks for your great work!
I have a question regarding the training efficiency of this project on LIBERO-10.
Could you share an approximate training time for the following setup?
Number of GPUs: 8
Gradient accumulation steps: 40
Total training steps: 4k
Additionally, I only have access to 2×96GB H20 GPUs. I’d like to understand:
- Approximately how many training steps are needed to reach a reasonable/acceptable loss?
- What final loss values did you observe for the action loss and the video (latent) loss at convergence?
For reference, I will attach my training curve: I trained for 800 steps with gradient accumulation steps = 32 on 2 GPUs (the model had already been pretrained for 400 steps before that), which took about 64 hours.
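To make the two setups comparable, here is the back-of-the-envelope arithmetic I am using. It assumes the per-GPU micro-batch size is the same in both runs (I don't know the actual value used for the 8-GPU run, so this is only a rough sketch, not your actual configuration):

```python
# Rough effective-batch comparison between the reference setup (8 GPUs,
# grad accum 40) and mine (2 GPUs, grad accum 32).
# Assumption: the per-GPU micro-batch size is identical in both runs.

def effective_batch(num_gpus: int, grad_accum: int, per_gpu_batch: int = 1) -> int:
    """Samples consumed per optimizer step (up to the per-GPU batch factor)."""
    return num_gpus * grad_accum * per_gpu_batch

ref = effective_batch(num_gpus=8, grad_accum=40)   # 320 x per-GPU batch
mine = effective_batch(num_gpus=2, grad_accum=32)  # 64 x per-GPU batch

# To see as many samples as the reference 4k-step run, my setup would
# need roughly this many optimizer steps:
equivalent_steps = 4000 * ref // mine
print(ref, mine, equivalent_steps)  # 320 64 20000
```

If this reasoning is right, my 800 steps so far correspond to only a small fraction of the samples seen by the 4k-step reference run, which is why I am asking about the expected step count and loss values.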
Thanks in advance!