Hi, thanks for your great work!
I have a question regarding the training efficiency of this project on LIBERO-10.
Could you share an approximate training time for the following setup?
Number of GPUs: 8
Gradient accumulation steps: 40
Total training steps: 4k
Additionally, I only have access to 2×96GB H20 GPUs. I’d like to understand:
- Approximately how many training steps are needed to reach a reasonable/acceptable loss?
- What final loss values did you observe for the action loss and the video (latent) loss at convergence?
For reference, I will attach my training curve: I trained for 800 steps with gradient accumulation steps = 32 on 2 GPUs (the model had already been pretrained for 400 steps before that), which took about 64 hours.
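To make the two setups comparable, here is the back-of-the-envelope arithmetic I am using. It assumes the per-GPU micro-batch size is the same in both runs (I don't know the actual value used for the 8-GPU run, so this is only a rough sketch, not your actual configuration):

```python
# Rough effective-batch comparison between the reference setup (8 GPUs,
# grad accum 40) and mine (2 GPUs, grad accum 32).
# Assumption: the per-GPU micro-batch size is identical in both runs.

def effective_batch(num_gpus: int, grad_accum: int, per_gpu_batch: int = 1) -> int:
    """Samples consumed per optimizer step (up to the per-GPU batch factor)."""
    return num_gpus * grad_accum * per_gpu_batch

ref = effective_batch(num_gpus=8, grad_accum=40)   # 320 x per-GPU batch
mine = effective_batch(num_gpus=2, grad_accum=32)  # 64 x per-GPU batch

# To see as many samples as the reference 4k-step run, my setup would
# need roughly this many optimizer steps:
equivalent_steps = 4000 * ref // mine
print(ref, mine, equivalent_steps)  # 320 64 20000
```

If this reasoning is right, my 800 steps so far correspond to only a small fraction of the samples seen by the 4k-step reference run, which is why I am asking about the expected step count and loss values.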
Thanks in advance!