[Question] When I use multi GPU distributed training, the process freezes #4282

@m4k1se

Question

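I am trying to use the `accelerate` library to add multi-GPU acceleration in an Isaac Sim 5.1.0 environment, but the process frequently freezes during training. All GPU-related checks look normal. I would appreciate help from anyone who has run into a similar problem.

[Image]

After this information is printed to the terminal, the process freezes completely and `nvidia-smi` also stops responding.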
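
One way to narrow down a freeze like this is to have each training process periodically dump its Python stack traces to a file, so that the last dump shows where each rank was when it stalled. The sketch below uses only the standard-library `faulthandler` module. The helper name `enable_hang_dump`, the log file naming, and the assumption that the launcher exports a `RANK` environment variable are illustrative and not taken from the original report.

```python
import faulthandler
import os


def enable_hang_dump(timeout_s: float = 300.0, log_dir: str = "."):
    """Dump the stack of every Python thread to a per-rank file every timeout_s seconds.

    Hypothetical helper: call it once near the top of the training script.
    """
    # Assumption: the distributed launcher sets RANK; fall back to "0" otherwise.
    rank = os.environ.get("RANK", "0")
    path = os.path.join(log_dir, f"hang_rank{rank}.log")
    # The file object must stay open for as long as the watchdog is armed.
    log_file = open(path, "w")
    # The dump is written by a watchdog thread, so it still fires when the
    # main thread is blocked inside a native call.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file, path
```

Calling `faulthandler.cancel_dump_traceback_later()` disarms the watchdog once training finishes. If every rank's last dump ends in the same collective call, that points to a communication stall rather than a problem in the training loop itself.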

Build Info

Describe the versions that you are currently using:

  • Isaac Lab Version: [e.g. 2.3.0]
  • Isaac Sim Version: [e.g. 5.1, this can be obtained by cat ${ISAACSIM_PATH}/VERSION]
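
The Isaac Sim version mentioned above can also be read from inside the training script, which makes it easy to log alongside a run. This is a minimal sketch assuming the `VERSION` file sits directly under `${ISAACSIM_PATH}` as the template says; the function name `read_isaacsim_version` is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_isaacsim_version(root: Optional[str] = None) -> Optional[str]:
    """Return the contents of <root>/VERSION, or None if it cannot be found."""
    # Fall back to the ISAACSIM_PATH environment variable named in the template.
    root = root or os.environ.get("ISAACSIM_PATH")
    if not root:
        return None
    version_file = Path(root) / "VERSION"
    if not version_file.is_file():
        return None
    return version_file.read_text().strip()
```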
