
Container Slow when used with enroot #2549

@OliEfr

Hi all,

I'd like to run an Isaac Lab container on a cluster that uses enroot. I build the container with `./docker/container.py start`, which gives me the `isaac-lab-base` Docker image.

When I run `docker run --entrypoint tail --gpus all isaac-lab-base -f /dev/null` on my local machine, I get the usual training speed. (Note: overriding the entrypoint is important. Otherwise, I think Isaac Sim starts inside the container and my training runs are about 3x slower.)

However, when I start the same container with enroot on the cluster, training is about 3x slower, even when I override the container's entrypoint.
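One way to compare the two launches is to dump the GPU-related environment variables inside each container and diff them. This is a minimal sketch, assuming a difference in how `docker run --gpus all` and enroot configure the container could show up there; the variable list and the `snapshot`/`diff` helpers are hypothetical and not part of Isaac Lab or enroot.

```python
import json
import os

# Hypothetical selection of variables worth comparing between the docker and
# enroot launches (an assumption, not taken from Isaac Lab documentation).
GPU_ENV_VARS = (
    "NVIDIA_VISIBLE_DEVICES",
    "NVIDIA_DRIVER_CAPABILITIES",
    "CUDA_VISIBLE_DEVICES",
    "LD_LIBRARY_PATH",
)


def snapshot(names=GPU_ENV_VARS, environ=None):
    """Return {name: value or None} for each requested environment variable."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in names}


def diff(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


if __name__ == "__main__":
    # Run once inside each container, save the JSON output,
    # then load both files and pass them to diff().
    print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

Saving one JSON file per environment keeps the comparison independent of which runtime is being inspected.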

The speed on the cluster should be the same as on my local machine (using an RTX 4090 and an H100). I verified that I have GPU access both locally and on the cluster, and training a plain PyTorch neural network gives the same speed in both environments.
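To put a number on "about 3x slower", the same step can be timed identically in both environments. This is a minimal stdlib-only sketch, assuming the training step can be wrapped in a zero-argument callable; `steps_per_second` and `dummy_step` are hypothetical helpers, not Isaac Lab APIs.

```python
import time
from typing import Callable


def steps_per_second(step: Callable[[], None], n_steps: int = 100, warmup: int = 10) -> float:
    """Call `step` `warmup` times untimed, then time `n_steps` calls and return steps/s.

    For CUDA workloads the step should end with a synchronization
    (e.g. torch.cuda.synchronize()) so asynchronous kernels are counted.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    for _ in range(warmup):
        step()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed


def dummy_step() -> None:
    # Hypothetical stand-in for one training iteration.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{steps_per_second(dummy_step, n_steps=20):.1f} steps/s")
```

Running the identical script under `docker run` and under enroot yields two directly comparable rates.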

I have spent about 1.5 days trying every option I could find in Slurm, Docker, and enroot.

Has anyone run into something similar, or does anyone have an idea what the issue could be?
