Container slow when used with enroot #2549

Hi all,

I'd like to run an IsaacLab container on a cluster that uses [enroot](https://github.com/NVIDIA/enroot/tree/master). I build the container with `./docker/container.py start`, which gives me a Docker container.

When I run `docker run --entrypoint tail --gpus all isaac-lab-base -f /dev/null` on my local machine, I get the usual training speed. (Note: overriding the entrypoint is important; otherwise, I think IsaacSim starts in the container and training is 3x slower.)

However, when I start the same container with enroot on the cluster, training is ~3x slower, even when I override the container's entrypoint.

The speed on the cluster should really be the same as on my local machine (I'm using an RTX 4090 and an H100). I made sure that I have GPU access both locally and on the cluster. When I train a plain PyTorch network, I get the same training speed on both.
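To compare the two environments beyond GPU visibility, a small script like the following could be run inside both the local Docker container and the enroot container, and the two outputs diffed. This is only a minimal sketch using the Python standard library. It is not part of IsaacLab or enroot; the function name `collect_env_report` and the list of environment variables are assumptions chosen for illustration.

```python
import json
import os
import platform

# Environment variables that may differ between a local Docker run and a
# Slurm/enroot job. This list is an assumption and is not exhaustive.
ENV_KEYS = (
    "CUDA_VISIBLE_DEVICES",
    "NVIDIA_VISIBLE_DEVICES",
    "NVIDIA_DRIVER_CAPABILITIES",
    "SLURM_CPUS_PER_TASK",
    "OMP_NUM_THREADS",
)


def collect_env_report():
    """Return a dict describing the CPU and environment this process sees."""
    report = {
        "hostname": platform.node(),
        "cpu_count_total": os.cpu_count(),
        "env": {key: os.environ.get(key) for key in ENV_KEYS},
    }
    # sched_getaffinity is only available on some platforms (e.g. Linux).
    # It reports the cores this process is allowed to use, which can be
    # fewer than cpu_count() when a scheduler restricts the job.
    if hasattr(os, "sched_getaffinity"):
        report["cpu_count_usable"] = len(os.sched_getaffinity(0))
    else:
        report["cpu_count_usable"] = None
    return report


if __name__ == "__main__":
    print(json.dumps(collect_env_report(), indent=2, sort_keys=True))
```

Saving the output from each container to a file and diffing the two files shows whether the containers see different core counts or NVIDIA-related variables.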

I've spent about 1.5 days on this and tried all the options that Slurm, Docker, and enroot provide.

Has anyone had a similar experience, or does anyone have an idea what the issue could be?
