This repository was archived by the owner on Feb 14, 2024. It is now read-only.
I have an Ubuntu 22.04 VM that runs the rntop Docker container.
I have a host that simulates the GPU machine I want to monitor.
I can ping the host from my VM and can also SSH from the VM to the host.
I am able to run docker run -it --rm -v $HOME/.ssh:/root/.ssh --entrypoint bash runai/rntop -c "ssh user@machine nvidia-smi". According to the README, this means that the container can connect to the machine and that it is the rntop application itself that cannot.
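For anyone reproducing this, a stricter variant of the same check might look like the sketch below. It only prints the command rather than running it. BatchMode and ConnectTimeout are standard OpenSSH options that I am adding as an assumption (they are not from the rntop README), and build_check_cmd is a hypothetical helper name. With BatchMode=yes, ssh fails instead of prompting for a password, which should show whether key-based login works without any interaction.

```shell
# Hypothetical helper (not part of rntop): prints the connectivity-check
# command from above with two extra OpenSSH options added.
#   BatchMode=yes     -> fail instead of prompting for a password/passphrase
#   ConnectTimeout=5  -> give up after 5 seconds if the host is unreachable
build_check_cmd() {
  target="$1"   # e.g. user@machine
  printf 'docker run -it --rm -v %s/.ssh:/root/.ssh --entrypoint bash runai/rntop -c "ssh -o BatchMode=yes -o ConnectTimeout=5 %s nvidia-smi"\n' "$HOME" "$target"
}

# Print the command so it can be reviewed before running it by hand.
build_check_cmd user@machine
```

If the printed command succeeds when run, SSH from inside the container works non-interactively, which narrows the problem down to rntop itself.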
When I proceed by adding --ssh to the rntop command, i.e. sudo docker run -it --rm -v $HOME/.ssh:/root/.ssh --entrypoint bash runai/rntop:latest user@machine, it fails. I get the error "GPUs wmove() failed. Terminate called after throwing an instance of 'std::expression'". The printed output also contains no cluster or node information.
I am not sure how to troubleshoot this further. Any advice? Thanks.