Description
Describe the bug
Opening the Ray dashboard in the NeMo-RL container results in 404 errors, and the dashboard never loads. uv installs Ray into the /opt/nemo_rl_venv directory, but the main.js and main.css files used by the Ray dashboard end up as symlinks pointing into /root, because the uv stages in the Dockerfile use symlinks for everything they install. The HTTP server used by the Ray dashboard does not follow symlinks that point outside the parent directory, which causes the 404 errors. Installing Ray with the copy link-mode during the uv pip install fixes the error.
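To confirm the diagnosis above, one option is to list the symlinks in the virtual environment whose targets resolve outside of it. The following is a minimal sketch, not part of the NeMo-RL tooling: the function name `find_escaping_symlinks` is hypothetical, and it assumes GNU `find` and `readlink -f`, as found in typical Linux containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every symlink under a directory whose resolved
# target lives outside that directory. Assumes GNU find and readlink.
find_escaping_symlinks() {
  local root
  root="$(readlink -f "$1")" || return 1
  find "$root" -type l -print0 |
    while IFS= read -r -d '' link; do
      local target
      target="$(readlink -f "$link")" || continue
      case "$target" in
        "$root"/*) ;;                              # target stays inside the tree
        *) printf '%s -> %s\n' "$link" "$target" ;; # target escapes the tree
      esac
    done
}

# Possible usage inside the container (path taken from this report):
#   find_escaping_symlinks /opt/nemo_rl_venv | grep -E 'main.*\.(js|css)'
```

If the dashboard's main.js and main.css show up in the output with targets under /root, that matches the behavior described here.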
Steps/Code to reproduce bug
- Build the `nemo-rl` release container locally. Tested with commit `5514d1e6eb0036f4ccbc5dac42b1b860959d0087`.
- Launch the container locally and expose port `8265` for the Ray dashboard.
- Run `ray start --head --port=6379 --dashboard-host=0.0.0.0 --block --include-dashboard=true` inside the container to start the Ray cluster.
- Open localhost:8265 in your browser to view the Ray dashboard. It will never load, and there will be 404 errors in the browser debug tools.
This can be worked around by reinstalling the ray package with uv using the copy link-mode. To do so:
- Stop the Ray process in the container if it is still running.
- Re-install ray with `uv pip install --link-mode=copy --force-reinstall "ray[default]==2.46.0"`.
- Re-launch the Ray cluster with `ray start --head --port=6379 --dashboard-host=0.0.0.0 --block --include-dashboard=true`.
- Open localhost:8265 in your browser. The dashboard should now load properly.
Expected behavior
Ray should be installed without symlinks so that the dashboard can be rendered properly. As-is, the dashboard does not load because the symlinks point into a different parent directory.
The Dockerfile should be updated either to force-reinstall Ray without symlinks or to install all dependencies without symlinks, though the latter is less desirable.
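The first option could be sketched as a small build step like the one below. This is only an illustration under stated assumptions: the function name `reinstall_ray_without_symlinks` and the `RAY_SPEC` override are hypothetical, the default version is the one named in this report, and where such a step would sit in the actual Dockerfile is not decided here.

```shell
#!/usr/bin/env bash
# Hypothetical build step: reinstall only Ray with copied files.
# RAY_SPEC is an assumed override; the default is the version from this report.
reinstall_ray_without_symlinks() {
  local ray_spec="${RAY_SPEC:-ray[default]==2.46.0}"
  # --link-mode=copy makes uv copy files into the venv instead of linking
  # them from its cache; --force-reinstall replaces the existing install.
  uv pip install --link-mode=copy --force-reinstall "$ray_spec"
}
```

For the second, less desirable option, uv also reads the `UV_LINK_MODE` environment variable, so setting `UV_LINK_MODE=copy` for the uv stages should apply the copy link-mode to every dependency.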