Use iris bundle for Ray working_dir to include pb2 files #4182
Conversation
Extract create_workspace_zip() from BundleCreator and reuse it in the fray v1/v2 backends and ray_run.py, so that gitignored protobuf artifacts (pb2 files) are included in Ray job bundles.
Claude finished @rjpower's task in 58s — PR Review: Use iris bundle for Ray working_dir to include pb2 files

Findings:

Overall this is a straightforward, well-structured refactor. The temp dir leak is the only item I'd want addressed before merge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e888345d9
```python
# Use a persistent temp directory (not a context manager) so the caller
# can use the zip path after this function returns.
td = tempfile.mkdtemp(prefix="workspace_zip_")
```
Delete temporary workspace zip directories after bundling
`create_workspace_zip` now uses `tempfile.mkdtemp(...)` to persist `workspace.zip`, but there is no corresponding cleanup path, and `BundleCreator.create_bundle()` immediately reads the bytes without removing the temp directory. Because this helper is now called for every Ray/Iris submission, long-lived clients will leak one `workspace_zip_*` directory per job and can eventually exhaust /tmp, causing later submissions to fail.
@claude fix the tempdir leak and the test exclusion bug. Why do we need a new tempdir for every execution? Can we not just make a zipfile bundle on the fly?
Claude finished @rjpower's task in 3m 34s — Fixing tempdir leak and test exclusion bug

Changes made:
- Register atexit cleanup for temp directories so they don't leak in long-lived processes.
- Make caller-supplied exclude_dirs match only top-level directories, preserving the old Ray excludes behavior. Default EXCLUDE_DIRS (__pycache__, .git, etc.) still match at any depth.
- Add test verifying nested tests/ directories are not excluded.

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
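The two matching modes described in the changes above can be sketched roughly as follows; the names (`is_excluded`, `DEFAULT_EXCLUDE_DIRS`) are illustrative, not the actual iris API:

```python
from pathlib import PurePosixPath

# Directories that should never be bundled, at any depth.
DEFAULT_EXCLUDE_DIRS = {"__pycache__", ".git", ".venv"}

def is_excluded(rel_path: str, exclude_dirs: set[str]) -> bool:
    """Illustrative sketch of the exclusion fix described above."""
    parts = PurePosixPath(rel_path).parts
    # Built-in defaults like __pycache__ match at any depth...
    if any(p in DEFAULT_EXCLUDE_DIRS for p in parts):
        return True
    # ...but caller-supplied excludes only match the top-level directory,
    # so a nested tests/ package inside a source tree is still bundled.
    return bool(parts) and parts[0] in exclude_dirs
```

With this rule, excluding `tests` drops a top-level `tests/` directory but keeps `src/fray/tests/`, matching the behavior of Ray's old excludes.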
> working_dir (str): Specifies the working directory for the Ray workers. This must either be (1) a local existing directory with total size at most 500 MiB, (2) a local existing zipped file with total unzipped size at most 500 MiB (note: excludes has no effect), or (3) a URI to a remotely-stored zip file containing the working directory for your job (no file size limit is enforced by Ray). See Remote URIs for details. The specified directory will be downloaded to each node on the cluster, and Ray workers will be started in their node's copy of this directory. Examples: "." (cwd), "/src/my_project", "/src/my_project.zip", "s3://path/to/my_dir.zip"

do it right
Claude finished @rjpower's task in 2m 35s — Fix temp directory leak — use zip file path directly as
Use tempfile.mkstemp(suffix='.zip') instead of mkdtemp + nested zip file. Ray accepts zip file paths directly as working_dir, so no intermediate directory is needed. The single temp file is cleaned up via atexit.

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
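The mkstemp-based approach might look roughly like this; a minimal sketch, assuming the function name and signature, not the actual iris implementation:

```python
import atexit
import os
import tempfile
import zipfile
from pathlib import Path

def create_workspace_zip_file(root: str) -> str:
    """Write the workspace to a single temp .zip and return its path.

    Ray accepts a zip file path directly as working_dir, so no
    intermediate directory is needed; atexit removes the file so
    long-lived clients don't accumulate temp files across submissions.
    """
    fd, zip_path = tempfile.mkstemp(suffix=".zip", prefix="workspace_")
    os.close(fd)  # ZipFile reopens the path itself
    atexit.register(lambda: os.path.exists(zip_path) and os.remove(zip_path))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(Path(root).rglob("*")):
            if p.is_file():
                zf.write(p, p.relative_to(root))
    return zip_path
```

The returned path can then be passed straight through as the job's working_dir.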
@claude fix bugs: Fray - Tests / fray-tpu-tests (pull_request) failed with:

```
ray::run_on_pod_ray() (pid=7980, ip=169.254.123.15)
RuntimeError: The remote function failed to import on the worker. This may be
because needed library dependencies are not installed in the worker environment
or cannot be found from sys.path ['/workspace/.venv/bin',
'/tmp/ray/session_2026-03-26_23-28-55_320387_241/runtime_resources/working_dir_files/_ray_pkg_a61c7d9d4effc741/.venv/lib/python3.11/site-packages/ray/thirdparty_files',
'/workspace/.venv/lib/python3.11/site-packages/ray/_private/workers',
'/tmp/ray/session_2026-03-26_23-28-55_320387_241/runtime_resources/working_dir_files/_ray_pkg_a61c7d9d4effc741',
'/workspace/lib/fray/tests', '/usr/lib/python311.zip', '/usr/lib/python3.11',
'/usr/lib/python3.11/lib-dynload',
'/tmp/ray/session_2026-03-26_23-28-55_320387_241/runtime_resources/working_dir_files/_ray_pkg_a61c7d9d4effc741/.venv/lib/python3.11/site-packages',
'/tmp/ray/session_2026-03-26_23-28-55_320387_241/runtime_resources/working_dir_files/_ray_pkg_a61c7d9d4effc741/src']:

  File ".../runtime_resources/working_dir_files/_ray_pkg_a61c7d9d4effc741/src/fray/v1/cluster/ray/cluster.py", line 20, in <module>
    from iris.cluster.client.bundle import create_workspace_zip
  File ".../_ray_pkg_a61c7d9d4effc741/.venv/lib/python3.11/site-packages/iris/__init__.py", line 1, in <module>
    raise RuntimeError("""
RuntimeError:

Ambiguous 'iris' package.
Please use either:

  illumon-iris - Python integration for Illumon Iris, a time-series database
      pip install illumon-iris / https://deephaven.io/

  SciTools Iris - Python library for analysing and visualising meteorological
  and oceanographic data sets
      http://scitools.org.uk/iris/

  info.gianlucacosta.iris - A general-purpose library for Python
      pip install info.gianlucacosta.iris

  irisapi - Iris is a highly configurable and flexible service for paging
  and messaging
      pip install irisapi / https://iris.claims/
```
Claude finished @rjpower's task in 4m 54s — Fixing
…on workers

The worker's .venv has a PyPI iris package that shadows our local iris source. Since create_workspace_zip and configure_logging are only needed at submission time (not on workers), move the imports inside the functions that use them.

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
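The deferred-import pattern behind that fix looks like this. The real code defers `from iris.cluster.client.bundle import create_workspace_zip`; here the stdlib `json` module stands in for that import so the sketch is self-contained:

```python
# Sketch of the fix: keep submission-only imports out of module scope so
# importing this module on a Ray worker never touches the shadowed package.

def submission_time_only(payload: dict) -> str:
    # Deferred import: evaluated only when the function runs on the
    # client at submission time, not when a Ray worker merely imports
    # this module. A worker whose .venv contains a shadowing PyPI 'iris'
    # package therefore never executes the problematic import.
    import json
    return json.dumps(payload)
```

The trade-off is that the import error, if any, surfaces at call time rather than import time, which is exactly what is wanted here: workers import the module but never call the submission path.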
Removed several tests related to create_workspace_zip functionality.
…mp directory

JobSubmissionClient requires a directory path for working_dir, not a zip. Split bundle.py into collect_workspace_files (file list), create_workspace_zip (bytes for Iris), and create_workspace_dir (temp dir for Ray). Collapse the three separate exclude params (dirs/extensions/subpaths) into a single regex.
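The three-way split described above might be sketched like this; function names follow the commit message, but the signatures and bodies are illustrative guesses, not the actual iris API:

```python
import io
import re
import tempfile
import zipfile
from pathlib import Path

def collect_workspace_files(root: str, exclude_re: str) -> list:
    """Return workspace files whose relative path doesn't match exclude_re."""
    pattern = re.compile(exclude_re)
    root_path = Path(root)
    return [
        p for p in sorted(root_path.rglob("*"))
        if p.is_file() and not pattern.search(str(p.relative_to(root_path)))
    ]

def create_workspace_zip(root: str, exclude_re: str = r"__pycache__|\.git/") -> bytes:
    """Zip bytes of the collected files, suitable for an Iris bundle upload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in collect_workspace_files(root, exclude_re):
            zf.write(p, p.relative_to(root))
    return buf.getvalue()

def create_workspace_dir(root: str, exclude_re: str = r"__pycache__|\.git/") -> str:
    """Materialize the same file set into a temp dir for JobSubmissionClient."""
    out = tempfile.mkdtemp(prefix="workspace_dir_")
    for p in collect_workspace_files(root, exclude_re):
        dest = Path(out) / p.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(p.read_bytes())
    return out
```

Sharing one collect_workspace_files keeps the Iris and Ray bundles byte-for-byte consistent, and a single regex replaces the three separate exclude parameters.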
Extract create_workspace_zip() from BundleCreator as a reusable function that produces a zip including gitignored protobuf artifacts (pb2 files). Wire it into fray v1/v2 ray backends and ray_run.py as the working_dir source, replacing Ray's native directory bundling, which skips gitignored files. This fixes jobs failing because generated pb2 files were missing from the uploaded workspace.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>