-
Notifications
You must be signed in to change notification settings - Fork 7.7k
[core] Avoid fork preexec for Ray Client specific servers #63408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,8 @@ | ||
| import argparse | ||
| import logging | ||
| import os | ||
| import subprocess | ||
| import sys | ||
|
|
||
| from ray._private.ray_constants import LOGGER_FORMAT, LOGGER_LEVEL | ||
| from ray._private.ray_logging import setup_logger | ||
|
|
@@ -8,6 +11,30 @@ | |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| _PARENT_PIPE_MONITOR_SCRIPT = """ | ||
| import os | ||
| import select | ||
| import signal | ||
| import sys | ||
| target_pid = int(sys.argv[1]) | ||
| while True: | ||
| readable, _, _ = select.select([sys.stdin], [], [], 1.0) | ||
| if readable: | ||
| data = os.read(sys.stdin.fileno(), 1) | ||
| if not data: | ||
| try: | ||
| os.kill(target_pid, signal.SIGTERM) | ||
| except ProcessLookupError: | ||
| pass | ||
| sys.exit(0) | ||
| try: | ||
| os.kill(target_pid, 0) | ||
| except ProcessLookupError: | ||
| sys.exit(0) | ||
| """ | ||
|
|
||
| parser = argparse.ArgumentParser( | ||
| description=("Set up the environment for a Ray worker and launch the worker.") | ||
| ) | ||
|
|
@@ -20,10 +47,30 @@ | |
|
|
||
| parser.add_argument("--language", type=str, help="the language type of the worker") | ||
|
|
||
| parser.add_argument( | ||
| "--monitor-parent-pipe", | ||
| required=False, | ||
| action="store_true", | ||
| help="Internal: exit when inherited stdin pipe reaches EOF.", | ||
| ) | ||
|
|
||
|
|
||
| def _start_parent_pipe_monitor(enabled: bool): | ||
| if not enabled: | ||
| return None | ||
| stdin = getattr(sys.stdin, "buffer", sys.stdin) | ||
| return subprocess.Popen( | ||
| [sys.executable, "-c", _PARENT_PIPE_MONITOR_SCRIPT, str(os.getpid())], | ||
| stdin=stdin, | ||
| stdout=subprocess.DEVNULL, | ||
| stderr=subprocess.DEVNULL, | ||
| ) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| setup_logger(LOGGER_LEVEL, LOGGER_FORMAT) | ||
| args, remaining_args = parser.parse_known_args() | ||
| _start_parent_pipe_monitor(args.monitor_parent_pipe) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When Useful? React with 👍 / 👎.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Handled in the current branch. The parent-pipe monitor is no longer a daemon thread inside The final server no longer starts a second stdin reader, so there is only one monitor consuming the inherited pipe. Validated:
I also retried the targeted pytest command, but this local checkout still cannot collect Ray tests because it lacks the compiled |
||
| # NOTE(edoakes): args.serialized_runtime_env_context is only None when | ||
| # we're starting the main Ray client proxy server. That case should | ||
| # probably not even go through this codepath. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a POSIX specific server is launched, this disables kernel fate sharing immediately, but the replacement stdin monitor is only started later after
setup_workerhas finished andray.util.client.server.main()begins. If the proxier process dies during that bootstrap/import/runtime-env command path, the child no longer gets killed by fate sharing and has not yet started monitoring stdin EOF, so it can continue running orphaned until it eventually reaches server startup. Start the parent-pipe monitor in the setup-worker phase or keep a startup-time fate-sharing mechanism until the monitor is active.Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled in the current branch. The parent-pipe monitor now starts in
setup_worker.pybeforeRuntimeEnvContext.deserialize()/exec_worker(), so parent death during runtime-env/bootstrap exits the child early. Becauseexec_worker()replaces the process withos.execvp, setup_worker also forwards--monitor-parent-pipeto the final Ray Client server, which starts its own post-exec stdin EOF monitor.This preserves parent-death cleanup both before and after the exec boundary while keeping the fork-safe spawn path.
Validated with compileall, Ruff check, Ruff format check, and
git diff --check. Targeted pytest still cannot collect locally because this checkout lacksray._raylet.