Conversation
Signed-off-by: harshit <harshit@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request successfully moves the router retry backoff configuration from environment variables to RequestRouterConfig, deprecating the old env vars. The changes are well-tested and correctly implemented across the codebase.
One point of feedback: the documentation in doc/source/serve/advanced-guides/performance.md for the deprecated RAY_SERVE_ROUTER_RETRY_* environment variables has not been updated. It should be updated to reflect that these variables are deprecated and guide users to use the new RequestRouterConfig options.
I've also suggested some minor improvements to the descriptions of the new config options in python/ray/serve/config.py to make the deprecation clearer to users.
| description=( | ||
| "Initial backoff time (in seconds) before retrying to route a request " | ||
| "to a replica. Defaults to RAY_SERVE_ROUTER_RETRY_INITIAL_BACKOFF_S " | ||
| "environment variable, or 0.025 if not set." | ||
| ), |
There was a problem hiding this comment.
The description is a bit confusing. To make it clearer that the environment variable is being deprecated, consider rephrasing it. For example:
"Initial backoff time (in seconds) before retrying to route a request to a replica. Defaults to 0.025. This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_INITIAL_BACKOFF_S` environment variable."This applies to backoff_multiplier and max_backoff_s as well.
description=(
"Initial backoff time (in seconds) before retrying to route a request "
"to a replica. Defaults to 0.025. This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_INITIAL_BACKOFF_S` environment variable."
),| description=( | ||
| "Multiplier applied to the backoff time after each retry. " | ||
| "Defaults to RAY_SERVE_ROUTER_RETRY_BACKOFF_MULTIPLIER " | ||
| "environment variable, or 2 if not set." | ||
| ), |
There was a problem hiding this comment.
To make it clearer that the environment variable is being deprecated, consider rephrasing the description. For example:
"Multiplier applied to the backoff time after each retry. Defaults to 2. This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_BACKOFF_MULTIPLIER` environment variable." description=(
"Multiplier applied to the backoff time after each retry. Defaults to 2. "
"This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_BACKOFF_MULTIPLIER` environment variable."
),| description=( | ||
| "Maximum backoff time (in seconds) between retries. " | ||
| "Defaults to RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S " | ||
| "environment variable, or 0.5 if not set." | ||
| ), |
There was a problem hiding this comment.
To make it clearer that the environment variable is being deprecated, consider rephrasing the description. For example:
"Maximum backoff time (in seconds) between retries. Defaults to 0.5. This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S` environment variable." description=(
"Maximum backoff time (in seconds) between retries. Defaults to 0.5. "
"This can be overridden by the deprecated `RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S` environment variable."
),| double backoff_multiplier = 7; | ||
|
|
||
| // Maximum backoff time (in seconds) between retries. | ||
| double max_backoff_s = 8; |
There was a problem hiding this comment.
PR lacks description explaining changes
Low Severity
To help reviewers, please ensure your PR includes:
- Title: A concise summary of the change
- Description:
- What problem does this solve?
- How does this PR solve it?
- Any relevant context for reviewers such as:
- Why is the problem important to solve?
- Why was this approach chosen over others?
See this list of PRs as examples for PRs that have gone above and beyond:
- [Core] Introduce local port service discovery #59613
- [Core] Improve Large-Scale Resource View Synchronization Through Sync Message Batching #57641
- Remove node observability information from hot path of core components #56474
- [core][rdt] Support out-of-order actors by extracting metadata when creating #59610
- [core] fix open leak for plasma store memory (shm/fallback) by workers #52622
Signed-off-by: harshit <harshit@anyscale.com>


Summary
This PR introduces two improvements to Ray Serve's router configuration:
Document existing env vars: Added documentation for three undocumented queue length probing environment variables:
RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_SRAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_SRAY_SERVE_QUEUE_LENGTH_CACHE_TIMEOUT_SMigrate backoff env vars to config: Makes three router backoff environment variables configurable via
RequestRouterConfig. The env vars are marked as deprecated with warnings guiding users to the new config options.Environment variables being migrated:
RAY_SERVE_ROUTER_RETRY_INITIAL_BACKOFF_S→request_router_config.initial_backoff_sRAY_SERVE_ROUTER_RETRY_BACKOFF_MULTIPLIER→request_router_config.backoff_multiplierRAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S→request_router_config.max_backoff_sChanges
Documentation (
performance.md)Config Layer (
config.py,serve.proto)RequestRouterConfig:initial_backoff_s,backoff_multiplier,max_backoff_sRouter Layer (
router.py,request_router.py)AsyncioRouter.update_deployment_config()extracts backoff params from config and stores themRequestRouter.__init__()when the router is createdRequestRouterstores params as instance variables and uses them in retry logicDeprecation Warnings (
constants_utils.py,router.py)_fully_deprecated_env_varsdictionaryDeprecationWarningwhen deprecated env vars are set, guiding users to use config instead