feat: Improve client perf and error handling #247
LukeAVanDrie wants to merge 1 commit into kubernetes-sigs:main
Conversation
Reuses `aiohttp.ClientSession` across requests in `openAIModelServerClient` to reduce connection overhead. This change improves client-side throughput and latency.

Additional improvements:
- Refines error handling to distinguish between network errors (like `aiohttp.ClientError`), non-200 HTTP status codes, and errors during response processing.
- Ensures non-200 responses with text bodies are captured.
- Guarantees the response body is always consumed to release connections.
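The three-way error split described above can be sketched roughly as follows. `FakeResponse` and the tuple-based return values are hypothetical stand-ins (not the PR's actual code) so the sketch runs without a live server; a real client would get the response from `aiohttp` and catch `aiohttp.ClientError` around the request itself:

```python
import asyncio
import json

class FakeResponse:
    """Hypothetical stand-in for an aiohttp response so this runs offline."""
    def __init__(self, status, body):
        self.status = status
        self._body = body

    async def text(self):
        return self._body

async def process_response(resp):
    # Always read the body, even on non-200, so the connection is released
    # and the error text is captured.
    body = await resp.text()
    if resp.status != 200:
        return ("http_error", resp.status, body)
    try:
        return ("ok", json.loads(body))
    except json.JSONDecodeError as exc:
        # Errors during response processing are reported separately from
        # transport errors and bad status codes.
        return ("processing_error", str(exc))

async def demo():
    ok = await process_response(FakeResponse(200, '{"usage": 1}'))
    bad_status = await process_response(FakeResponse(503, "overloaded"))
    bad_body = await process_response(FakeResponse(200, "not json"))
    return ok, bad_status, bad_body

results = asyncio.run(demo())
print(results)
```

In the real client, a fourth branch would wrap the request in `except aiohttp.ClientError` to cover network-level failures before a response exists at all.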
jjk-g
left a comment
Thanks for adding! One nit
/lgtm
achandrasekar
left a comment
Can you add how the change was tested? If you have any numbers on the improvements, that'd be great too.
Please address the linting and type check issues above.
@LukeAVanDrie friendly ping for linting and type check errors
elif not tokenizer_config:
    tokenizer_config = CustomTokenizerConfig(pretrained_model_name_or_path=self.model_name)
self.tokenizer = CustomTokenizer(tokenizer_config)
self.session = aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=self.max_tcp_connections))
Please correct me if I'm wrong, but isn't openAIModelServerClient shared across multiple asyncio event loops because of multiprocessing? Creating a single ClientSession here might cause issues if the same instance is also being shared to all the multiprocessing workers.
Relevant link: https://stackoverflow.com/questions/62707369/one-aiohttp-clientsession-per-thread
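One way to address this concern, sketched here rather than taken from the PR itself, is to create at most one session per running event loop, since an `aiohttp.ClientSession` is bound to the loop it was created on. `Session` below is a hypothetical stand-in for `aiohttp.ClientSession` so the sketch runs offline:

```python
import asyncio

class Session:
    """Hypothetical stand-in for aiohttp.ClientSession; a real client would
    create aiohttp.ClientSession(connector=...) here instead."""
    instances = 0

    def __init__(self):
        Session.instances += 1

_sessions = {}

def get_session():
    # One session per running event loop: sharing a single instance across
    # multiprocessing workers (each with its own loop) is unsafe, so the
    # session is created lazily, keyed by the current loop.
    loop = asyncio.get_running_loop()
    if loop not in _sessions:
        _sessions[loop] = Session()
    return _sessions[loop]

async def worker():
    # Two calls on the same loop reuse the same session instance.
    return get_session() is get_session()

same = asyncio.run(worker())
count_after_one_loop = Session.instances
asyncio.run(worker())  # a second asyncio.run uses a new loop -> new session
print(same, count_after_one_loop, Session.instances)
```

A production version would also need to close each session when its loop shuts down; the cache here is deliberately minimal.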
Slightly refactor `openAIModelServerClient` to accept a custom
`aiohttp.ClientSession` per request, which allows us to use exactly one
client session per worker.

Prior to this commit, a new `aiohttp.ClientSession` was created for each
request. Not only is this inefficient, lowering throughput; in certain
environments it also leads to inotify watch issues:

    aiodns - WARNING - Failed to create DNS resolver channel with
    automatic monitoring of resolver configuration changes. This usually
    means the system ran out of inotify watches. Falling back to socket
    state callback. Consider increasing the system inotify watch limit:
    Failed to initialize c-ares channel

Indeed, because a new DNS resolver is created for each `ClientSession`,
creating large numbers of `ClientSession`s eventually exhausts the
inotify watch limit. Sharing `ClientSession`s solves this issue.

Relevant links:
- https://docs.aiohttp.org/en/stable/http_request_lifecycle.html
- https://stackoverflow.com/questions/62707369/one-aiohttp-clientsession-per-thread
- home-assistant/core#144457 (comment)

Relevant PR: kubernetes-sigs#247
(doesn't address the issue of worker sharing).
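The per-worker session pattern this commit message describes might look roughly like the following; `FakeSession`, `process_request`, and `worker` are illustrative stand-ins, not the repository's code:

```python
import asyncio

class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession so this runs offline;
    a real worker would use `async with aiohttp.ClientSession() as session`."""
    created = 0

    def __init__(self):
        FakeSession.created += 1

    async def post(self, payload):
        return {"echo": payload}

async def process_request(session, payload):
    # The session is supplied by the caller rather than created here,
    # so the caller decides how widely it is shared.
    return await session.post(payload)

async def worker(payloads):
    # Exactly one session for the whole worker, reused for every request.
    session = FakeSession()
    return [await process_request(session, p) for p in payloads]

results = asyncio.run(worker(["a", "b", "c"]))
print(results, FakeSession.created)
```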
Slightly refactor `openAIModelServerClient` to add a new method,
`process_request_with_session`, that accepts a custom
`ReusableHTTPClientSession` per request, which allows the caller to
reuse one HTTP client session per worker.

The previous method, `process_request`, now creates a fresh HTTP client
session and calls `process_request_with_session`, preserving the
previous behavior.

Prior to this commit, a new `aiohttp.ClientSession` was created for each
request. Not only is this inefficient, lowering throughput; in certain
environments it also leads to inotify watch issues:

    aiodns - WARNING - Failed to create DNS resolver channel with
    automatic monitoring of resolver configuration changes. This usually
    means the system ran out of inotify watches. Falling back to socket
    state callback. Consider increasing the system inotify watch limit:
    Failed to initialize c-ares channel

Indeed, because a new DNS resolver is created for each `ClientSession`,
creating large numbers of `ClientSession`s eventually exhausts the
inotify watch limit. Sharing `ClientSession`s solves this issue.

Relevant links:
- https://docs.aiohttp.org/en/stable/http_request_lifecycle.html
- https://stackoverflow.com/questions/62707369/one-aiohttp-clientsession-per-thread
- home-assistant/core#144457 (comment)

Relevant PR: kubernetes-sigs#247
(doesn't address the issue of worker sharing).
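A minimal sketch of the two-method split this commit message describes, with `FakeSession` as a hypothetical stand-in for the PR's `ReusableHTTPClientSession` so it runs offline:

```python
import asyncio

class FakeSession:
    """Hypothetical stand-in for the PR's ReusableHTTPClientSession."""
    def __init__(self):
        self.closed = False

    async def post(self, payload):
        return {"echo": payload}

    async def close(self):
        self.closed = True

class Client:
    async def process_request_with_session(self, session, payload):
        # Core path: the caller owns the session's lifecycle, so one
        # session can serve every request a worker issues.
        return await session.post(payload)

    async def process_request(self, payload):
        # Backward-compatible path: create a fresh session per request,
        # delegate, and close it afterwards (the pre-commit behavior).
        session = FakeSession()
        try:
            return await self.process_request_with_session(session, payload)
        finally:
            await session.close()

result = asyncio.run(Client().process_request("hi"))
print(result)
```

Keeping `process_request` as a thin wrapper means existing callers are untouched while new callers can opt into session reuse.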
@LukeAVanDrie Thanks for the contribution. Can you please rebase this PR?
Yes, apologies for the long delay here. I will make sure to update the description with my testing results and verify @diamondburned's concern regarding multiprocessing.
@LukeAVanDrie any updates on this PR?