[Docs] Add API server tuning guide #5176


Open · wants to merge 8 commits into master

Conversation

aylei
Collaborator

@aylei aylei commented Apr 10, 2025

Setups tested:

  • 4c8g (our internal API server also uses this setup)
  • 8c16g
  • 16c32g
  • 128c256g

The concurrency for the other resource levels is calculated based on our code. None of the setups saw issues with 1000 concurrent job launch requests (10 clients x 100 job launches per client) and 10 x 10K status requests.
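As a rough illustration of how concurrency might scale with server resources (the helper and formula below are hypothetical, for illustration only, and are not the actual calculation in the code):

```python
def estimated_long_request_concurrency(cpus: int, memory_gb: int) -> int:
    """Hypothetical sizing rule: roughly one long-running worker per CPU,
    capped at ~2 GB of memory per worker. NOT SkyPilot's actual formula."""
    return max(1, min(cpus, memory_gb // 2))

# Under this sketch, the tested setups scale linearly:
print(estimated_long_request_concurrency(4, 8))     # 4c8g
print(estimated_long_request_concurrency(128, 256))  # 128c256g
```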

128c256g encountered the ulimit issue (#5174) due to the high executor count; it was addressed after tuning. This should be fixed in a separate PR, so we don't have to mention it in the doc here.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

Queuing requests and polling status asynchronously
--------------------------------------------------

There is no limit on the number of queued requests. So in addition to increasing the allocated resources to improve the maximum concurrency, you can also submit requests with the ``--async`` flag and poll their status asynchronously to avoid blocking. For example:
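The asynchronous submit-then-poll pattern can be sketched generically. The helper below is an illustration, not part of the SkyPilot API; it assumes a ``get_status`` callable and terminal states named ``SUCCEEDED``/``FAILED``/``CANCELLED``:

```python
import time

def poll_until_done(get_status, interval=2.0, max_interval=30.0, timeout=600.0):
    """Poll get_status() with exponential backoff until a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
        # Back off so the polling itself does not load a busy API server.
        interval = min(interval * 2, max_interval)
    raise TimeoutError("request did not finish within the timeout")
```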
Collaborator Author

@aylei aylei Apr 11, 2025


The queue length is verified in #5175


sky api cancel <request_id>

Avoid concurrent logs requests
Collaborator Author


We can remove this chapter after the log optimization is completed

@aylei aylei marked this pull request as ready for review April 11, 2025 10:55
@Michaelvll Michaelvll removed the request for review from romilbhardwaj April 17, 2025 02:17
@Michaelvll Michaelvll left a comment


Thanks @aylei! I like the doc. Left some comments.

Comment on lines +47 to +49
limits:
cpu: "4"
memory: "8Gi"
Collaborator


Do we have to set it? I just want to reduce the chance that the API server is killed by k8s for going slightly over memory.
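For reference, a sketch of what setting both requests and limits in the Helm values might look like (the ``apiService.resources`` key path here is an assumption and should be verified against the actual chart):

```yaml
# Hypothetical values.yaml fragment; verify key paths against the chart.
apiService:
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
```

Setting requests equal to limits gives the pod the Guaranteed QoS class, which makes it less likely to be evicted under node memory pressure.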

Collaborator Author


Explained in the later dropdown

d9c95c7e-d248-4a7f-b72e-636511405357 alice sky.jobs.launch a few secs ago PENDING
767182fd-0202-4ae5-b2d7-ddfabea5c821 alice sky.jobs.launch a few secs ago PENDING
5667cff2-e953-4b80-9e5f-546cea83dc59 alice sky.jobs.launch a few secs ago RUNNING

Collaborator


Add a title: Check logs for a request

$ sky api logs <request_id>

If the request is stuck according to the log, e.g. it keeps retrying to launch VMs that are out of stock, you can cancel the request with:

Collaborator


Add a title: Cancel a request

aylei and others added 3 commits April 18, 2025 16:28
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
@aylei aylei requested a review from Michaelvll April 18, 2025 10:16
Signed-off-by: Aylei <[email protected]>
@aylei
Collaborator Author

aylei commented Apr 21, 2025

@Michaelvll @concretevitamin ping for another look, thanks!

Member

@concretevitamin concretevitamin left a comment


Thanks @aylei, some comments.

Comment on lines +13 to +14
* ``Long-running request``: request that takes long time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running request``: request that takes short time or less resources to run, including ``status``, ``logs``, etc.
Member


Suggested change
* ``Long-running request``: request that takes long time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running request``: request that takes short time or less resources to run, including ``status``, ``logs``, etc.
* ``Long-running requests``: requests that take longer time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running requests``: requests that take shorter time or less resources to run, including ``status``, ``logs``, etc.

Comment on lines +16 to +18
.. note::

Though a task (or job) can run for any length of time, concurrent tasks does not occupy the concurrency. Because once a task is submitted to the cluster, it will be detached and no longer takes any resources off the API server.
Member


I think we should rephrase this somewhat, and pull it out of note box. How about something like:

Requests are queued and processed by the API server. Therefore, they only take resources off the API server when they are in queue or being processed. Once requests are processed and remote clusters start doing real work, they no longer require API server's resources or count against its concurrency limit.

For example, long-running requests for launch and exec no longer take resources off the API server once a cluster has been provisioned, or once a job has been submitted to a cluster, respectively.

Member


Please check accuracy.


.. note::

If you specify resources that are lower than the minimum recommended resources (4 CPUs with 8GB of memory) for team usage, an error will be raised on ``helm upgrade``. You can specify ``--set apiService.skipResourcesCheck=true`` to skip the check if performance and stability are not a concern for your scenario.
Member


It's a bit confusing that " (4 CPUs with 8GB of memory)" is mentioned as our rec settings, but snippet above doesn't reflect this?

We should mention the rec setting in a 1-sentence paragraph.

Queuing requests and polling status asynchronously
--------------------------------------------------

There is no limit on the number of queued requests, i.e. despite increasing the allocated resources to improve the maximum concurrency, you can also submit requests with :ref:`async<async>` (``--async``) and poll the status asynchronously to avoid blocking. For example:
Member


Suggested change
There is no limit on the number of queued requests, i.e. despite increasing the allocated resources to improve the maximum concurrency, you can also submit requests with :ref:`async<async>` (``--async``) and poll the status asynchronously to avoid blocking. For example:
There is no limit on the number of queued requests. To avoid request blocking, you can either (1) allocate more resources to increase the maximum concurrency (described above), or (2) :ref:`submit requests asynchronously <async>` (``--async``) and poll the status asynchronously.
For example:

Do we mean this?

I still find this revised section & the sec title confusing. What do we really want to say in this section? Lmk and I can try to rephrase.

It feels like "Use asynchronous requests as much as possible"?

767182fd-0202-4ae5-b2d7-ddfabea5c821 alice sky.jobs.launch a few secs ago PENDING
5667cff2-e953-4b80-9e5f-546cea83dc59 alice sky.jobs.launch a few secs ago RUNNING

Check logs for a request
Member


Suggested change
Check logs for a request
Checking the logs of a request

# Replace <request_id> with the actual request id from the ID column
$ sky api logs <request_id>

Cancel a request
Member


Suggested change
Cancel a request
Canceling a request

Avoid concurrent logs requests
------------------------------

If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
Member


Suggested change
If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking resources off the API server as long as the task being tailed is still running. Thus, concurrent log requests will occupy the concurrency limit and potentially delay other requests.

Avoid concurrent logs requests
------------------------------

If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
Member


In this section, both 'task' and 'job' are used; can we keep one?
