[Docs] Add API server tuning guide #5176


Open · wants to merge 8 commits into master

Conversation

aylei
Collaborator

@aylei aylei commented Apr 10, 2025

Setups tested:

  • 4c8g (our internal API server also uses this setup)
  • 8c16g
  • 16c32g
  • 128c256g

The concurrency for the other resource levels is calculated based on our code. None of the setups saw issues with 1000 concurrent job launch requests (10 clients x 100 job launches per client) and 10 x 10K status requests.
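As a rough illustration of how concurrency might scale with server resources (the helper and formula below are hypothetical, for illustration only, and are not the actual calculation in the code):

```python
def estimated_long_request_concurrency(cpus: int, memory_gb: int) -> int:
    """Hypothetical sizing rule: roughly one long-running worker per CPU,
    capped at ~2 GB of memory per worker. NOT SkyPilot's actual formula."""
    return max(1, min(cpus, memory_gb // 2))

# Under this sketch, the tested setups scale linearly:
print(estimated_long_request_concurrency(4, 8))     # 4c8g
print(estimated_long_request_concurrency(128, 256))  # 128c256g
```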

128c256g encountered the ulimit issue (#5174) due to the high executor count; it was addressed after tuning. This should be fixed in a separate PR, so we don't have to mention it in the doc here.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

Queuing requests and polling status asynchronously
--------------------------------------------------

There is no limit on the number of queued requests. So in addition to increasing the allocated resources to improve the maximum concurrency, you can also submit requests with the ``--async`` flag and poll their status asynchronously to avoid blocking. For example:
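The asynchronous submit-then-poll pattern can be sketched generically. The helper below is an illustration, not part of the SkyPilot API; it assumes a ``get_status`` callable and terminal states named ``SUCCEEDED``/``FAILED``/``CANCELLED``:

```python
import time

def poll_until_done(get_status, interval=2.0, max_interval=30.0, timeout=600.0):
    """Poll get_status() with exponential backoff until a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
        # Back off so the polling itself does not load a busy API server.
        interval = min(interval * 2, max_interval)
    raise TimeoutError("request did not finish within the timeout")
```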
Collaborator Author

@aylei aylei Apr 11, 2025


The queue length is verified in #5175


sky api cancel <request_id>

Avoid concurrent logs requests
Collaborator Author


We can remove this chapter after the log optimization is completed

@aylei aylei marked this pull request as ready for review April 11, 2025 10:55
@Michaelvll Michaelvll removed the request for review from romilbhardwaj April 17, 2025 02:17
@Michaelvll Michaelvll left a comment


Thanks @aylei! I like the doc. Left some comments.

Comment on lines +47 to +49
limits:
cpu: "4"
memory: "8Gi"
Collaborator


Do we have to set it? I just want to reduce the chance that the API server is killed by k8s for going slightly over memory.
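For reference, a sketch of what setting both requests and limits in the Helm values might look like (the ``apiService.resources`` key path here is an assumption and should be verified against the actual chart):

```yaml
# Hypothetical values.yaml fragment; verify key paths against the chart.
apiService:
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
```

Setting requests equal to limits gives the pod the Guaranteed QoS class, which makes it less likely to be evicted under node memory pressure.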

Collaborator Author


Explained in the later dropdown

d9c95c7e-d248-4a7f-b72e-636511405357 alice sky.jobs.launch a few secs ago PENDING
767182fd-0202-4ae5-b2d7-ddfabea5c821 alice sky.jobs.launch a few secs ago PENDING
5667cff2-e953-4b80-9e5f-546cea83dc59 alice sky.jobs.launch a few secs ago RUNNING

Collaborator


Add a title: Check logs for a request

$ sky api logs <request_id>

If the request is stuck according to the log, e.g. it keeps retrying to launch VMs that are out of stock, you can cancel the request with:

Collaborator


Add a title: Cancel a request

aylei and others added 3 commits April 18, 2025 16:28
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
@aylei aylei requested a review from Michaelvll April 18, 2025 10:16
Signed-off-by: Aylei <[email protected]>
@aylei
Collaborator Author

aylei commented Apr 21, 2025

@Michaelvll @concretevitamin ping for another look, thanks!

Member

@concretevitamin concretevitamin left a comment


Thanks @aylei, some comments.

Comment on lines +13 to +14
* ``Long-running request``: request that takes long time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running request``: request that takes short time or less resources to run, including ``status``, ``logs``, etc.
Member


Suggested change
* ``Long-running request``: request that takes long time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running request``: request that takes short time or less resources to run, including ``status``, ``logs``, etc.
* ``Long-running requests``: requests that take longer time and more resources to run, including ``launch``, ``exec``, ``jobs.launch``, etc.
* ``Short-running requests``: requests that take shorter time or less resources to run, including ``status``, ``logs``, etc.

Comment on lines +16 to +18
.. note::

Though a task (or job) can run for any length of time, concurrent tasks does not occupy the concurrency. Because once a task is submitted to the cluster, it will be detached and no longer takes any resources off the API server.
Member


I think we should rephrase this somewhat, and pull it out of note box. How about something like:

Requests are queued and processed by the API server. Therefore, they only take resources off the API server when they are in queue or being processed. Once requests are processed and remote clusters start doing real work, they no longer require API server's resources or count against its concurrency limit.

For example, long-running requests for launch and exec no longer take resources off the API server once a cluster has been provisioned, or once a job has been submitted to a cluster, respectively.

Member


Please check accuracy.


.. note::

If you specify resources that are lower than the minimum recommended resources (4 CPUs with 8GB of memory) for team usage, an error will be raised on ``helm upgrade``. You can specify ``--set apiService.skipResourcesCheck=true`` to skip the check if performance and stability are not a concern for your scenario.
Member


It's a bit confusing that " (4 CPUs with 8GB of memory)" is mentioned as our rec settings, but snippet above doesn't reflect this?

We should mention the rec setting in a 1-sentence paragraph.

Queuing requests and polling status asynchronously
--------------------------------------------------

There is no limit on the number of queued requests, i.e. despite increasing the allocated resources to improve the maximum concurrency, you can also submit requests with :ref:`async<async>` (``--async``) and poll the status asynchronously to avoid blocking. For example:
Member


Suggested change
There is no limit on the number of queued requests, i.e. despite increasing the allocated resources to improve the maximum concurrency, you can also submit requests with :ref:`async<async>` (``--async``) and poll the status asynchronously to avoid blocking. For example:
There is no limit on the number of queued requests. To avoid request blocking, you can either (1) allocate more resources to increase the maximum concurrency (described above), or (2) :ref:`submit requests asynchronously <async>` (``--async``) and poll the status asynchronously.
For example:

Do we mean this?

I still find this revised section & the sec title confusing. What do we really want to say in this section? Lmk and I can try to rephrase.

It feels like "Use asynchronous requests as much as possible"?

767182fd-0202-4ae5-b2d7-ddfabea5c821 alice sky.jobs.launch a few secs ago PENDING
5667cff2-e953-4b80-9e5f-546cea83dc59 alice sky.jobs.launch a few secs ago RUNNING

Check logs for a request
Member


Suggested change
Check logs for a request
Checking the logs of a request

# Replace <request_id> with the actual request id from the ID column
$ sky api logs <request_id>

Cancel a request
Member


Suggested change
Cancel a request
Canceling a request

Avoid concurrent logs requests
------------------------------

If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
Member


Suggested change
If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking resources off the API server as long as the task being tailed is still running. Thus, concurrent log requests will occupy the concurrency limit and potentially delay other requests.

Avoid concurrent logs requests
------------------------------

If you run ``sky logs`` to tail the logs of a task, the log tailing will keep taking off the resources of the API server as long as the task being tailed is running. So concurrent log requests will occupy the concurrency and make other requests to be delayed.
Member


In this section, both 'task' and 'job' are used; can we keep one?
