[APIServer] Dynamic default values for workers and max-concurrency based on platform #7497
Conversation
… platform

Change workers and max-concurrency defaults from hardcoded values (1 and 512) to None, then resolve them dynamically:
- NVIDIA GPU (CUDA): workers = ceil(max_num_seqs / 64), max_concurrency = workers * 512
- Other platforms: workers = 1, max_concurrency = workers * 512

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/b62082bc-b3bd-4495-ba1b-2d4cbfb8cf24
Address code review feedback:
- Add safe fallback for max_num_seqs using getattr with default value 8
- Add log output for resolved max_concurrency value

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/b62082bc-b3bd-4495-ba1b-2d4cbfb8cf24
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 AI Code Review
2026-04-20 11:18 CST
📋 Review Summary
PR overview: dynamically compute the default values of --workers and --max-concurrency based on platform; on NVIDIA GPUs, auto-scale the worker count with max_num_seqs
Scope of change: entrypoints/openai/ (api_server.py, utils.py)
Impact tag: APIServer
Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | utils.py:398 | Magic numbers 64/512 lack explanation; consider extracting them as named constants |
| 🟡 Suggestion | utils.py:400 | The max_concurrency computation is duplicated in both branches and can be simplified |
| ❓ Question | utils.py:394 | Semantic difference between `or 8` and `if is None` |
Overall Assessment
The overall implementation logic is clear, and the design of never overriding values the user passed explicitly is sound. The main suggestions are to extract the magic numbers into named constants and to simplify the duplicated branch logic, improving maintainability.
```python
if current_platform.is_cuda():
    if args.workers is None:
        args.workers = math.ceil(max_num_seqs / 64)
```
🟡 Suggestion: the magic numbers 64 and 512 lack explanation
The values 64 and 512 here are key tuning parameters, but there is no comment explaining why they were chosen. Consider extracting them as named constants with comments documenting the rationale, to ease future maintenance and tuning:
```python
# Maximum number of sequences handled per worker
# (empirical value, balances worker count against per-worker load)
_SEQS_PER_WORKER = 64
# Concurrent-connection cap per worker
_CONCURRENCY_PER_WORKER = 512
```

```python
if args.workers is None:
    args.workers = math.ceil(max_num_seqs / 64)
if args.max_concurrency is None:
    args.max_concurrency = args.workers * 512
```
🟡 Suggestion: the max_concurrency computation is identical in both branches and can be simplified
The max_concurrency computation (workers * 512) is the same in the CUDA and non-CUDA branches; it can be hoisted out after the if/else to reduce duplication:
```python
if current_platform.is_cuda():
    if args.workers is None:
        args.workers = math.ceil(max_num_seqs / 64)
else:
    if args.workers is None:
        args.workers = 1
if args.max_concurrency is None:
    args.max_concurrency = args.workers * 512
```

```python
from fastdeploy.platforms import current_platform

max_num_seqs = getattr(args, "max_num_seqs", None) or 8
```
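To see how the scaling formula behaves, here is a standalone sketch (the constant names are illustrative, not from the FastDeploy source) that tabulates workers and max_concurrency for a few max_num_seqs values:

```python
import math

SEQS_PER_WORKER = 64          # illustrative name for the PR's hardcoded 64
CONCURRENCY_PER_WORKER = 512  # illustrative name for the PR's hardcoded 512

for max_num_seqs in (8, 64, 65, 256):
    workers = math.ceil(max_num_seqs / SEQS_PER_WORKER)
    max_concurrency = workers * CONCURRENCY_PER_WORKER
    print(f"max_num_seqs={max_num_seqs}: workers={workers}, max_concurrency={max_concurrency}")
```

Because `ceil` rounds up, 65 sequences already get 2 workers; any remainder beyond a full multiple of 64 allocates one extra worker.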
❓ Question: intent of using `or 8` rather than `if is None`
getattr(args, "max_num_seqs", None) or 8 also falls back to 8 when max_num_seqs is 0 or another falsy value. Although max_num_seqs=0 is unlikely in practice, if the intent is only to handle None, an explicit `if ... is None` check is clearer:
```python
max_num_seqs = getattr(args, "max_num_seqs", None)
if max_num_seqs is None:
    max_num_seqs = 8
```
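The semantic difference the reviewer flags can be demonstrated with a minimal sketch (the `SimpleNamespace` args object and the contrived `max_num_seqs=0` are hypothetical, chosen only to expose the falsy-value pitfall):

```python
from types import SimpleNamespace

# Hypothetical args object; max_num_seqs=0 is contrived to expose the difference.
args = SimpleNamespace(max_num_seqs=0)

# `or` treats every falsy value (0, "", None) as missing and falls back to 8:
via_or = getattr(args, "max_num_seqs", None) or 8

# An explicit None check preserves a legitimate 0:
via_is_none = getattr(args, "max_num_seqs", None)
if via_is_none is None:
    via_is_none = 8

print(via_or, via_is_none)  # 8 0
```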
Motivation
`--workers` and `--max-concurrency` were hardcoded to `1` and `512` respectively. On NVIDIA GPUs with large `max_num_seqs`, a single worker becomes a bottleneck. Defaults should scale with the workload on capable hardware.

Modifications
`fastdeploy/entrypoints/openai/utils.py`:
- Change the `--workers` and `--max-concurrency` defaults to `None`
- Add `resolve_workers_and_concurrency(args)` that resolves defaults based on platform:
  - NVIDIA GPU (CUDA): `workers = ceil(max_num_seqs / 64)`, `max_concurrency = workers * 512`
  - Other platforms: `workers = 1`, `max_concurrency = workers * 512`
- Log the resolved values (when a `None` default is resolved)

`fastdeploy/entrypoints/openai/api_server.py`:
- Call `resolve_workers_and_concurrency(args)` after `parse_args()`, before any usage
- Add the resolved `max_concurrency` to startup logging

Usage or Command
Behavior is automatic. Explicitly passing `--workers` or `--max-concurrency` still overrides the computed defaults.

Accuracy Tests
No model output changes — this only affects API server worker/concurrency configuration.
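Putting the pieces from the Modifications section together, the resolution flow can be sketched roughly as follows. This is a standalone approximation: the constant names and the `is_cuda` parameter (standing in for FastDeploy's `current_platform.is_cuda()`) are assumptions, not the actual source; the fallback value 8 comes from the PR description.

```python
import math
from types import SimpleNamespace

# Assumed names mirroring the PR's hardcoded 64 and 512.
_SEQS_PER_WORKER = 64
_CONCURRENCY_PER_WORKER = 512

def resolve_workers_and_concurrency(args, is_cuda):
    """Fill in workers/max_concurrency only when the user left them as None.

    `is_cuda` replaces current_platform.is_cuda() so this sketch runs
    without the FastDeploy library or GPU hardware.
    """
    max_num_seqs = getattr(args, "max_num_seqs", None)
    if max_num_seqs is None:
        max_num_seqs = 8  # safe fallback, per the review feedback
    if args.workers is None:
        args.workers = math.ceil(max_num_seqs / _SEQS_PER_WORKER) if is_cuda else 1
    if args.max_concurrency is None:
        args.max_concurrency = args.workers * _CONCURRENCY_PER_WORKER
    return args

# Auto-resolution on a CUDA-like platform:
auto = resolve_workers_and_concurrency(
    SimpleNamespace(workers=None, max_concurrency=None, max_num_seqs=256), is_cuda=True
)
# Explicit values are never overwritten:
manual = resolve_workers_and_concurrency(
    SimpleNamespace(workers=2, max_concurrency=None, max_num_seqs=256), is_cuda=True
)
print(auto.workers, auto.max_concurrency)      # 4 2048
print(manual.workers, manual.max_concurrency)  # 2 1024
```

Note how the explicit `workers=2` survives resolution untouched, while its `max_concurrency` is still derived from it; this matches the PR's "explicit flags override computed defaults" behavior.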
Checklist
- Run `pre-commit` before commit.
- No dedicated unit tests: the resolution depends on `current_platform.is_cuda()`, which requires GPU hardware. The resolution function is straightforward and exercised on every server startup.
- If this PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.