feat(gateway): make vLLM sampler requests asynchronous and rate-limited by droot · Pull Request #119 · gke-labs/open-rl

droot · 2026-06-10T19:20:57Z

Dispatch vLLM token generation requests to a background task instead of blocking the FastAPI handler, aligning its async behavior with the Torch sampler backend.
Introduce VLLM_CONCURRENCY_LIMIT (default 512) and _vllm_semaphore to prevent socket/file-descriptor exhaustion and connection drop errors under heavy surges.
Maintain a global _background_tasks set to hold strong references to running background tasks and prevent premature garbage collection.

droot · 2026-06-10T19:25:33Z

Please do not merge this yet. I haven't tested it.

- Dispatch vLLM token generation requests to a background task instead of blocking the FastAPI handler, aligning its async behavior with the Torch sampler backend.
- Introduce VLLM_CONCURRENCY_LIMIT (default 512) and _vllm_semaphore to prevent socket/file-descriptor exhaustion and connection drop errors under heavy surges.
- Maintain a global _background_tasks set to hold strong references to running background tasks and prevent premature garbage collection.

droot force-pushed the feature/async-vllm-sampling branch from 4f7d79e to 7c1a533 · June 11, 2026 15:00

droot closed this Jun 17, 2026

droot wants to merge 1 commit into gke-labs:main from droot:feature/async-vllm-sampling
