Feature Request: Stagger Parallel Vision Requests for Better Resource Utilization #1654

@HavenCTO

Description

When multiple parallel vision requests run simultaneously, they all hit the same processing phase at the same time: first image encoding, then image decoding, then text inference. This causes resource contention rather than efficient pipelining.

Current Behavior:

- Parallel requests execute in lockstep through each phase
- Image encoding, image decoding, and text inference each happen simultaneously across all requests
- Resources are overwhelmed at each phase rather than utilized continuously
Expected Behavior:

- Parallel requests should be staggered so that phases overlap
- While one request is in image encoding, another could be in image decoding and another in token generation
- Both CPU and GPU should be utilized continuously across all parallel slots
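To make the requested behavior concrete, here is a minimal sketch of the staggered pipeline idea. All names (`STAGES`, `run_request`) are hypothetical and not part of any existing codebase; each stage gets its own lock, so while one request holds the decode stage another can occupy the encode stage, instead of all requests contending for the same phase at once:

```python
import threading
import time

# Hypothetical 3-stage pipeline. One lock per stage serializes that stage,
# but different stages can run concurrently across different requests,
# giving the overlapped (pipelined) execution described above.
STAGES = ["image_encode", "image_decode", "text_infer"]
stage_locks = {s: threading.Lock() for s in STAGES}

log = []                      # (request_id, stage) entries, in execution order
log_lock = threading.Lock()

def run_request(req_id: int) -> None:
    """Drive one request through every pipeline stage in order."""
    for stage in STAGES:
        with stage_locks[stage]:      # only one request per stage at a time
            with log_lock:
                log.append((req_id, stage))
            time.sleep(0.01)          # placeholder for the real work

# Launch three parallel requests; starting them in sequence naturally
# staggers them through the pipeline.
threads = [threading.Thread(target=run_request, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is only an illustration of the scheduling shape being asked for; a real implementation would stagger the actual encoder/decoder/inference phases inside the server's slot scheduler rather than wrap them in coarse locks.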
