Description
I’m currently facing an issue handling requests whose batch size is greater than the `max_batch_size` of the model hosted in Triton. The PyTriton chunking guide suggests this can be addressed, but I’m not sure how to implement it using the Triton client (`tritonclient`).
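To illustrate, this is the kind of client-side chunking I have in mind (an untested sketch; `my_model`, `INPUT_0`/`OUTPUT_0`, and `MAX_BATCH_SIZE = 32` are placeholder assumptions, not taken from any docs):

```python
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

MAX_BATCH_SIZE = 32  # the model's max_batch_size from its config (placeholder)


def infer_in_chunks(client: httpclient.InferenceServerClient, data: np.ndarray) -> np.ndarray:
    """Split an oversized batch along axis 0 and send the pieces sequentially."""
    outputs = []
    for start in range(0, data.shape[0], MAX_BATCH_SIZE):
        chunk = data[start:start + MAX_BATCH_SIZE]
        inp = httpclient.InferInput("INPUT_0", list(chunk.shape), np_to_triton_dtype(chunk.dtype))
        inp.set_data_from_numpy(chunk)
        out = httpclient.InferRequestedOutput("OUTPUT_0")
        result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
        outputs.append(result.as_numpy("OUTPUT_0"))
    # reassemble the per-chunk results in the original order
    return np.concatenate(outputs, axis=0)


client = httpclient.InferenceServerClient(url="localhost:8000")
big_batch = np.random.rand(100, 3).astype(np.float32)  # 100 > max_batch_size
print(infer_in_chunks(client, big_batch).shape)
```

Sequential chunking like this seems straightforward, but the sub-batches are sent one at a time, which is why I’m asking about a recommended concurrent/async pattern below.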
Related Open Issues
- Installing `pytriton` pulls in the Triton binaries, which I don’t need for client-side operations. I found this issue where others have mentioned the lack of a lightweight `pytriton.client` package. Any updates on this?
- There’s an ongoing discussion in Triton server issue #4547 about handling large requests, but there haven’t been updates there either.
Questions
- How can I handle requests where the batch size exceeds the model’s `max_batch_size`? Specifically, I’d like to know how to split these large requests efficiently and send them to Triton in smaller batches.
- Could you provide a minimal working example using TritonClient?
  - I’ve seen the PyTriton example, which includes asynchronous support, but I’m looking for something similar with TritonClient.
  - If possible, an example using `concurrent.futures` or async functionality would be very helpful (roughly along the lines of the sketch after this list).
- Is there a plan to release a standalone `pytriton.client` package to avoid installing the full `pytriton`? Alternatively, is there a plan to include this batch-splitting logic in Triton server itself?
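In case it helps clarify what I’m after, here is a rough `concurrent.futures` variant (again an untested sketch with placeholder model/tensor names; it creates one HTTP client per sub-batch simply to sidestep any thread-safety questions):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

URL = "localhost:8000"
MODEL = "my_model"    # placeholder
MAX_BATCH_SIZE = 32   # the model's max_batch_size (placeholder)


def _infer_chunk(chunk: np.ndarray) -> np.ndarray:
    # one client per sub-batch keeps the example simple and thread-safe
    client = httpclient.InferenceServerClient(url=URL)
    inp = httpclient.InferInput("INPUT_0", list(chunk.shape), np_to_triton_dtype(chunk.dtype))
    inp.set_data_from_numpy(chunk)
    out = httpclient.InferRequestedOutput("OUTPUT_0")
    result = client.infer(model_name=MODEL, inputs=[inp], outputs=[out])
    return result.as_numpy("OUTPUT_0")


def infer_large_batch(data: np.ndarray, max_workers: int = 4) -> np.ndarray:
    # split the oversized batch along axis 0 into sub-batches of at most MAX_BATCH_SIZE
    chunks = [data[i:i + MAX_BATCH_SIZE] for i in range(0, data.shape[0], MAX_BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves chunk order, so concatenation restores the original batch
        results = list(pool.map(_infer_chunk, chunks))
    return np.concatenate(results, axis=0)


if __name__ == "__main__":
    big_batch = np.random.rand(100, 3).astype(np.float32)  # 100 > max_batch_size
    print(infer_large_batch(big_batch).shape)
```

If there is an officially recommended pattern (for example, reusing a single client with its async API), I’d much prefer that over this ad-hoc approach.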
Thanks in advance!