Is your feature request related to a problem? Please describe.
Not sure if this is specific to the ONNX backend.
When `config.pbtxt` contains `model_warmup { ... }` entries and the system has two GPUs, Triton runs `ModelInitialize` for each GPU, but the warmup runs serially: it first runs all warmup requests on the first GPU, and only after those complete does it run them on the second GPU, and so on.
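For context, here's a minimal sketch of the kind of configuration I mean (input name, dims, and instance counts are placeholders, not from my real model):

```
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]   # one instance per GPU
  }
]

model_warmup [
  {
    name: "warmup_sample"
    batch_size: 1
    inputs: {
      key: "INPUT0"            # placeholder input name
      value: {
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]  # placeholder dims
        random_data: true
      }
    }
  }
]
```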
Describe the solution you'd like
I'd like the warmup requests to run on all GPUs in parallel to speed up model startup; with serial warmup, startup is quite slow.
Describe alternatives you've considered
I could warm up the model manually by sending inference requests after startup, but I cannot see a way to direct a request to a specific GPU.
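The only workaround I can think of is a rough sketch like the one below: fire more concurrent requests than there are instances and hope the scheduler spreads them across both GPUs. The model name, input name, and shape here are hypothetical; adjust them to your model.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient

MODEL = "my_model"   # hypothetical model name
URL = "localhost:8000"

def one_request():
    # Each thread uses its own client; send one random-data request.
    client = httpclient.InferenceServerClient(url=URL)
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    client.infer(MODEL, inputs=[inp])

# Fire more concurrent requests than there are instances so that,
# with luck, every instance (and thus every GPU) gets exercised.
with ThreadPoolExecutor(max_workers=8) as pool:
    for f in [pool.submit(one_request) for _ in range(16)]:
        f.result()
```

But this is best-effort: nothing guarantees the requests land on both GPUs, which is why I'd prefer built-in parallel warmup.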
Additional context