[Performance]: vLLM with DP performing worse

### Name of failing test

examples/offline_inference/data_parallel.py

### Basic information

- [ ] Flaky test
- [x] Can reproduce locally
- [ ] Caused by external libraries (e.g. bug in `transformers`)

### 🧪 Describe the failing test

I tested the data parallel (DP) feature on 4 x A100 GPUs. vLLM running with DP = 4 and `api-server-count` = 4 performs worse than 4 separate vLLM instances with 1 GPU each.

### 📝 History of failing test

NA

### CC List.

_No response_
