[AutoDeploy]: Investigate SuperV3 with MTP not scaling well #14225

@galagam

Description

🚀 The feature, motivation and pitch

AD surpasses PT at WS=1 across all concurrencies: 18% faster at c=1, near-parity (1%) at c=64.
AD fails to scale at WS=4: PT leads by 20–35%. The gap is driven by poor WS=1→4 scaling on the AD side: PT scales 1.52× from WS=1 to WS=4 at c=1 (4.67 ms → 3.07 ms), while AD gains only 1.04× (3.82 ms → 3.67 ms).
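
For reference, the scaling factors quoted above can be reproduced from the four c=1 latencies with a few lines of Python. This is only a sketch of the arithmetic: the helper names (`scaling`, `latency_reduction_pct`) are made up for illustration and are not part of any benchmark script, and it assumes the 18% figure is measured as a latency reduction relative to PT.

```python
# Latencies in ms at c=1, copied from the numbers quoted in this issue.
# Outer key: backend, inner key: WS.
LATENCY_MS = {
    "PT": {1: 4.67, 4: 3.07},
    "AD": {1: 3.82, 4: 3.67},
}


def scaling(backend: str, ws_from: int = 1, ws_to: int = 4) -> float:
    """Speedup when going from ws_from to ws_to (values above 1 mean faster)."""
    lat = LATENCY_MS[backend]
    return lat[ws_from] / lat[ws_to]


def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percent latency reduction of candidate relative to baseline (assumed convention)."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    for backend in LATENCY_MS:
        print(f"{backend}: WS=1->4 scaling {scaling(backend):.2f}x")
    ad_vs_pt = latency_reduction_pct(LATENCY_MS["PT"][1], LATENCY_MS["AD"][1])
    print(f"AD vs PT at WS=1, c=1: {ad_vs_pt:.0f}% lower latency")
```

Running it gives 1.52x for PT and 1.04x for AD, matching the figures above.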

Investigate and resolve the poor scaling on the AutoDeploy side.
Baseline branch: nv-auto-deploy:gagam/super-mtp-perf-2-replay (see #13725)

Alternatives

No response

Additional context

See SuperV3 MTP ticket #12359
Scripts, configs and experiment data:
https://gitlab-master.nvidia.com/ghubaraagam/agent-reports/-/tree/main/260428_superv3_mtp?ref_type=heads

Metadata

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • feature request: New feature or request. This includes new model, dtype, functionality support

Projects

Status

In progress
