The feature, motivation and pitch
AutoDeploy (AD) surpasses PT at WS=1 across all concurrencies: 18% faster at c=1 and near-parity (1%) at c=64.
AD fails to scale at WS=4, where PT leads by 20–35%. The gap is driven by poor WS=1→4 scaling on the AD side: PT scales 1.52× from WS=1 to WS=4 at c=1 (4.67 ms → 3.07 ms), while AD gains only 1.04× (3.82 ms → 3.67 ms).
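The ratios above can be reproduced from the four c=1 latencies with a short sketch like the one below. It assumes the reported ms values are a per-request or per-token latency where lower is better (the exact metric is not stated here), and it expresses the AD/PT gap as AD latency relative to PT latency, a convention chosen because it matches both the 18% and the low end of the 20–35% figures. The helper names are hypothetical.

```python
# c=1 latencies in ms, copied from the numbers above (lower is better).
LATENCY_MS = {
    ("PT", 1): 4.67,
    ("PT", 4): 3.07,
    ("AD", 1): 3.82,
    ("AD", 4): 3.67,
}


def scaling(backend: str, ws_from: int = 1, ws_to: int = 4) -> float:
    """Speedup when going from ws_from to ws_to GPUs (latency ratio)."""
    return LATENCY_MS[(backend, ws_from)] / LATENCY_MS[(backend, ws_to)]


def ad_vs_pt_pct(ws: int) -> float:
    """AD latency relative to PT at a given WS, in percent.

    Negative means AD is faster, positive means PT leads.
    """
    return (LATENCY_MS[("AD", ws)] / LATENCY_MS[("PT", ws)] - 1.0) * 100.0


if __name__ == "__main__":
    for backend in ("PT", "AD"):
        print(f"{backend}: {scaling(backend):.2f}x from WS=1 to WS=4")
    for ws in (1, 4):
        print(f"WS={ws}: AD latency vs PT {ad_vs_pt_pct(ws):+.1f}%")
```

With these inputs AD is about 18% lower latency than PT at WS=1 and about 20% higher at WS=4, which is consistent with the scaling factors of 1.52× (PT) and 1.04× (AD).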
Investigate and resolve the poor scaling on the AutoDeploy side.
Baseline branch: nv-auto-deploy:gagam/super-mtp-perf-2-replay (see #13725)
Alternatives
No response
Additional context
See SuperV3 MTP ticket #12359
Scripts, configs and experiment data:
https://gitlab-master.nvidia.com/ghubaraagam/agent-reports/-/tree/main/260428_superv3_mtp?ref_type=heads