
[AutoDeploy]: Pre-merge L0 cost-reduction #14173

@galagam

Description

🚀 The feature, motivation and pitch

  1. Move perf tests to post-merge. Currently perf/test_perf.py::test_perf[deepseek_r1_distill_qwen_32b-bench-_autodeploy-float16-kv_frac:0.8-input_output_len:1024,1024-reqs:512] runs in pre-merge and accounts for a big chunk of our budget. Consider whether this checkpoint is one of the crucial perf tests, or whether we should focus on models we care more about.
  2. Reduce test variants, specifically in Nemotron tests. Focus on testing the model registry config in pre-merge, and move other variants to post-merge.
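The two proposals amount to a simple stage-assignment rule. The sketch below expresses that rule in Python purely for illustration. `assign_stage`, `partition`, and the `registry_variants` set are hypothetical names, not part of the repository. It also assumes that perf tests can be recognized by the `perf/` prefix and Nemotron tests by the substring `nemotron` in their IDs. The sketch does not show how the repository's test lists actually encode pre-merge and post-merge stages.

```python
PRE_MERGE = "pre_merge"
POST_MERGE = "post_merge"


def assign_stage(test_id: str, registry_variants: set) -> str:
    """Return the CI stage a test would run in under the proposed policy.

    registry_variants is a hypothetical set of Nemotron test IDs that exercise
    the model registry config; only these Nemotron variants stay in pre-merge.
    """
    # Item 1: perf tests (assumed to live under perf/) move to post-merge.
    if test_id.startswith("perf/"):
        return POST_MERGE
    # Item 2: Nemotron variants other than the registry config move to post-merge.
    if "nemotron" in test_id.lower() and test_id not in registry_variants:
        return POST_MERGE
    # Everything else keeps its current pre-merge slot.
    return PRE_MERGE


def partition(test_ids, registry_variants):
    """Group test IDs by the stage assign_stage picks for them."""
    stages = {PRE_MERGE: [], POST_MERGE: []}
    for test_id in test_ids:
        stages[assign_stage(test_id, registry_variants)].append(test_id)
    return stages


if __name__ == "__main__":
    perf_test = (
        "perf/test_perf.py::test_perf[deepseek_r1_distill_qwen_32b-bench-_autodeploy-"
        "float16-kv_frac:0.8-input_output_len:1024,1024-reqs:512]"
    )
    # Hypothetical Nemotron test IDs, for illustration only.
    registry = {"test_nemotron[registry_config]"}
    tests = [perf_test, "test_nemotron[registry_config]", "test_nemotron[variant_a]"]
    print(partition(tests, registry))
```

Keeping the rule in one function makes it easy to audit which tests leave pre-merge. It also makes it easy to extend, for example to other model families with many variants.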

Full report link: https://gitlab-master.nvidia.com/junq/my-notes/-/blob/main/l0-pre-merge-4wk-2026-05-13/l0-pre-merge-duration-analysis.md

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

AutoDeploy (<NV> AutoDeploy Backend), feature request (New feature or request. This includes new model, dtype, functionality support)

Type

No type

Projects

Status

In review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
