Add max-model-len param for vLLM #502
base: main
Conversation
Signed-off-by: Onur Yilmaz <[email protected]>
/ok to test 6ffc9b3
cpu_offload_gb: float = 0,
enforce_eager: bool = False,
max_seq_len_to_capture: int = 8192,
max_model_len: int = 8192,
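For context, these keyword arguments mirror vLLM's own LLM constructor, so the new value would be forwarded straight through. A minimal sketch of that pass-through, assuming vLLM's public API (the model name and values are illustrative, not taken from this PR):

```python
from vllm import LLM

# Sketch: forwarding the new max_model_len knob to vLLM's LLM constructor.
# Model name and values here are illustrative, not from this PR.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    cpu_offload_gb=0,
    enforce_eager=False,
    max_seq_len_to_capture=8192,
    max_model_len=8192,  # caps the context window vLLM sizes its KV cache for
)
```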
Thank you @oyilmaz-nvidia. For my understanding, why is this param needed now? Was it newly introduced by vLLM?
We didn't need to set this parameter until now, but some of the larger models, like Llama 70B, need tuning to fit into the GPUs. CI is now erroring out when loading this model (it worked before, but with newer versions of vLLM we need to tune this setting).
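To illustrate the kind of tuning being described (the values below are assumptions, not the PR's settings): lowering max_model_len shrinks the KV cache vLLM pre-allocates, which can be the difference between an out-of-memory failure and a successful load for a 70B model.

```python
from vllm import LLM

# Sketch: capping the context length so a large model fits on the available GPUs.
# max_model_len=4096 and tensor_parallel_size=2 are assumed values; tune to hardware.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative model
    tensor_parallel_size=2,  # shard weights across GPUs
    max_model_len=4096,      # smaller cap -> smaller KV cache -> less GPU memory
)
```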
Thank you @oyilmaz-nvidia!
Signed-off-by: Onur Yilmaz <[email protected]>
/ok to test 4d16afa