
Conversation

@oyilmaz-nvidia (Contributor)

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Nov 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oyilmaz-nvidia (Contributor Author)

/ok to test 6ffc9b3

cpu_offload_gb: float = 0,
enforce_eager: bool = False,
max_seq_len_to_capture: int = 8192,
max_model_len: int = 8192,
Contributor

Thank you @oyilmaz-nvidia. For my understanding, why is this param needed now? Was it newly introduced by vLLM?

Contributor Author

We really didn't need to set this parameter until now, but some of the larger models, like Llama 70B, may need tuning to fit into the GPUs. CI is now erroring out when loading this model (it worked before, but with newer versions of vLLM we may need to tune it).
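
For context, here is a minimal sketch of how these vLLM LLM constructor arguments might be tuned to fit a large model; the model id, GPU count, and offload size below are illustrative assumptions, not the PR's actual settings:

from vllm import LLM

# Minimal sketch, assuming 4 GPUs and ~16 GiB of spare host RAM.
# Model id and numeric values are hypothetical, chosen for illustration.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical 70B model
    tensor_parallel_size=4,       # shard the weights across 4 GPUs
    cpu_offload_gb=16,            # offload up to 16 GiB of weights to CPU RAM
    enforce_eager=False,          # keep CUDA graphs enabled
    max_seq_len_to_capture=8192,  # longest sequence captured by CUDA graphs
    max_model_len=8192,           # cap context length to bound the KV cache
)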

@athitten (Contributor) left a comment

Thank you @oyilmaz-nvidia!

github-actions bot added the tests label Nov 5, 2025
@oyilmaz-nvidia (Contributor Author)

/ok to test 4d16afa
