
Conversation

@jikunshang
Collaborator

@jikunshang jikunshang commented Dec 15, 2025

Purpose

vLLM is a framework that supports multiple hardware backends, but there are still some hard-coded torch.cuda calls, which are unfriendly to non-CUDA devices. Fortunately, PyTorch now provides the torch.accelerator API set, which dispatches based on the platform.
I will create a series of PRs to address this issue, starting with the empty_cache API.
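
As an illustration, the change at each call site looks roughly like this (a sketch of the pattern, not an exact diff from this PR):

import torch

# before: hard-coded CUDA call, unusable on non-CUDA backends
torch.cuda.empty_cache()

# after: dispatches to the current accelerator backend (CUDA, XPU, ...)
torch.accelerator.empty_cache()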

Test Plan

CI.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify

mergify bot commented Dec 15, 2025

Documentation preview: https://vllm--30681.org.readthedocs.build/en/30681/

@mergify mergify bot added the documentation, nvidia, and v1 labels Dec 15, 2025
@jikunshang jikunshang marked this pull request as draft December 15, 2025 08:25
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request replaces direct calls to torch.cuda.empty_cache() with the more hardware-agnostic torch.accelerator.empty_cache(). This is a good step towards making vLLM compatible with non-CUDA devices. The changes are applied consistently across the codebase, including in examples, utility functions, and core model execution logic. A new pre-commit hook is also added to prevent future usage of torch.cuda.empty_cache. My review focuses on the implementation of this new pre-commit hook, where I've found a couple of critical issues that would prevent it from working as intended. I've provided suggestions to fix them. The rest of the changes look good.

Comment on lines 14 to 18
ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/*"}


def is_allowed_file(current_file: str) -> bool:
    return current_file in ALLOWED_FILES
Contributor


critical

The check current_file in ALLOWED_FILES performs an exact match, but ALLOWED_FILES contains directory prefixes. For example, a file tests/models/test_llama.py will not be matched against tests/. This will cause the pre-commit hook to incorrectly flag files that should be allowed. You should use startswith to check if the file path is under one of the allowed directories. Also, vllm/platforms/* seems to intend to match all files in the directory, so it should probably be vllm/platforms/.

Suggested change
- ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/*"}
- def is_allowed_file(current_file: str) -> bool:
-     return current_file in ALLOWED_FILES
+ ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/"}
+ def is_allowed_file(current_file: str) -> bool:
+     return any(current_file.startswith(p) for p in ALLOWED_FILES)
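
With the prefix check, paths under the allowed directories pass and everything else is flagged (the file paths below are hypothetical examples):

is_allowed_file("tests/models/test_llama.py")   # True: under tests/
is_allowed_file("vllm/platforms/cuda.py")       # True: under vllm/platforms/
is_allowed_file("vllm/worker/worker.py")        # False: flagged by the hook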

Comment on lines 21 to 23
def is_forbidden_torch_cuda_api(line: str) -> bool:
    stripped = line.strip()
    return bool(_TORCH_CUDA_RE.match(stripped))
Contributor


critical

re.match only checks for a match at the beginning of the string. This will fail to detect forbidden API calls that are not at the start of a line (after stripping whitespace), for example x = torch.cuda.empty_cache(). You should use re.search to find a match anywhere in the line.

Suggested change
- def is_forbidden_torch_cuda_api(line: str) -> bool:
-     stripped = line.strip()
-     return bool(_TORCH_CUDA_RE.match(stripped))
+ def is_forbidden_torch_cuda_api(line: str) -> bool:
+     return bool(_TORCH_CUDA_RE.search(line))
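
To see the difference, assume a pattern such as the following (the actual regex is not shown in this diff):

import re

_TORCH_CUDA_RE = re.compile(r"torch\.cuda\.empty_cache")  # assumed pattern

line = "x = torch.cuda.empty_cache()"
bool(_TORCH_CUDA_RE.match(line.strip()))  # False: match only anchors at position 0
bool(_TORCH_CUDA_RE.search(line))         # True: search scans the whole line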

entry: python tools/pre_commit/check_torch_cuda.py
language: python
types: [python]
pass_filenames: false
Member


We should pass file names, otherwise this will run on all files every commit

Suggested change
- pass_filenames: false

Member


The hook doesn't need to work out which files it should run on, pre-commit already does that for us. Please look at tools/pre_commit/check_pickle_imports.py to see how this should be done
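
For reference, a filename-driven hook follows roughly this shape (a minimal sketch modeled on that style; the regex and allowed prefixes are assumptions, not the PR's actual code):

import re
import sys

_TORCH_CUDA_RE = re.compile(r"torch\.cuda\.empty_cache")  # assumed pattern
ALLOWED_PREFIXES = ("tests/", "benchmarks/", "vllm/platforms/")

def main() -> int:
    # pre-commit passes the staged file paths as command-line arguments,
    # so the hook only scans the files being committed
    returncode = 0
    for path in sys.argv[1:]:
        if path.startswith(ALLOWED_PREFIXES):
            continue
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if _TORCH_CUDA_RE.search(line):
                    print(f"{path}:{lineno}: use torch.accelerator.empty_cache() instead")
                    returncode = 1
    return returncode

if __name__ == "__main__":
    raise SystemExit(main())

With this shape, the hook config can drop pass_filenames: false, as suggested above.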

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 15, 2025
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>

Labels

documentation (Improvements or additions to documentation), nvidia, v1

Projects

Status: In review
