[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache
#30681
Conversation
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Documentation preview: https://vllm--30681.org.readthedocs.build/en/30681/
Code Review
This pull request replaces direct calls to torch.cuda.empty_cache() with the more hardware-agnostic torch.accelerator.empty_cache(). This is a good step towards making vLLM compatible with non-CUDA devices. The changes are applied consistently across the codebase, including in examples, utility functions, and core model execution logic. A new pre-commit hook is also added to prevent future usage of torch.cuda.empty_cache. My review focuses on the implementation of this new pre-commit hook, where I've found a couple of critical issues that would prevent it from working as intended. I've provided suggestions to fix them. The rest of the changes look good.
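For context, the mechanical change at each call site is a one-line swap; an illustrative example (not an excerpt from the diff itself):

```python
import torch

# Before: hard-coded to the CUDA backend
torch.cuda.empty_cache()

# After: dispatches to the active accelerator backend (CUDA, XPU, ...)
torch.accelerator.empty_cache()
```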
tools/pre_commit/check_torch_cuda.py
Outdated
```python
ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/*"}


def is_allowed_file(current_file: str) -> bool:
    return current_file in ALLOWED_FILES
```
The check current_file in ALLOWED_FILES performs an exact match, but ALLOWED_FILES contains directory prefixes. For example, a file tests/models/test_llama.py will not match tests/. This will cause the pre-commit hook to incorrectly flag files that should be allowed. You should use startswith to check whether the file path is under one of the allowed directories. Also, vllm/platforms/* seems intended to match all files in the directory, so it should probably be vllm/platforms/.
Suggested change:

```diff
-ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/*"}
+ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/"}
 def is_allowed_file(current_file: str) -> bool:
-    return current_file in ALLOWED_FILES
+    return any(current_file.startswith(p) for p in ALLOWED_FILES)
```
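A quick illustration of the difference (not part of the PR):

```python
ALLOWED_FILES = {"tests/", "benchmarks/", "vllm/platforms/"}

path = "tests/models/test_llama.py"
print(path in ALLOWED_FILES)                           # False: set membership is an exact match
print(any(path.startswith(p) for p in ALLOWED_FILES))  # True: prefix match on directories
```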
tools/pre_commit/check_torch_cuda.py
Outdated
```python
def is_forbidden_torch_cuda_api(line: str) -> bool:
    stripped = line.strip()
    return bool(_TORCH_CUDA_RE.match(stripped))
```
re.match only checks for a match at the beginning of the string. This will fail to detect forbidden API calls that are not at the start of a line (after stripping whitespace), for example x = torch.cuda.empty_cache(). You should use re.search to find a match anywhere in the line.
Suggested change:

```diff
 def is_forbidden_torch_cuda_api(line: str) -> bool:
-    stripped = line.strip()
-    return bool(_TORCH_CUDA_RE.match(stripped))
+    return bool(_TORCH_CUDA_RE.search(line))
```
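To illustrate with a simplified stand-in pattern (the actual _TORCH_CUDA_RE in the PR may differ):

```python
import re

# Simplified stand-in for the hook's pattern
_TORCH_CUDA_RE = re.compile(r"torch\.cuda\.empty_cache")

line = "x = torch.cuda.empty_cache()"
print(bool(_TORCH_CUDA_RE.match(line.strip())))  # False: match only anchors at the start
print(bool(_TORCH_CUDA_RE.search(line)))         # True: search scans the whole line
```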
.pre-commit-config.yaml
Outdated
```yaml
entry: python tools/pre_commit/check_torch_cuda.py
language: python
types: [python]
pass_filenames: false
```
We should pass file names, otherwise this will run on all files every commit
Suggested change:

```diff
-pass_filenames: false
```
The hook doesn't need to work out which files it should run on, pre-commit already does that for us. Please look at tools/pre_commit/check_pickle_imports.py to see how this should be done
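For reference, a hedged sketch of that pattern (not the actual contents of check_pickle_imports.py): once pass_filenames: false is removed, pre-commit passes the staged files matching types: [python] as positional arguments, so the hook only needs to scan argv.

```python
import re
import sys

# Simplified stand-in; the PR's _TORCH_CUDA_RE may differ.
_TORCH_CUDA_RE = re.compile(r"torch\.cuda\.empty_cache")


def main() -> int:
    returncode = 0
    # pre-commit supplies only the staged files matching the hook's filters.
    for filename in sys.argv[1:]:
        with open(filename, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if _TORCH_CUDA_RE.search(line):
                    print(f"{filename}:{lineno}: use torch.accelerator.empty_cache instead")
                    returncode = 1
    return returncode


if __name__ == "__main__":
    sys.exit(main())
```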
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Purpose
vLLM is a framework that supports multiple hardware backends, but there are still some hard-coded torch.cuda calls, which are unfriendly to non-CUDA devices. Fortunately, there is a new set of torch.accelerator APIs that dispatch based on the current platform. I will try to create a series of PRs to address this issue, starting with the empty_cache API.
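As a minimal sketch of the device-agnostic dispatch (this assumes a PyTorch build recent enough to ship the torch.accelerator namespace):

```python
import torch

if torch.accelerator.is_available():
    # Reports the active backend, e.g. device(type='cuda') or device(type='xpu')
    print(torch.accelerator.current_accelerator())
    # Backend-agnostic equivalent of torch.cuda.empty_cache()
    torch.accelerator.empty_cache()
```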
Test Plan
CI.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.