Motivation.
vLLM is a framework that supports multiple hardware backends, yet there are still hard-coded torch.cuda calls in the codebase. This is unfriendly to non-CUDA devices. Fortunately, PyTorch now ships a unified torch.accelerator API that dispatches based on the active platform.
Meanwhile, we should add a lint check to keep new torch.cuda calls out of newly added code; a rough sketch of such a check follows.
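As a minimal sketch of the lint idea (hypothetical, not an existing vLLM tool; the script name and the assumption that all sources live under `vllm/` are illustrative), a standalone script like this could be wired into pre-commit or CI:

```python
# check_torch_cuda.py: fail when a Python file under vllm/ calls torch.cuda directly.
# Hypothetical sketch for this RFC, not an existing vLLM lint rule.
import re
import sys
from pathlib import Path

# Matches direct attribute access such as torch.cuda.synchronize.
TORCH_CUDA_CALL = re.compile(r"\btorch\.cuda\.\w+")

def main() -> int:
    hits: list[str] = []
    for path in Path("vllm").rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TORCH_CUDA_CALL.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    if hits:
        print("Hard-coded torch.cuda calls found; prefer torch.accelerator:")
        print("\n".join(hits))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

In practice an allowlist would be needed for the CUDA platform backend itself, where torch.cuda calls are legitimate.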
Proposed Change.
Status of the torch.accelerator API as of torch 2.9.0, and the corresponding replacement progress in vLLM:
| CUDA API name | Unified torch API name | torch API status | vLLM replacement status |
|---|---|---|---|
| torch.cuda.Event | torch.Event | #26985 | |
| torch.cuda.Stream | torch.Stream | | |
| torch.cuda.device_count | torch.accelerator.device_count | | |
| torch.cuda.is_available | torch.accelerator.is_available | | |
| torch.cuda.synchronize | torch.accelerator.synchronize | | |
| torch.cuda.set_stream | torch.accelerator.set_stream | | |
| torch.cuda.current_device | torch.accelerator.current_accelerator | | |
| torch.cuda.current_stream | torch.accelerator.current_stream | | |
| torch.cuda.empty_cache | torch.accelerator.empty_cache | #30681 | |
| torch.cuda.max_memory_allocated | torch.accelerator.max_memory_allocated | | |
| torch.cuda.max_memory_reserved | torch.accelerator.max_memory_reserved | | |
| torch.cuda.memory_allocated | torch.accelerator.memory_allocated | | |
| torch.cuda.memory_reserved | torch.accelerator.memory_reserved | | |
| torch.cuda.memory_stats | torch.accelerator.memory_stats | | |
| torch.cuda.reset_peak_memory_stats | torch.accelerator.reset_peak_memory_stats | | |
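To illustrate the kind of mechanical replacement the table implies, here is a minimal before/after sketch. It uses only APIs already available in recent torch releases (torch.accelerator landed in torch 2.6); the memory APIs in the table are still pending upstream per the status column:

```python
import torch

# Before: hard-coded CUDA calls, which fail on non-CUDA backends.
if torch.cuda.is_available():
    torch.cuda.synchronize()
    stream = torch.cuda.current_stream()
    event = torch.cuda.Event(enable_timing=True)

# After: unified APIs that dispatch to whichever backend is active.
if torch.accelerator.is_available():
    torch.accelerator.synchronize()
    stream = torch.accelerator.current_stream()
    event = torch.Event(enable_timing=True)
```

The after-version runs unchanged on CUDA, XPU, or any other backend that registers with torch.accelerator, which is the point of the migration.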
Feedback Period.
No response
CC List.
@youkaichao @simon-mo @WoosukKwon @zhuohan123
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.