[WIP] Refactor CPU Offloading Backend Pattern#1223

Draft
yuanheng-zhao wants to merge 7 commits into vllm-project:main from yuanheng-zhao:rfc/cpu-offload-backend
Conversation

@yuanheng-zhao
Contributor

@yuanheng-zhao yuanheng-zhao commented Feb 5, 2026

Purpose

Closes #1125

Test Plan

Run the T2V and I2V offline examples with the Wan pipeline.

Test Result

To be added

Layer-wise

Logs

[Stage-0] INFO 02-05 14:19:12 [diffusion_model_runner.py:109] Model runner: Model loaded successfully.
[Stage-0] INFO 02-05 14:19:12 [layerwise_backend.py:323] Applying layer-wise offloading on ['transformer', 'transformer_2']
[Stage-0] INFO 02-05 14:19:12 [layerwise_backend.py:328] Applying hooks on transformer (WanTransformer3DModel)
[Stage-0] INFO 02-05 14:19:38 [layerwise_backend.py:358] Layer-wise offloading enabled on 40 layers (blocks), with 1 kept on device
[Stage-0] INFO 02-05 14:19:38 [layerwise_backend.py:328] Applying hooks on transformer_2 (WanTransformer3DModel)
[Stage-0] INFO 02-05 14:20:02 [layerwise_backend.py:358] Layer-wise offloading enabled on 40 layers (blocks), with 1 kept on device
[Stage-0] INFO 02-05 14:20:02 [diffusion_model_runner.py:125] Model runner: Model compiled with torch.compile.
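For reference, the pattern the logs above describe (hooks that bring each transformer block onto the compute device just before its forward pass and evict it afterwards, with a small number of blocks kept resident) can be sketched with plain PyTorch forward hooks. This is a minimal illustration, not the actual vllm-omni implementation; the function and parameter names here are hypothetical.

```python
# Hedged sketch of layer-wise CPU offloading via PyTorch forward hooks.
# All names (apply_layerwise_offload, keep_on_device, ...) are illustrative;
# they are NOT the vllm-omni API.
import torch
import torch.nn as nn


def apply_layerwise_offload(blocks, compute_device, offload_device, keep_on_device=1):
    """Keep the first `keep_on_device` blocks resident on `compute_device`;
    park the rest on `offload_device` and move each one to the compute
    device only for the duration of its own forward pass."""
    for i, block in enumerate(blocks):
        if i < keep_on_device:
            block.to(compute_device)  # resident block: never offloaded
            continue
        block.to(offload_device)

        def pre_hook(module, args):
            module.to(compute_device)  # fetch weights just before use

        def post_hook(module, args, output):
            module.to(offload_device)  # evict weights right after use
            return output

        block.register_forward_pre_hook(pre_hook)
        block.register_forward_hook(post_hook)


# Usage sketch (CPU-only here so it runs anywhere):
blocks = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
apply_layerwise_offload(blocks, compute_device="cpu", offload_device="cpu", keep_on_device=1)
out = blocks(torch.randn(2, 8))
```

In a real deployment `compute_device` would be the GPU and `offload_device` the CPU; the trade-off is lower peak VRAM at the cost of per-block host-to-device transfers, which is why keeping one or more blocks resident (as the logs show) helps hide latency.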

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

Development

Successfully merging this pull request may close these issues.

[RFC]: Refactor And Refine CPU Offloading Features
