[WIP] Refactor CPU Offloading Backend Pattern#1223

Draft
yuanheng-zhao wants to merge 7 commits into vllm-project:main from yuanheng-zhao:rfc/cpu-offload-backend
Conversation

@yuanheng-zhao
Contributor

@yuanheng-zhao yuanheng-zhao commented Feb 5, 2026

Purpose

Closes #1125

Test Plan

Run the T2V and I2V offline examples with the Wan pipeline.

Test Result

To be added

Layer-wise

Logs

[Stage-0] INFO 02-05 14:19:12 [diffusion_model_runner.py:109] Model runner: Model loaded successfully.
[Stage-0] INFO 02-05 14:19:12 [layerwise_backend.py:323] Applying layer-wise offloading on ['transformer', 'transformer_2']
[Stage-0] INFO 02-05 14:19:12 [layerwise_backend.py:328] Applying hooks on transformer (WanTransformer3DModel)
[Stage-0] INFO 02-05 14:19:38 [layerwise_backend.py:358] Layer-wise offloading enabled on 40 layers (blocks), with 1 kept on device
[Stage-0] INFO 02-05 14:19:38 [layerwise_backend.py:328] Applying hooks on transformer_2 (WanTransformer3DModel)
[Stage-0] INFO 02-05 14:20:02 [layerwise_backend.py:358] Layer-wise offloading enabled on 40 layers (blocks), with 1 kept on device
[Stage-0] INFO 02-05 14:20:02 [diffusion_model_runner.py:125] Model runner: Model compiled with torch.compile.
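For reference, the pattern the logs above describe (hooks that bring each transformer block onto the compute device just before its forward pass and evict it afterwards, with a small number of blocks kept resident) can be sketched with plain PyTorch forward hooks. This is a minimal illustration, not the actual vllm-omni implementation; the function and parameter names here are hypothetical.

```python
# Hedged sketch of layer-wise CPU offloading via PyTorch forward hooks.
# All names (apply_layerwise_offload, keep_on_device, ...) are illustrative;
# they are NOT the vllm-omni API.
import torch
import torch.nn as nn


def apply_layerwise_offload(blocks, compute_device, offload_device, keep_on_device=1):
    """Keep the first `keep_on_device` blocks resident on `compute_device`;
    park the rest on `offload_device` and move each one to the compute
    device only for the duration of its own forward pass."""
    for i, block in enumerate(blocks):
        if i < keep_on_device:
            block.to(compute_device)  # resident block: never offloaded
            continue
        block.to(offload_device)

        def pre_hook(module, args):
            module.to(compute_device)  # fetch weights just before use

        def post_hook(module, args, output):
            module.to(offload_device)  # evict weights right after use
            return output

        block.register_forward_pre_hook(pre_hook)
        block.register_forward_hook(post_hook)


# Usage sketch (CPU-only here so it runs anywhere):
blocks = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
apply_layerwise_offload(blocks, compute_device="cpu", offload_device="cpu", keep_on_device=1)
out = blocks(torch.randn(2, 8))
```

In a real deployment `compute_device` would be the GPU and `offload_device` the CPU; the trade-off is lower peak VRAM at the cost of per-block host-to-device transfers, which is why keeping one or more blocks resident (as the logs show) helps hide latency.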

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

Development

Successfully merging this pull request may close these issues.

[RFC]: Refactor And Refine CPU Offloading Features
