Commit b58b7c5 ("update readme")
Parent: d0a0868

1 file changed: README.md (+25, −10)

@@ -27,9 +27,14 @@ In this repository, we present **Wan2.1**, a comprehensive and open suite of vid
 
 ## 🔥 Latest News!!
 
-* Mar 3, 2025: 👋 Wan2.1's T2V and I2V have been integrated into Diffusers ([T2V](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/wan/pipeline_wan.py) | [I2V](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/wan/pipeline_wan_i2v.py)). Feel free to give it a try!
-* Feb 27, 2025: 👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
-* Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1.
+* Mar 3, 2025: 👋 **Wan2.1**'s T2V and I2V have been integrated into Diffusers ([T2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanPipeline) | [I2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanImageToVideoPipeline)). Feel free to give it a try!
+* Feb 27, 2025: 👋 **Wan2.1** has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
+* Feb 25, 2025: 👋 We've released the inference code and weights of **Wan2.1**.
+
+## Community Works
+If your work has improved **Wan2.1** and you would like more people to see it, please inform us.
+- [TeaCache](https://github.com/ali-vilab/TeaCache) now supports **Wan2.1** acceleration, capable of increasing speed by approximately 2x. Feel free to give it a try!
+- [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) provides more support for **Wan2.1**, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to [their examples](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo).
 
 
 ## 📑 Todo List
@@ -291,30 +296,43 @@ DASH_API_KEY=your_key python generate.py --task i2v-14B --size 1280*720 --ckpt_d
 You can easily inference **Wan2.1**-I2V using Diffusers with the following command:
 ``` python
 import torch
+import numpy as np
 from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
 from diffusers.utils import export_to_video, load_image
+from transformers import CLIPVisionModel
 
 # Available models: Wan-AI/Wan2.1-I2V-14B-480P-Diffusers, Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
 model_id = "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers"
+image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
 vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
-pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
+pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
-max_area = 720 * 1280
-height, width = 720, 1280
 image = load_image(
     "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
 )
+max_area = 720 * 1280
+aspect_ratio = image.height / image.width
+mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
+height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
+width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
+image = image.resize((width, height))
 prompt = (
     "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in "
     "the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
 )
 negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
 
 output = pipe(
-    image=image, prompt=prompt, max_area=max_area, negative_prompt=negative_prompt, num_frames=81, guidance_scale=5.0
+    image=image,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    height=height, width=width,
+    num_frames=81,
+    guidance_scale=5.0
 ).frames[0]
 export_to_video(output, "output.mp4", fps=16)
+
 ```
 > 💡Note: Please note that this example does not integrate Prompt Extension and distributed inference. We will soon update with the integrated prompt extension and multi-GPU version of Diffusers.
 
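The added lines in this hunk replace the fixed 720×1280 resolution with a computation that preserves the input image's aspect ratio while keeping height and width divisible by the model's spatial stride. Below is a minimal sketch of that arithmetic, assuming `mod_value` works out to 16 (`vae_scale_factor_spatial * patch_size[1]` for this pipeline config) and a hypothetical 3:4 portrait input; it illustrates the snapping logic and is not part of the commit.

```python
# Sketch of the resolution snapping added above (assumptions noted inline).
import numpy as np

max_area = 720 * 1280   # pixel budget: 921,600 px
aspect_ratio = 4 / 3    # image.height / image.width, assumed 3:4 portrait here
mod_value = 16          # assumed value of vae_scale_factor_spatial * patch_size[1]

# Solve h * w ≈ max_area with h / w = aspect_ratio, then floor to a multiple of mod_value.
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value

print(int(height), int(width))  # 1104 816 -> 900,864 px, within max_area
```

Flooring with `//` before multiplying back keeps both dimensions compatible with the VAE downsampling and transformer patching, at the cost of slightly undershooting `max_area`.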
@@ -402,9 +420,6 @@ We test the computational efficiency of different **Wan2.1** models on different
 > 💡Note: T2V-14B is slower than I2V-14B because the former samples 50 steps while the latter uses 40 steps.
 
 
-## Community Contributions
-- [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) provides more support for **Wan2.1**, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to [their examples](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo).
-
 -------
 
 ## Introduction of Wan2.1
