feat: Add Bernini-R model support (Wan video) (CORE-279)#14216
Conversation
Since sizes don't have to match
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds in-context reference conditioning to WAN models. It extends the RoPE rotation mechanism to accept a 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
R2V person with 4 views(front left back and head) not that similar to the ref, i wonder if we can get some improvements |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/ldm/wan/model.py`:
- Around line 1668-1669: SCAILWanModel.forward_orig() currently references
self.patch_embedding_mask and handles ref_mask_latents / sam_latents, but
patch_embedding_mask is only created in SCAIL2WanModel.__init__(), causing
AttributeError for base SCAILWanModel; fix by moving the mask-specific logic out
of the base forward_orig into an override on SCAIL2WanModel (implement
forward_orig or forward_mask_handling in SCAIL2WanModel that applies
patch_embedding_mask to ref_mask_latents/sam_latents), or alternatively add a
safe default on SCAILWanModel (e.g. set self.patch_embedding_mask = None in
SCAILWanModel.__init__() and guard uses with an if self.patch_embedding_mask is
not None check around the x = x + ... lines referenced). Ensure all occurrences
(forward_orig references at lines cited and handling near sam_latents) follow
the same pattern so base class remains backward-compatible.
- Around line 1732-1757: The branch currently converts reference lengths into
patch counts (ref_t_patches) and passes those to super().rope_encode, but
rope_encode expects frame counts and will itself convert to patches, causing
double-conversion when patch_size[0] != 1; change the logic to compute and pass
frame counts (use the frame dimension directly from reference_latent and
pose_latents—e.g., reference_latent.shape[2] and pose_latents.shape[-3]) into
rope_encode and compute main frames as t - ref_frames, so call
super().rope_encode(ref_frames, ...) and super().rope_encode(main_frames, ...)
instead of using ref_t_patches/main_t_patches while leaving pose handling
(F_pose) consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a46af7b7-4d90-4f4f-b12b-daa3227d37e1
📒 Files selected for processing (3)
comfy/ldm/wan/model.pycomfy/model_base.pynodes.py
🚧 Files skipped from review as they are similar to previous changes (2)
- nodes.py
- comfy/model_base.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/ldm/wan/model.py`:
- Around line 1668-1669: SCAILWanModel.forward_orig() currently references
self.patch_embedding_mask and handles ref_mask_latents / sam_latents, but
patch_embedding_mask is only created in SCAIL2WanModel.__init__(), causing
AttributeError for base SCAILWanModel; fix by moving the mask-specific logic out
of the base forward_orig into an override on SCAIL2WanModel (implement
forward_orig or forward_mask_handling in SCAIL2WanModel that applies
patch_embedding_mask to ref_mask_latents/sam_latents), or alternatively add a
safe default on SCAILWanModel (e.g. set self.patch_embedding_mask = None in
SCAILWanModel.__init__() and guard uses with an if self.patch_embedding_mask is
not None check around the x = x + ... lines referenced). Ensure all occurrences
(forward_orig references at lines cited and handling near sam_latents) follow
the same pattern so base class remains backward-compatible.
- Around line 1732-1757: The branch currently converts reference lengths into
patch counts (ref_t_patches) and passes those to super().rope_encode, but
rope_encode expects frame counts and will itself convert to patches, causing
double-conversion when patch_size[0] != 1; change the logic to compute and pass
frame counts (use the frame dimension directly from reference_latent and
pose_latents—e.g., reference_latent.shape[2] and pose_latents.shape[-3]) into
rope_encode and compute main frames as t - ref_frames, so call
super().rope_encode(ref_frames, ...) and super().rope_encode(main_frames, ...)
instead of using ref_t_patches/main_t_patches while leaving pose handling
(F_pose) consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a46af7b7-4d90-4f4f-b12b-daa3227d37e1
📒 Files selected for processing (3)
comfy/ldm/wan/model.pycomfy/model_base.pynodes.py
🚧 Files skipped from review as they are similar to previous changes (2)
- nodes.py
- comfy/model_base.py
🛑 Comments failed to post (2)
comfy/ldm/wan/model.py (2)
1668-1669:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t make the base SCAIL path depend on a SCAIL-2-only layer.
SCAILWanModel.forward_orig()now usesself.patch_embedding_mask(...), but that layer is only created inSCAIL2WanModel.__init__(). Ifref_mask_latentsorsam_latentsget forwarded into a plainSCAILWanModel, this now crashes withAttributeError. Please move the mask-specific handling into aSCAIL2WanModeloverride, or initialize a safe default on the base class.As per coding guidelines,
comfy/**changes should preserve backward compatibility because breaking changes affect all custom nodes.Also applies to: 1677-1678, 1810-1815
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/ldm/wan/model.py` around lines 1668 - 1669, SCAILWanModel.forward_orig() currently references self.patch_embedding_mask and handles ref_mask_latents / sam_latents, but patch_embedding_mask is only created in SCAIL2WanModel.__init__(), causing AttributeError for base SCAILWanModel; fix by moving the mask-specific logic out of the base forward_orig into an override on SCAIL2WanModel (implement forward_orig or forward_mask_handling in SCAIL2WanModel that applies patch_embedding_mask to ref_mask_latents/sam_latents), or alternatively add a safe default on SCAILWanModel (e.g. set self.patch_embedding_mask = None in SCAILWanModel.__init__() and guard uses with an if self.patch_embedding_mask is not None check around the x = x + ... lines referenced). Ensure all occurrences (forward_orig references at lines cited and handling near sam_latents) follow the same pattern so base class remains backward-compatible.Source: Coding guidelines
1732-1757:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep replacement-mode
rope_encode()in frame units.This branch converts the reference length to patch units and then feeds those values back into
super().rope_encode(), which already does its own patch-size conversion. Withpatch_size[0] != 1, the ref/main RoPE lengths drift from the tokenized sequence and break non-default temporal patch sizes.Suggested fix
- ref_t_patches = 0 + ref_t = 0 if reference_latent is not None: - ref_t_patches = (reference_latent.shape[2] + (self.patch_size[0] // 2)) // self.patch_size[0] - main_t_patches = t - ref_t_patches + ref_t = reference_latent.shape[2] + main_t = t - ref_t parts = [] - if ref_t_patches > 0: + if ref_t > 0: ref_tf = {"rope_options": {"shift_y": REF_ROPE_H, "shift_x": 0.0, "scale_y": 1.0, "scale_x": 1.0}} - parts.append(super().rope_encode(ref_t_patches, h, w, t_start=0, device=device, dtype=dtype, transformer_options=ref_tf)) - if main_t_patches > 0: - parts.append(super().rope_encode(main_t_patches, h, w, t_start=0, device=device, dtype=dtype, transformer_options=transformer_options)) + parts.append(super().rope_encode(ref_t, h, w, t_start=0, device=device, dtype=dtype, transformer_options=ref_tf)) + if main_t > 0: + parts.append(super().rope_encode(main_t, h, w, t_start=0, device=device, dtype=dtype, transformer_options=transformer_options))As per coding guidelines,
comfy/**changes should preserve backward compatibility because breaking changes affect all custom nodes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/ldm/wan/model.py` around lines 1732 - 1757, The branch currently converts reference lengths into patch counts (ref_t_patches) and passes those to super().rope_encode, but rope_encode expects frame counts and will itself convert to patches, causing double-conversion when patch_size[0] != 1; change the logic to compute and pass frame counts (use the frame dimension directly from reference_latent and pose_latents—e.g., reference_latent.shape[2] and pose_latents.shape[-3]) into rope_encode and compute main frames as t - ref_frames, so call super().rope_encode(ref_frames, ...) and super().rope_encode(main_frames, ...) instead of using ref_t_patches/main_t_patches while leaving pose handling (F_pose) consistent.Source: Coding guidelines
Co-authored-by: Alexis Rolland <alexis@comfy.org>
Adds support for IC conditioning Berninin uses for Wan Video.
The weights are identical in structure to Wan22, so I chose to add the conditioning support to that instead of new model subclass, it's slightly different method than existing models use.
Bernini_testing_video_edit_02.json
Model weights:
https://huggingface.co/Comfy-Org/Bernini-R
Video edit:
Wan22_Bernini_00001.mp4
Wan22_Bernini_00023.mp4
Wan22_Bernini_00014.1.mp4
I2V:
Wan22_Bernini_00031.mp4
R2V:
Wan22_Bernini_00013.1.mp4