Seeking help with custom data #9

@Wang-Wenqing

When I use custom data to generate videos, I sometimes get poor results. Could you please provide some guidance on how to improve them?

For example, if I give an input image like this:

Image

First step: VGGT 3D scene warping (single / few images), with the command:

python run_warp.py \
    --image_path image_1 \
    --output_path output_images_1_vggt_warp_degree90 \
    --camera 0 \
    --direction right \
    --degree 90 \
    --frame_single 25 \
    --look_at_depth 0.25
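
For intuition about how large this camera move is, here is a minimal sketch of the per-frame rotation. It assumes `run_warp.py` spaces the frames evenly between 0 and `--degree` with both endpoints included; that spacing is an assumption, not something confirmed by the repository, and `arc_angles` is a hypothetical helper.

```python
def arc_angles(total_degrees: float, num_frames: int) -> list[float]:
    """Evenly spaced camera angles from 0 to total_degrees (inclusive).

    Assumption: linear spacing with both endpoints included. The real
    schedule used by run_warp.py may differ.
    """
    if num_frames < 2:
        return [0.0]
    step = total_degrees / (num_frames - 1)
    return [i * step for i in range(num_frames)]


# Values taken from the command above: --degree 90, --frame_single 25
angles = arc_angles(90, 25)
print(f"step per frame: {angles[1]:.2f} degrees, final angle: {angles[-1]:.1f} degrees")
```

Under that assumption, each warped frame rotates 3.75 degrees further than the previous one, and the last frame views the scene a full 90 degrees away from the single input image.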

Second step: prepare the prompt:
"city_square": (
"A bullet-time effect video in a frozen 3D photography style. The entire small-town courthouse-square intersection is captured as a single, perfectly static moment in time. The red-brick civic building with a clock tower, the surrounding storefronts and awnings, the gazebo, the roads, cars, pedestrians, flag, trees, shadows, and warm afternoon atmosphere all remain completely motionless, with no movement anywhere in the environment. The only change is the camera, which moves smoothly and stably in a gentle aerial arc while maintaining a high-angle perspective. The scene should remain a coherent small-town urban center throughout the shot: newly revealed areas should continue the courthouse-square layout with more connected streets, sidewalks, rooftops, storefronts, parked vehicles, and town-block structures consistent with the existing intersection. The surrounding view should preserve the continuity of the town rather than collapsing into generic woodland or empty natural scenery."
),
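
A quick way to confirm the scene key resolves to the intended prompt before launching inference is a lookup like the sketch below. `SCENE_PROMPTS` and `get_prompt` are hypothetical names used for illustration; the actual structure of `utils/prompts.py` may differ, and the prompt text is truncated here.

```python
# Hypothetical stand-in for the scene-to-prompt mapping. The real name and
# layout in utils/prompts.py are not shown in this issue and may differ.
SCENE_PROMPTS = {
    "city_square": (
        # Only the first sentence of the prompt above; the rest is omitted.
        "A bullet-time effect video in a frozen 3D photography style."
    ),
}


def get_prompt(scene: str) -> str:
    """Return the prompt for a scene, with a readable error for unknown keys."""
    try:
        return SCENE_PROMPTS[scene]
    except KeyError:
        known = ", ".join(sorted(SCENE_PROMPTS))
        raise KeyError(f"Unknown scene {scene!r}; known scenes: {known}") from None


print(get_prompt("city_square"))
```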

Third step: video inference, with this script:

#!/bin/bash
# WorldForge (WAN) Video Generation - Batch Inference Script
# Usage: bash wan_for_worldforge/run_test_case.sh  (run from project root)
# export HF_ENDPOINT=https://hf-mirror.com  # Uncomment if you need a HuggingFace mirror
cd "$(dirname "$0")"
# ==================== Basic Configuration ====================
MODELS_DIR="/.cache/huggingface/models"        # Model weights directory (must contain Wan2.1-I2V-14B-480P-Diffusers etc.)
VIDEO_REF="vggt/output_images_1_vggt_warp_degree90/warped_images"            # Input frames + masks directory
OUTPUT_DIR="./output_image_1_wan_worldforge_degree90"                          # Output directory
SCENE="city_square"                                   # Scene name (see utils/prompts.py)
NUM_FRAMES=49                                  # Number of output frames
RESOLUTION="720p"                              # 480p or 720p
STATIC="True"                                 # True for static scenes, False for dynamic
NUM_INFERENCE_STEPS=50                         # Diffusion sampling steps

# ==================== Parameter Grid ====================
# Modify these arrays to sweep different parameter combinations
omegas=(4)                  # Auto-guidance strength (recommended: (4 6))
guidance_scales=(4)         # CFG scale (recommended: (4))
transition_distances=(15)   # Mask softening distance in pixels (0=hard edge; recommended: (15 20 25))
resample_steps=(2)          # Resampling iterations per step (recommended: (2))
guide_steps=(10 18 23)      # Guide steps: apply guided fusion for first N steps (recommended: (10 15 18 20 23))
step_additions=(0)          # Extra steps for resample_round = guide_steps + addition (recommended: (0 1))

# ==================== Batch Inference ====================
mkdir -p "$OUTPUT_DIR"

for omega in "${omegas[@]}"; do
for cfg in "${guidance_scales[@]}"; do
for mask in "${transition_distances[@]}"; do
for resample in "${resample_steps[@]}"; do
for guide in "${guide_steps[@]}"; do
for add in "${step_additions[@]}"; do
    round=$((guide + add))
    output="${OUTPUT_DIR}/o${omega}_guide${guide}_round${round}_mask${mask}_cfg${cfg}.mp4"
    
    echo "========================================"
    echo "omega=$omega, guide=$guide, round=$round, mask=$mask, cfg=$cfg"
    echo "output: $output"
    echo "========================================"
    
    python infer_worldforge.py \
        --model "$RESOLUTION" \
        --models-dir "$MODELS_DIR" \
        --video-ref "$VIDEO_REF" \
        --scene "$SCENE" \
        --num-frames $NUM_FRAMES \
        --num-inference-steps $NUM_INFERENCE_STEPS \
        --guidance-scale $cfg \
        --static "$STATIC" \
        --guided \
        --resample-steps $resample \
        --guide-steps $guide \
        --resample-round $round \
        --omega $omega \
        --omega_resample $omega \
        --soften-mask \
        --transition-distance $mask \
        --use-pca-channel-selection \
        --output "$output"
done
done
done
done
done
done
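
To make explicit which runs this sweep produces, the nested loops can be mirrored in a few lines of Python. This is only a sketch that reproduces the loop order and file naming from the script above; `sweep_outputs` is a hypothetical helper and does not call `infer_worldforge.py`.

```python
from itertools import product

# Same values as the parameter grid in the script above.
omegas = [4]
guidance_scales = [4]
transition_distances = [15]
resample_steps = [2]
guide_steps = [10, 18, 23]
step_additions = [0]


def sweep_outputs(output_dir: str) -> list[str]:
    """List output paths in the same order as the nested bash loops."""
    paths = []
    for omega, cfg, mask, resample, guide, add in product(
        omegas, guidance_scales, transition_distances,
        resample_steps, guide_steps, step_additions,
    ):
        round_ = guide + add  # matches round=$((guide + add))
        # Note: `resample` is not part of the file name, as in the script.
        paths.append(
            f"{output_dir}/o{omega}_guide{guide}_round{round_}_mask{mask}_cfg{cfg}.mp4"
        )
    return paths


for path in sweep_outputs("./output_image_1_wan_worldforge_degree90"):
    print(path)
```

With the grid above this yields three videos, one per `guide_steps` value. Because the file name does not include the resample value, sweeping more than one entry in `resample_steps` would write several runs to the same path.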

However, the results look very weird no matter which guide_steps value I use, for example:

o4_guide23_round23_mask15_cfg4.mp4
