Seeking help with custom data #9

When I use custom data to generate videos, I sometimes get poor results. Could you please provide some guidance on how to improve them?

For example, given an input image like this:

![Image](https://github.com/user-attachments/assets/d072a9d2-206c-46b0-85a6-25a6a26b30d4)

First step: VGGT 3D scene warping (single / few images), using the command:
```
python run_warp.py \
    --image_path image_1 \
    --output_path output_images_1_vggt_warp_degree90 \
    --camera 0 \
    --direction right \
    --degree 90 \
    --frame_single 25 \
    --look_at_depth 0.25
```
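Before moving on to inference, it can help to confirm what the warp step actually wrote. The sketch below counts image files in a directory. It assumes the warped frames and masks are stored as `.png` files directly under `vggt/output_images_1_vggt_warp_degree90/warped_images` (a guess based on the `VIDEO_REF` path in the inference script, not something documented here), and `count_images` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check: count .png files in a directory.
# Assumes the warp step writes PNG frames/masks; adjust the pattern otherwise.
count_images() {
    local dir="$1"
    # find prints one path per match; wc -l counts them; tr strips the
    # leading padding that some wc implementations add.
    find "$dir" -maxdepth 1 -type f -name '*.png' | wc -l | tr -d ' '
}

WARP_DIR="vggt/output_images_1_vggt_warp_degree90/warped_images"  # assumed location
if [ -d "$WARP_DIR" ]; then
    echo "PNG files in $WARP_DIR: $(count_images "$WARP_DIR")"
fi
```

The count can then be compared against `--frame_single 25` in the warp command and `NUM_FRAMES=49` in the inference script; how the repository maps one onto the other is not stated in this issue.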

Second step: prepare the prompt. The entry for this scene (scene names are defined in `utils/prompts.py`) is:

```python
    "city_square": (
        "A bullet-time effect video in a frozen 3D photography style. The entire small-town courthouse-square intersection is captured as a single, perfectly static moment in time. The red-brick civic building with a clock tower, the surrounding storefronts and awnings, the gazebo, the roads, cars, pedestrians, flag, trees, shadows, and warm afternoon atmosphere all remain completely motionless, with no movement anywhere in the environment. The only change is the camera, which moves smoothly and stably in a gentle aerial arc while maintaining a high-angle perspective. The scene should remain a coherent small-town urban center throughout the shot: newly revealed areas should continue the courthouse-square layout with more connected streets, sidewalks, rooftops, storefronts, parked vehicles, and town-block structures consistent with the existing intersection. The surrounding view should preserve the continuity of the town rather than collapsing into generic woodland or empty natural scenery."
    ),
```
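A quick way to confirm the new scene name is actually registered before launching a long inference run is to search the prompts file for the key. This is a minimal sketch that assumes keys appear as `"scene_name":` in `utils/prompts.py`, as in the entry above, and that it is run from the directory containing `utils/` (presumably the one `infer_worldforge.py` runs in); `scene_defined` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check: does a prompts file define a given scene key?
# Assumes keys are written as "scene_name": (as in the entry above).
scene_defined() {
    local scene="$1" prompts_file="$2"
    # -F: match the literal string; -q: exit status only, no output.
    grep -qF "\"${scene}\":" "$prompts_file"
}

PROMPTS_FILE="utils/prompts.py"   # path taken from the SCENE comment in the script
if [ -f "$PROMPTS_FILE" ]; then
    if scene_defined "city_square" "$PROMPTS_FILE"; then
        echo "city_square found in $PROMPTS_FILE"
    else
        echo "city_square NOT found in $PROMPTS_FILE" >&2
    fi
fi
```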

Third step: video inference, using the following script:
```
#!/bin/bash
# WorldForge (WAN) Video Generation - Batch Inference Script
# Usage: bash wan_for_worldforge/run_test_case.sh  (run from project root)
# export HF_ENDPOINT=https://hf-mirror.com  # Uncomment if you need a HuggingFace mirror
cd "$(dirname "$0")"
# ==================== Basic Configuration ====================
MODELS_DIR="/.cache/huggingface/models"        # Model weights directory (must contain Wan2.1-I2V-14B-480P-Diffusers etc.)
VIDEO_REF="vggt/output_images_1_vggt_warp_degree90/warped_images"            # Input frames + masks directory
OUTPUT_DIR="./output_image_1_wan_worldforge_degree90"                          # Output directory
SCENE="city_square"                                   # Scene name (see utils/prompts.py)
NUM_FRAMES=49                                  # Number of output frames
RESOLUTION="720p"                              # 480p or 720p
STATIC="True"                                 # True for static scenes, False for dynamic
NUM_INFERENCE_STEPS=50                         # Diffusion sampling steps

# ==================== Parameter Grid ====================
# Modify these arrays to sweep different parameter combinations
omegas=(4)                     # Auto-guidance strength (recommended: (4 6))
guidance_scales=(4)            # CFG scale (recommended: (4))
transition_distances=(15)      # Mask softening distance in pixels (0=hard edge; recommended: (15 20 25))
resample_steps=(2)             # Resampling iterations per step (recommended: (2))
guide_steps=(10 18 23)         # Guide steps: apply guided fusion for first N steps (recommended: (10 15 18 20 23))
step_additions=(0)             # Extra steps for resample_round = guide_steps + addition (recommended: (0 1))

# ==================== Batch Inference ====================
mkdir -p "$OUTPUT_DIR"

for omega in "${omegas[@]}"; do
for cfg in "${guidance_scales[@]}"; do
for mask in "${transition_distances[@]}"; do
for resample in "${resample_steps[@]}"; do
for guide in "${guide_steps[@]}"; do
for add in "${step_additions[@]}"; do
    round=$((guide + add))
    output="${OUTPUT_DIR}/o${omega}_guide${guide}_round${round}_mask${mask}_cfg${cfg}.mp4"
    
    echo "========================================"
    echo "omega=$omega, guide=$guide, round=$round, mask=$mask, cfg=$cfg"
    echo "output: $output"
    echo "========================================"
    
    python infer_worldforge.py \
        --model "$RESOLUTION" \
        --models-dir "$MODELS_DIR" \
        --video-ref "$VIDEO_REF" \
        --scene "$SCENE" \
        --num-frames $NUM_FRAMES \
        --num-inference-steps $NUM_INFERENCE_STEPS \
        --guidance-scale $cfg \
        --static "$STATIC" \
        --guided \
        --resample-steps $resample \
        --guide-steps $guide \
        --resample-round $round \
        --omega $omega \
        --omega_resample $omega \
        --soften-mask \
        --transition-distance $mask \
        --use-pca-channel-selection \
        --output "$output"
done
done
done
done
done
done
```
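To see exactly which runs the grid above produces without spending GPU time, the nested loops can be dry-run so they only print the output paths. This sketch copies the arrays and the file-name pattern from the script and does not call `infer_worldforge.py`; `list_runs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Dry run of the parameter grid: print the output path each combination
# would produce, without running inference. Names mirror the script above.
OUTPUT_DIR="./output_image_1_wan_worldforge_degree90"
omegas=(4); guidance_scales=(4); transition_distances=(15)
resample_steps=(2); guide_steps=(10 18 23); step_additions=(0)

list_runs() {
    local omega cfg mask resample guide add round
    for omega in "${omegas[@]}"; do
    for cfg in "${guidance_scales[@]}"; do
    for mask in "${transition_distances[@]}"; do
    for resample in "${resample_steps[@]}"; do
    for guide in "${guide_steps[@]}"; do
    for add in "${step_additions[@]}"; do
        round=$((guide + add))   # resample_round = guide_steps + addition
        echo "${OUTPUT_DIR}/o${omega}_guide${guide}_round${round}_mask${mask}_cfg${cfg}.mp4"
    done; done; done; done; done; done
}

list_runs
```

With the grid as written this yields three runs (guide 10, 18, 23). Note that `resample` does not appear in the file name, so sweeping more than one `resample_steps` value would overwrite earlier outputs with the same name.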

However, the results look very strange regardless of the guide_steps value (10, 18, or 23):

https://github.com/user-attachments/assets/fa9f82d8-f31e-4470-aa12-ec7475579df5

