SeedVR2 3B quality on 8GB VRAM (tiling/offload): seeking guidance #54

@idocinthebox

Cross-posted here: numz/ComfyUI-SeedVR2_VideoUpscaler#528

Quality Issues with SeedVR2 3B on 8GB VRAM GPU — Seeking Guidance

Environment

| Component | Details |
|---|---|
| GPU | NVIDIA GeForce RTX 5070 Laptop GPU, 8GB VRAM |
| CPU | AMD Ryzen 9 9950X |
| PyTorch | 2.11.0.dev+cu128 |
| CUDA | 12.8 |
| Python | 3.12 |
| Model | SeedVR2 3B (GGUF Q8_0, 3,491 MB) |
| VAE | ema_vae (478 MB) |
| OS | Windows 11 |

Context

I'm integrating SeedVR2 3B into a standalone desktop application (not ComfyUI) for video restoration from analog tape sources (VHS, Hi8, DV). The implementation uses:

  • Block-by-block CPU↔GPU offloading for the DiT (only one transformer block on GPU at a time, FP8 compressed during transfer)
  • Tiled VAE encode/decode (512×512 pixel tiles, 160px overlap, cosine blending) since the full frame won't fit in 8GB
  • GGUF Q8_0 weights for the DiT (425 tensors FP16 + 210 tensors Q8_0)
  • 1-step distilled inference (Euler sampler)

Problem Description

The output has significantly lower quality than official demos and ComfyUI results on high-VRAM GPUs. Specifically:

  1. Dark patches / artifacts (partially fixed — see below)
  2. Waxy / blurry appearance (partially fixed — see below)
  3. Visible lines within VAE tiles (partially fixed — see below)
  4. Overall softness / lack of detail compared to official demos

What I've Already Investigated and Fixed

I did a detailed diff between the official ByteDance SeedVR repo code and my implementation. Three issues were found and corrected:

Fix 1: VAE dtype — float16 → bfloat16

  • Problem: VAE was running in float16 (max 65,504). The 3D causal conv intermediate activations exceeded this range → overflow → dark artifacts.
  • Fix: Changed to bfloat16 (matches official configs_3b/main.yaml: vae.dtype: bfloat16).
  • Result: Dark patches eliminated.
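
To make the range argument concrete, here is a stdlib-only sketch that rounds a value through IEEE float16 (using `struct`'s `'e'` format) and through bfloat16. Python has no bfloat16 type, so it is emulated by rounding the float32 bit pattern to its upper 16 bits. The helper names are hypothetical, and because a GPU produces `inf` on fp16 overflow instead of raising, the sketch maps `OverflowError` to `inf`.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE half precision; overflow becomes +/-inf, as on a GPU."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 pattern.

    Rounds to nearest, ties to even, on the 16 discarded bits (NaN handling omitted).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1000.0, 65504.0, 70000.0):
    print(value, to_float16(value), to_bfloat16(value))
```

The last column shows the cost of the wider range: bfloat16 keeps only 8 significant bits (float16 keeps 11), so 70000 comes back as 70144 and 65504 as 65536, while float16 turns 70000 into `inf`.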

Fix 2: Color correction — LAB → wavelet only

  • Problem: Post-processing applied wavelet_reconstruction() followed by LAB histogram matching. The LAB step over-corrected chrominance, producing a waxy/washed-out look.
  • Fix: Changed to wavelet-only color correction (matches official pipeline).
  • Result: Waxy appearance reduced.
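
For readers unfamiliar with the wavelet step, a 1-D sketch of the idea: repeatedly blur with a dilated kernel to split a signal into a low-frequency part (overall color) and a high-frequency residual (detail), then keep the detail from the model output and the low frequencies from the reference. This is modeled on the common wavelet color-fix technique; the `[1/4, 1/2, 1/4]` kernel, five levels, and edge-clamped padding are assumptions, and the function names are hypothetical rather than the official `wavelet_reconstruction()` signature.

```python
from typing import List


def dilated_blur(signal: List[float], dilation: int) -> List[float]:
    """Blur with a [1/4, 1/2, 1/4] kernel whose taps are `dilation` samples apart.

    Out-of-range indices are clamped to the edge (replicate padding).
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - dilation, 0)]
        right = signal[min(i + dilation, n - 1)]
        out.append(0.25 * left + 0.5 * signal[i] + 0.25 * right)
    return out


def wavelet_decompose(signal: List[float], levels: int = 5):
    """Split a signal into (high, low) so that high[i] + low[i] == signal[i]."""
    low = list(signal)
    high = [0.0] * len(signal)
    for level in range(levels):
        blurred = dilated_blur(low, 2 ** level)
        high = [h + (l - b) for h, l, b in zip(high, low, blurred)]
        low = blurred
    return high, low


def wavelet_color_fix(output: List[float], reference: List[float]) -> List[float]:
    """Keep high-frequency detail from `output`, low-frequency color from `reference`."""
    output_high, _ = wavelet_decompose(output)
    _, reference_low = wavelet_decompose(reference)
    return [h + l for h, l in zip(output_high, reference_low)]


reference = [float((i * 7) % 11) for i in range(40)]
shifted = [v + 2.5 for v in reference]  # model output with a global color cast
fixed = wavelet_color_fix(shifted, reference)
print(max(abs(a - b) for a, b in zip(fixed, reference)))  # tiny: the cast is removed
```

Because the blur is linear and its weights sum to 1, a constant color cast lands entirely in the low band, which this step replaces; any correction applied after it (such as the LAB matching that was removed) acts on top of an already color-matched result.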

Fix 3: VAE conv spatial splitting — disabled

  • Problem: VAE_CONV_MAX_MEM was set to 0.125 GiB (vs official 0.5 GiB), forcing InflatedCausalConv3d.memory_limit_conv() to split conv3d inputs along H/W dimensions. This truncates the receptive field at split boundaries → visible horizontal/vertical lines within tiles.
  • Fix: Set VAE_CONV_MAX_MEM = float("inf") to disable spatial splitting. With 512px tiles, peak per-tile VRAM is ~200–300 MB, well within budget.
  • Result: Internal tile lines eliminated.
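
The arithmetic behind the splitting can be sketched as follows. It assumes the limit is compared against the size of the conv input tensor, which is a simplification since the real `memory_limit_conv()` logic may account for more than this. The tensor shape (256 channels, 2 frames, one 512×512 tile, bfloat16) and the helper name are hypothetical.

```python
import math


def estimated_conv_splits(shape, bytes_per_element: int, max_mem_gib: float) -> int:
    """Hypothetical estimate of how many chunks a conv input would be split into.

    A limit of float("inf") means the input is never split.
    """
    size_bytes = math.prod(shape) * bytes_per_element
    limit_bytes = max_mem_gib * 1024 ** 3
    return max(1, math.ceil(size_bytes / limit_bytes))


# (batch, channels, frames, height, width) for one 512x512 tile; bfloat16 = 2 bytes.
tile_activation = (1, 256, 2, 512, 512)
for limit in (0.125, 0.5, float("inf")):
    # Every split beyond 1 adds a boundary where the conv receptive field is cut.
    print(limit, estimated_conv_splits(tile_activation, 2, limit))
```

Under these assumptions the 0.25 GiB activation is cut in two at the 0.125 GiB limit but processed whole at 0.5 GiB or with no limit.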

What Still Doesn't Match Official Quality

Even after these three fixes, the output is noticeably softer than official demos. The image content is recognizable and the worst artifacts are gone, but fine detail and sharpness are lacking.

Suspected Remaining Issues

1. Tiled VAE encode uses mode() instead of sample()

  • The official code uses .latent (stochastic sampling from the posterior)
  • My tiled approach uses .latent_dist.mode() (deterministic mean)
  • Reason: Stochastic sampling per-tile would create noise inconsistencies at tile boundaries
  • Question: Is there a better approach for tiled VAE encoding that preserves stochastic behavior while maintaining tile boundary consistency?
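
One possible approach, offered as an untested suggestion rather than what the official code does: draw a single noise field for the whole latent frame from a fixed seed, and have every tile read its own window of that field. Overlapping tiles then use identical noise at identical latent positions, so the sampling noise stays consistent across boundaries; only the per-tile mean and variance can still differ. The sketch is 1-D and pure Python, and the function names are hypothetical; `mean + exp(0.5 * logvar) * eps` is the standard way to sample a diagonal Gaussian posterior.

```python
import math
import random
from typing import List


def global_noise(length: int, seed: int) -> List[float]:
    """One standard-normal noise value per latent position for the full frame."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def sample_tile(mean: List[float], logvar: List[float], start: int,
                noise: List[float]) -> List[float]:
    """Sample a tile's posterior using the global noise window at its position."""
    window = noise[start:start + len(mean)]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mean, logvar, window)]


noise = global_noise(length=24, seed=0)
# Two 16-position tiles overlapping on latent positions 8..15, with identical posteriors.
mean, logvar = [0.0] * 16, [-2.0] * 16
tile_a = sample_tile(mean, logvar, start=0, noise=noise)
tile_b = sample_tile(mean, logvar, start=8, noise=noise)
print(tile_a[8:] == tile_b[:8])  # same noise at the same positions
```

In a real tiled encoder the overlap would still be blended, but the blend would then average two samples that share their noise rather than two independent draws.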

2. 512px tiles may be too small

  • Official SeedVR2 runs on 80GB H100s with no tiling at all
  • My 512px tiles with 160px overlap may lose too much global context
  • ComfyUI uses 736px tiles on 16GB GPUs
  • Question: What is the minimum tile size that preserves quality? Would 640px or 768px tiles work better, or is the quality loss inherent to tiling?
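
Part of the trade-off is how many tiles each size needs and how much area is encoded more than once. The sketch below assumes a simple grid in which the last tile on each axis is shifted back to end exactly at the frame edge, and a tile shrinks to the frame size when the frame is smaller than the tile; the actual tiler may place tiles differently. It uses the 960×720 target from the reproduction and the 160px overlap, and the helper names are hypothetical.

```python
import math
from typing import List


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis; the last tile is shifted to end at the edge."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    count = math.ceil((length - tile) / stride) + 1
    return [min(i * stride, length - tile) for i in range(count)]


def tiling_stats(width: int, height: int, tile: int, overlap: int):
    """Return (tile count, encoded area / frame area) for a square tile size."""
    xs = tile_starts(width, tile, overlap)
    ys = tile_starts(height, tile, overlap)
    tile_area = min(tile, width) * min(tile, height)
    redundancy = len(xs) * len(ys) * tile_area / (width * height)
    return len(xs) * len(ys), redundancy


for tile in (512, 640, 736, 768):
    count, redundancy = tiling_stats(960, 720, tile, overlap=160)
    print(f"{tile}px tiles: {count} tiles, {redundancy:.2f}x area encoded")
```

Under these assumptions, 512px tiles need 6 tiles (a 3×2 grid) for a 960×720 frame, while 736px or 768px tiles need only 2, each spanning the full frame height, so there is no horizontal seam at all.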

3. GGUF Q8_0 quantization impact

  • Weight comparison shows max error 0.015625, mean error 0.000135 vs FP16
  • A/B test showed PSNR 20.90 dB between Q8_0 and FP16 outputs
  • Q8_0 output actually looked slightly better in our tests
  • Question: Is Q8_0 expected to have meaningful quality loss vs FP16 for SeedVR2 3B?
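
For reference, GGUF's Q8_0 format stores weights in blocks of 32, each with one scale `d = max|w| / 127` (kept as FP16) and 32 signed 8-bit integers, so the per-weight rounding error is at most about `d / 2`. The minimal round-trip sketch below shows where such an error bound comes from. It keeps the scale in full precision for simplicity and uses made-up weights; the function names are hypothetical.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # Q8_0 block length


def round_half_away(x: float) -> int:
    """Round to nearest integer, ties away from zero (like C roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0(block: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block of floats to (scale, int8 values in [-127, 127])."""
    amax = max(abs(w) for w in block)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(block)
    return scale, [round_half_away(w / scale) for w in block]


def dequantize_q8_0(scale: float, values: List[int]) -> List[float]:
    """Reconstruct approximate weights as scale * q."""
    return [scale * q for q in values]


weights = [math.sin(0.37 * i) * 0.05 for i in range(BLOCK_SIZE)]  # made-up weights
scale, values = quantize_q8_0(weights)
restored = dequantize_q8_0(scale, values)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f} max_error={max_error:.6f} bound={scale / 2:.6f}")
```

Because the scale is set per block, a block containing one large outlier gets a coarse grid for all of its other 31 weights, which is one reason max and mean error can differ by two orders of magnitude.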

4. Block-by-block DiT offloading

  • Each of the 32 transformer blocks is moved from CPU to GPU, run, and then moved back to CPU
  • Weights are stored on the CPU in FP8 (halving the transfer size relative to FP16)
  • Question: Does processing blocks individually (vs keeping the full model on GPU) affect attention quality? Each block still sees the full sequence, but are there cross-block dependencies that require simultaneous residency?
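
In a standard sequential DiT, each block consumes only the outputs of the block before it (plus inputs shared by all blocks, such as the timestep embedding), so where the weights live between calls should not change the math. What can change results is the precision the weights have when they are used. The sketch below assumes the FP8 format is E4M3 (3 mantissa bits, largest finite value 448) with no per-tensor scaling, neither of which the issue specifies. It rounds in-range values the way an E4M3 cast would; out-of-range handling differs between libraries, so this sketch simply saturates. The function name is hypothetical.

```python
import math

E4M3_MAX = 448.0          # largest finite E4M3 value
E4M3_MANTISSA_BITS = 3
E4M3_MIN_EXP = -6         # exponent of the smallest normal value, 2**-6


def to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)
    _, exp = math.frexp(abs(x))        # abs(x) = m * 2**exp with 0.5 <= m < 1
    # Exponent in 1.mmm form; subnormals reuse the spacing of the 2**-6 binade.
    exp = max(exp - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of representable values
    rounded = round(abs(x) / step) * step     # Python's round() is ties-to-even
    return math.copysign(min(rounded, E4M3_MAX), x)


for w in (0.1, 0.0123, 1.3, 500.0):
    print(w, to_e4m3(w), f"rel. error {abs(to_e4m3(w) - w) / w:.2%}")
```

E4M3 keeps 4 significant bits, so for normal values the rounding error can reach 1/16 (6.25%) of the value, versus about 0.05% for FP16. If the FP8 copy is made from already-dequantized Q8_0 weights, the two rounding steps would compound.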

5. VAE tiling overlap/blending

  • Using cosine blending in the overlap region (160px)
  • Question: Is cosine blending optimal, or does the official codebase use a different blending strategy? Would Gaussian blending or feathering produce better results?
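
A 1-D sketch of weighted-overlap blending shows how the profiles differ in practice. Each tile contributes its pixels multiplied by a weight that ramps across the overlap, and the accumulated sum is divided by the accumulated weights. For both ramps below, two opposing ramps sum to 1 inside the overlap, so they differ only in the shape of the transition: the `cosine` (raised-cosine) ramp has zero slope at both ends of the overlap, while the `linear` ramp has a kink where it starts and stops. The function names and the two-tile setup are hypothetical; Gaussian feathering would slot in as another weight profile, normalized the same way.

```python
import math
from typing import Callable, List, Sequence, Tuple

Ramp = Callable[[float], float]


def cosine_ramp(t: float) -> float:
    """Raised-cosine edge: 0 at t=0, 1 at t=1, zero slope at both ends."""
    return 0.5 - 0.5 * math.cos(math.pi * t)


def linear_ramp(t: float) -> float:
    """Straight-line edge with slope discontinuities at t=0 and t=1."""
    return t


def tile_weights(size: int, overlap: int, fade_in: bool, fade_out: bool,
                 ramp: Ramp) -> List[float]:
    """Per-pixel weights for one tile; edges fade only where a neighbor overlaps."""
    weights = [1.0] * size
    for i in range(overlap):
        w = ramp((i + 0.5) / overlap)  # sample at pixel centers, so never exactly 0
        if fade_in:
            weights[i] = min(weights[i], w)
        if fade_out:
            weights[size - 1 - i] = min(weights[size - 1 - i], w)
    return weights


def blend(length: int, tiles: Sequence[Tuple[int, List[float]]], overlap: int,
          ramp: Ramp) -> List[float]:
    """Merge (start, pixels) tiles with normalized weighted averaging."""
    acc = [0.0] * length
    total = [0.0] * length
    for start, pixels in tiles:
        end = start + len(pixels)
        weights = tile_weights(len(pixels), overlap, fade_in=start > 0,
                               fade_out=end < length, ramp=ramp)
        for i, (p, w) in enumerate(zip(pixels, weights)):
            acc[start + i] += p * w
            total[start + i] += w
    return [a / t for a, t in zip(acc, total)]


tiles = [(0, [0.0] * 12), (8, [1.0] * 12)]  # two tiles overlapping on pixels 8..11
for ramp in (linear_ramp, cosine_ramp):
    print(ramp.__name__, [round(v, 3) for v in blend(20, tiles, 4, ramp)[6:14]])
```

Blending only smooths the transition between tiles; if two tiles decode genuinely different content in the overlap, any profile will average it rather than reconcile it.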

Specific Questions for the Community

  1. Has anyone achieved official-demo-quality results on an 8GB GPU? If so, what settings/compromises were used?

  2. Is there a known minimum VRAM threshold below which SeedVR2 3B cannot produce good results regardless of tiling/offloading strategies?

  3. Are there any other pipeline differences between the official ByteDance inference code and typical 8GB GPU implementations that I might have missed?

  4. Would SeedVR2 7B with more aggressive quantization (e.g., Q4) produce better results than 3B at Q8 on 8GB VRAM?
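
For question 4, a back-of-the-envelope weight-size comparison may help. In GGUF, a Q8_0 block is 34 bytes per 32 weights (8.5 bits per weight) and a Q4_0 block is 18 bytes per 32 weights (4.5 bits per weight). The sketch uses the nominal 3B and 7B parameter counts and assumes every tensor is quantized, so the estimates will not match real file sizes exactly (the 3B Q8_0 file above is 3,491 MB). It ignores activations and the VAE, and says nothing about output quality.

```python
# Bytes per 32-weight block in GGUF: one FP16 scale plus the packed values.
BYTES_PER_BLOCK = {"Q8_0": 2 + 32, "Q4_0": 2 + 16}
WEIGHTS_PER_BLOCK = 32


def quantized_size_gb(num_params: float, quant_type: str) -> float:
    """Approximate weight storage in GB (1e9 bytes) if every weight uses quant_type."""
    bytes_per_weight = BYTES_PER_BLOCK[quant_type] / WEIGHTS_PER_BLOCK
    return num_params * bytes_per_weight / 1e9


for label, params, quant in (("3B", 3e9, "Q8_0"), ("7B", 7e9, "Q4_0")):
    print(f"{label} {quant}: ~{quantized_size_gb(params, quant):.2f} GB")
```

Under block-by-block offloading, a larger total weight size would presumably show up mainly as more system RAM use and more transfer time per step, rather than as higher peak VRAM.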

Reproduction

Single-frame test at 720p output (SD input → 720p upscale):

  • Input: 480×360 DV frame
  • Output target: 960×720 (padded to a multiple of 8)
  • VAE tiles: 512×512, overlap 160px
  • DiT: Full latent, block-by-block offload, FP8 mode
  • Diffusion: 1 step, Euler sampler
  • Color correction: wavelet only
  • Processing time: ~35 seconds per frame

Any guidance or insights from folks who have worked on low-VRAM SeedVR2 implementations would be greatly appreciated.
