I’m pretraining VGT InternVL3 0.6B (448px) using the official pretraining pipeline, but the generation quality is much lower than expected. Training appears stable and the loss decreases normally, but visual quality remains poor and improves only slightly over time.
I attach generated samples from 2k, 50k, and 100k iterations, as well as the loss curves.
- samples from 2k iterations
- samples from 50k iterations
- samples from 100k iterations
- loss curves
Training Setup
- 100k iterations, 8 GPUs (DDP)
- Global batch size: 256
- LR: 3e-4 peak, cosine decay to 1e-4
- Warmup: 1k iters
- AdamW (0.9, 0.95), weight decay 0.05
- EMA start 10k (momentum 0.0002)
- REPA loss weight 0.5
Config: configs/pretrain/vgt_internvl3_0_6B_448px_pretrain.py
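For reference, here is a minimal stand-alone sketch of the schedule I believe the config above implements: linear warmup for 1k iterations to the 3e-4 peak, cosine decay to the 1e-4 floor at 100k, and the EMA update with momentum 0.0002. The function and constant names are mine for illustration, not taken from the repo, and I'm assuming the momentum convention `ema ← (1 − m)·ema + m·param` (some codebases define momentum as the decay, i.e. 0.9998).

```python
import math

# Hyperparameters as described in the setup above (names are illustrative).
PEAK_LR, MIN_LR = 3e-4, 1e-4
WARMUP_ITERS, TOTAL_ITERS = 1_000, 100_000

def lr_at(step: int) -> float:
    """Learning rate at a given iteration: linear warmup, then cosine decay."""
    if step < WARMUP_ITERS:
        return PEAK_LR * step / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

def ema_update(ema_param: float, param: float, momentum: float = 0.0002) -> float:
    # Assumed convention: small momentum = slow-moving average,
    # i.e. ema <- (1 - m) * ema + m * param, started at iteration 10k.
    return (1.0 - momentum) * ema_param + momentum * param
```

If the config uses the opposite EMA convention (momentum as decay rate), the effective averaging horizon would differ by orders of magnitude, which is one of the things I'd like to rule out.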
Dataset
Mixed training data:
- megalith10m
- text2image2m
- imagenet1k_t2i_qwenvl_flux
Question
Is this level of generation quality expected for this setup, or is there anything important I should check or adjust?
Thanks — happy to share more details if needed.