Reproduce VGT-AR Pretraining (InternVL3) #12

@DingShizhe

Description

I’m pretraining VGT InternVL3 0.6B (448px) using the official pretraining pipeline, but the generation quality is much lower than expected. Training appears stable and the loss decreases normally, but visual quality remains poor and improves only slightly over time.

I attach generated samples from 2k, 50k, and 100k iterations, as well as the loss curves.

  • samples from 2k iterations [image]
  • samples from 50k iterations [image]
  • samples from 100k iterations [image]
  • loss curves [image]

Training Setup

  • 100k iterations, 8 GPUs (DDP)
  • Global batch size: 256
  • LR: 3e-4 peak, cosine decay to 1e-4
  • Warmup: 1k iters
  • AdamW (0.9, 0.95), weight decay 0.05
  • EMA start 10k (momentum 0.0002)
  • REPA loss weight 0.5

Config: configs/pretrain/vgt_internvl3_0_6B_448px_pretrain.py
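For reference, the warmup/cosine schedule and EMA rule described above can be sketched in plain Python. This is a minimal sketch assuming linear warmup and a standard cosine decay; the constant names are illustrative, and the authoritative values live in `configs/pretrain/vgt_internvl3_0_6B_448px_pretrain.py`:

```python
import math

# Values taken from the setup above; names are illustrative.
PEAK_LR, FINAL_LR = 3e-4, 1e-4
WARMUP_ITERS, TOTAL_ITERS = 1_000, 100_000

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to the final LR."""
    if step < WARMUP_ITERS:
        return PEAK_LR * step / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))

EMA_START, EMA_MOMENTUM = 10_000, 0.0002

def ema_update(ema: float, current: float, step: int) -> float:
    """Per-parameter EMA, active only after EMA_START iterations."""
    if step < EMA_START:
        return current  # before the start iteration, EMA just tracks the raw weights
    return (1.0 - EMA_MOMENTUM) * ema + EMA_MOMENTUM * current
```

With these constants, `lr_at` hits 3e-4 exactly at iteration 1k and decays to 1e-4 at 100k; whether warmup is linear in the actual config is an assumption.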


Dataset

Mixed training data:

  • megalith10m
  • text2image2m
  • imagenet1k_t2i_qwenvl_flux
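To pin down what "mixed training data" means here, size-proportional sampling across the three sources could be sketched as below. The mixing ratios are an assumption (the config may use different weights), and the ImageNet-1k count is the standard train-set size, not taken from this issue:

```python
import random

# Hypothetical per-source sizes used as sampling weights.
SOURCES = {
    "megalith10m": 10_000_000,
    "text2image2m": 2_000_000,
    "imagenet1k_t2i_qwenvl_flux": 1_281_167,  # assumed ImageNet-1k train size
}

def sample_source(rng: random.Random) -> str:
    """Draw one source, with probability proportional to its (assumed) size."""
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Under this scheme megalith10m dominates (roughly three quarters of samples), which is worth confirming against the actual config if data balance is a suspect.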

Question

Is this level of generation quality expected for this setup, or is there anything important I should check or adjust?

Thanks — happy to share more details if needed.
