Summary
I trained the Stage1 VQ-VAE model from scratch on the complete VOCASET dataset (314 training samples) and compared it with the official checkpoint. Despite using the full dataset and training for 300 epochs, there remains a 32.1x gap in reconstruction loss. I'd like to understand what factors contribute to this gap and whether this performance is sufficient for Stage2 training.
Environment
- GPU: 2x NVIDIA RTX A6000 (49GB each)
- Framework: PyTorch 2.0.1 with PyTorch Lightning
- Dataset: VOCASET from ModelScope (complete version)
- Training samples: 314 (train) + 53 (val) + 53 (test)
Training Configuration
```yaml
# Stage1 VQ-VAE Training Config
MODEL:
  TYPE: vqvae
  n_vert: 15069
  n_embed: 256
  zquant_dim: 64
  hidden_size: 1024
  num_hidden_layers: 6
  num_attention_heads: 8
TRAIN:
  EPOCHS: 300
  BATCH_SIZE: 2
  LR: 1e-4
  GPUS: 2 (DDP)
  OPTIMIZER: Adam
  LR_SCHEDULER: StepLR
```
- Training time: ~55 minutes
- Total training steps: ~47,000
- Hardware: 2x RTX A6000
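
For context on what the two loss columns in the table below measure, here is a minimal sketch of a standard VQ-VAE reconstruction + quantization (codebook/commitment) loss using the codebook shape from this config. It illustrates the usual formulation only; the repo's actual Stage1 implementation and the weighting `beta` may differ and are assumptions here.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative only: codebook shape from the config above
# (n_embed=256 entries of dimension zquant_dim=64).
codebook = nn.Embedding(256, 64)

def vq_losses(z_e, x_hat, x, beta=0.25):
    # z_e:   encoder output latents, shape (B, T, 64)
    # x_hat: decoder output (reconstructed vertices), same shape as x
    # x:     ground-truth vertices, e.g. (B, T, 15069 * 3)
    dists = torch.cdist(z_e.reshape(-1, 64), codebook.weight)  # (B*T, 256)
    z_q = codebook(dists.argmin(dim=-1)).view_as(z_e)          # nearest codes
    recon_loss = F.mse_loss(x_hat, x)                           # first table column
    quant_loss = (F.mse_loss(z_q, z_e.detach())                 # codebook term
                  + beta * F.mse_loss(z_e, z_q.detach()))       # commitment term
    return recon_loss, quant_loss
```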
Results
Quantitative Comparison
Evaluated on validation set (50 batches, ~100 samples):
| Model | Reconstruction Loss (MSE) | Quantization Loss | Relative Gap (Recon / Quant) |
|---|---|---|---|
| Self-trained (Epoch 299) | 6.86e-6 ± 4.10e-6 | 4.23e-3 ± 6.15e-5 | 32.1x / 36.7x |
| Official checkpoint | 2.10e-7 ± 1.20e-7 | 1.15e-4 ± 5.34e-6 | 1.0x / 1.0x |
Key findings:
- Reconstruction loss: 32.1x higher than official
- Quantization loss: 36.7x higher than official
- Self-trained model reaches a reconstruction MSE on the order of 1e-6 (6.86e-6)
- Official checkpoint reaches a reconstruction MSE on the order of 1e-7 (2.10e-7)
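
For reference, a rough sketch of how such a side-by-side evaluation over 50 validation batches can be set up; `load_stage1_model`, `val_loader`, and the model's return signature are hypothetical placeholders, not this repo's actual API:

```python
import torch

@torch.no_grad()
def eval_checkpoint(model, val_loader, num_batches=50):
    """Mean/std of reconstruction and quantization loss over the val set.

    Assumes `model(batch)` returns (recon_loss, quant_loss); adapt to the
    repo's actual forward signature.
    """
    recon, quant = [], []
    for i, batch in enumerate(val_loader):
        if i >= num_batches:
            break
        recon_loss, quant_loss = model(batch)   # placeholder signature
        recon.append(recon_loss.item())
        quant.append(quant_loss.item())
    recon_t, quant_t = torch.tensor(recon), torch.tensor(quant)
    return (recon_t.mean(), recon_t.std()), (quant_t.mean(), quant_t.std())

# Hypothetical usage, comparing both checkpoints on the same loader:
# model_self = load_stage1_model('checkpoints/self_trained_epoch299.ckpt')
# model_official = load_stage1_model('checkpoints/voca_vae.ckpt')
# print(eval_checkpoint(model_self, val_loader))
# print(eval_checkpoint(model_official, val_loader))
```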
Training Steps Comparison
```python
import torch

# Official checkpoint metadata
official_checkpoint = torch.load('checkpoints/voca_vae.ckpt', map_location='cpu')
print(f"Epoch: {official_checkpoint['epoch']}")              # 199
print(f"Global step: {official_checkpoint['global_step']}")  # 62,800

# Self-trained checkpoint
# Epoch: 299
# Global step: ~47,000
# Training steps gap: 62,800 vs ~47,000 (~1.33x difference)
```
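
A back-of-envelope check (assuming `global_step` counts one optimizer step per batch, with steps per epoch ≈ number of training samples / batch size) suggests the two step counts are consistent with the configurations involved:

```python
# Assumption (not verified against the repo): one optimizer step per batch,
# steps_per_epoch ~= num_train_samples / batch_size.
num_train_samples = 314

# Self-trained run: batch size 2, 300 epochs
print(num_train_samples / 2 * 300)   # 47100.0 -> matches the reported ~47,000

# Official checkpoint: 62,800 steps over 200 epochs (final epoch index 199)
print(62_800 / 200)                  # 314.0 -> consistent with batch size 1
```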