Stage1 VQ-VAE Training from Scratch: Performance Gap with Official Checkpoint #4

@Queequeg92

Summary

I trained the Stage1 VQ-VAE model from scratch on the complete VOCASET dataset (314 training samples) and compared it with the official checkpoint. Despite using the full dataset and training for 300 epochs, there remains a 32.1x gap in reconstruction loss. I'd like to understand what factors contribute to this gap and whether this performance is sufficient for Stage2 training.

Environment

  • GPU: 2x NVIDIA RTX A6000 (48 GB each)
  • Framework: PyTorch 2.0.1 with PyTorch Lightning
  • Dataset: VOCASET from ModelScope (complete version)
  • Training samples: 314 (train) + 53 (val) + 53 (test)

Training Configuration

# Stage1 VQ-VAE Training Config
MODEL:
  TYPE: vqvae
  n_vert: 15069
  n_embed: 256
  zquant_dim: 64
  hidden_size: 1024
  num_hidden_layers: 6
  num_attention_heads: 8

TRAIN:
  EPOCHS: 300
  BATCH_SIZE: 2
  LR: 1e-4
  GPUS: 2  # DDP
  OPTIMIZER: Adam
  LR_SCHEDULER: StepLR

Training time: ~55 minutes
Total training steps: ~47,000
Hardware: 2x RTX A6000
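
As a sanity check on the step count, the sketch below works out how many optimizer steps the config implies under two readings of `BATCH_SIZE: 2`. It assumes the last partial batch is kept and there is no gradient accumulation; neither is stated in the config, and `total_steps` is a hypothetical helper, not part of the training code.

```python
import math

def total_steps(n_samples: int, global_batch: int, epochs: int) -> int:
    """Optimizer steps for a run, assuming the last partial batch is kept
    and there is no gradient accumulation (both are assumptions)."""
    steps_per_epoch = math.ceil(n_samples / global_batch)
    return steps_per_epoch * epochs

N_TRAIN, EPOCHS = 314, 300  # values from the config above

# Case 1: BATCH_SIZE=2 is the global batch across both GPUs.
steps_global_2 = total_steps(N_TRAIN, global_batch=2, epochs=EPOCHS)

# Case 2: BATCH_SIZE=2 is per GPU, so each DDP step sees 2 * 2 = 4 samples.
steps_global_4 = total_steps(N_TRAIN, global_batch=4, epochs=EPOCHS)

print(steps_global_2)  # 47100
print(steps_global_4)  # 23700
```

The reported ~47,000 steps matches case 1, which suggests each optimizer step saw 2 samples in total. That is worth confirming against the logged `global_step`.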

Results

Quantitative Comparison

Evaluated on validation set (50 batches, ~100 samples):

| Model | Reconstruction Loss (MSE) | Quantization Loss | Relative Gap (recon / quant) |
| --- | --- | --- | --- |
| Self-trained (Epoch 299) | 6.86e-6 ± 4.10e-6 | 4.23e-3 ± 6.15e-5 | 32.1x / 36.7x |
| Official checkpoint | 2.10e-7 ± 1.20e-7 | 1.15e-4 ± 5.34e-6 | 1.0x / 1.0x |

Key findings:

  • Reconstruction loss: 32.1x higher than official
  • Quantization loss: 36.7x higher than official
  • Self-trained model: MSE of 6.86e-6, i.e. RMSE ≈ 2.6e-3 in mesh coordinate units (not micrometer-level, since MSE is a squared error)
  • Official model: MSE of 2.10e-7, i.e. RMSE ≈ 4.6e-4 in the same units (not nanometer-level)
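
The ratios and the per-vertex error scale can be recomputed from the mean values in the comparison table. This is a minimal sketch using only those rounded means. It gives about 32.7x and 36.8x, slightly off from the quoted 32.1x and 36.7x; that may come from rounding or from how the ratio was averaged, which is a guess.

```python
import math

# Mean values copied from the comparison table above.
recon = {"self": 6.86e-6, "official": 2.10e-7}
quant = {"self": 4.23e-3, "official": 1.15e-4}

recon_ratio = recon["self"] / recon["official"]
quant_ratio = quant["self"] / quant["official"]

# RMSE is in the same units as the vertex coordinates; MSE is in squared units.
rmse = {name: math.sqrt(mse) for name, mse in recon.items()}

print(f"recon ratio: {recon_ratio:.1f}x")  # 32.7x
print(f"quant ratio: {quant_ratio:.1f}x")  # 36.8x
print(f"RMSE self: {rmse['self']:.2e}, official: {rmse['official']:.2e}")
print(f"RMSE ratio: {rmse['self'] / rmse['official']:.1f}x")
```

Because RMSE is the square root of MSE, a ~32x gap in MSE corresponds to a much smaller gap in typical per-coordinate error.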

Training Steps Comparison

import torch

# Official checkpoint metadata (map_location='cpu' so no GPU is needed just to read it)
official_checkpoint = torch.load('checkpoints/voca_vae.ckpt', map_location='cpu')
print(f"Epoch: {official_checkpoint['epoch']}")              # 199
print(f"Global step: {official_checkpoint['global_step']}")  # 62,800

# Self-trained checkpoint
# Epoch: 299
# Global step: ~47,000
# Training steps gap: 62,800 vs 47,000 (1.33x difference)
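
Dividing each `global_step` by its epoch count gives steps per epoch, which hints at the effective batch size of each run. The sketch assumes the stored `epoch` is 0-indexed (so 199 means 200 epochs) and takes the self-trained step count as 47,100; `steps_per_epoch` is a hypothetical helper.

```python
def steps_per_epoch(global_step: int, last_epoch_index: int) -> float:
    """Average optimizer steps per epoch, assuming a 0-indexed epoch counter."""
    return global_step / (last_epoch_index + 1)

official = steps_per_epoch(62_800, 199)      # values read from voca_vae.ckpt
self_trained = steps_per_epoch(47_100, 299)  # ~47,000 reported; 47,100 assumed

print(official)      # 314.0
print(self_trained)  # 157.0
```

If the official run used the same 314 training samples (an assumption), 314 steps per epoch is consistent with one sample per optimizer step, i.e. twice as many updates per epoch as the self-trained run. The difference in effective batch size may matter as much as the total step count.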
