Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation (official implementation)
We introduce an architecture for one-step image generation in which a generator maps noise to a latent in a pretrained autoencoder's latent space in a single forward pass. On top of this latent space, we train a self-supervised featurizer, which is then frozen and provides semantic features. The distribution matching loss used to train the generator is defined in this semantic feature space. At inference, only the generator and decoder are needed (one forward pass, no featurizer, no iterative sampling).
Semantic features discard nuisance reconstruction variation, making the distribution matching problem lower-dimensional and statistically more tractable. As a result, the optimal transport couplings estimated from finite minibatches become more stable, which directly improves the training signal. This gives a roughly 39x FID reduction on class-conditional ImageNet (134 to 3.46).
The training loss is a Sinkhorn divergence combined with classifier-free guidance. The full expression and the definition of each term are given in the paper.
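For readers unfamiliar with the Sinkhorn divergence, the following is a minimal NumPy sketch of the standard debiased form, S(x, y) = OT(x, y) - 0.5 OT(x, x) - 0.5 OT(y, y), computed between two minibatches of feature vectors. It is not the repository's training loss: the half squared Euclidean cost, uniform weights, the values of `eps` and `n_iters`, and the function names are all assumptions made for illustration, and classifier-free guidance is left out entirely.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iters=200):
    """Entropic OT value between uniform empirical measures on x and y.

    Assumptions (illustrative only): half squared Euclidean cost and
    uniform weights. Returns the dual objective after n_iters log-domain
    Sinkhorn updates.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise cost matrix of shape (n, m).
    cost = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternating soft-min updates of the two dual potentials.
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    # With uniform weights the dual value is mean(f) + mean(g).
    return float(np.mean(f) + np.mean(g))


def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased Sinkhorn divergence between two batches of features."""
    return (
        entropic_ot(x, y, eps, n_iters)
        - 0.5 * entropic_ot(x, x, eps, n_iters)
        - 0.5 * entropic_ot(y, y, eps, n_iters)
    )


rng = np.random.default_rng(0)
feats_real = rng.normal(size=(32, 4))        # stand-in for featurized real latents
feats_fake = rng.normal(size=(32, 4)) + 2.0  # stand-in for featurized generated latents

div_same = sinkhorn_divergence(feats_real, feats_real)
div_shifted = sinkhorn_divergence(feats_real, feats_fake)
print(div_same, div_shifted)
```

In this toy setup the divergence vanishes when both batches are identical and is clearly positive when one batch is shifted, which is the behavior a distribution matching loss needs. In the method described above, the two batches would be featurizer outputs for real and generated latents rather than random vectors.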
Uncurated ImageNet 256x256 samples (one step, no refinement). Each row uses a different frozen SSL featurizer during training. The featurizer is not used at inference.
git clone https://github.com/huguesva/semantic-transport-generation.git
cd semantic-transport-generation
pip install -e . -r requirements.txt
Training operates on precomputed SD-VAE latents. Encode ImageNet 256x256 images into sharded .pt files:
python scripts/precompute_latents.py \
    --imagenet_dir /path/to/imagenet \
    --output_dir /path/to/imagenet256_latents \
    --vae stabilityai/sd-vae-ft-mse
Then set data_dir in experiments/dataset/imagenet256_latent.yaml to point to the output directory.
Experiments can be launched via the spt CLI provided by stable-pretraining.
spt run experiments/main.yaml # MAE mask 50% featurizer (default, best)
spt run experiments/main.yaml featurizer=mae_60 # MAE mask 60%
spt run experiments/main.yaml featurizer=mae_75 # MAE mask 75%
spt run experiments/main.yaml featurizer=dinov3 # DINOv3 distillation
spt run experiments/main.yaml featurizer=inception # Inception distillation
See the paper for data preparation details and full experimental setup.
Generators and featurizers are available at huguesva/semantic-transport-generation on Hugging Face.
@misc{vanassel2026generatereconstructionspacematch,
title={Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation},
author={Hugues Van Assel and Edward De Brouwer and Saeed Saremi and Gabriele Scalia and Aviv Regev},
year={2026},
eprint={2606.00514},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2606.00514},
}
