I'm trying to better understand the design decisions made for the "Categorical Codebook Matching for Embodied Character Controllers" paper.
What was the reasoning for using Gumbel-Softmax sampling instead of the commitment and codebook losses from the VQ-VAE paper? Was it to increase the potential diversity of the sampled motions? Or is there more to it?
I would think that the VQ-VAE approach would lead to a more interpretable latent space and behave much more deterministically at inference time.
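
To make the comparison concrete, here is a minimal sketch of how I understand the two options. This is not the paper's implementation: the function names, the toy codebook, and `beta=0.25` are my own assumptions for illustration, and the stop-gradient operators of the VQ-VAE losses are left out, so only the forward values are shown.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Stochastic option: relaxed one-hot sample over the codebook entries."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def vq_nearest(z, codebook):
    """Deterministic option: index of the code closest to the encoder output z."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, code)) for code in codebook]
    return dists.index(min(dists))


def vq_losses(z, codebook, beta=0.25):
    """Forward values of the codebook and commitment terms (no stop-gradient)."""
    k = vq_nearest(z, codebook)
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, codebook[k]))
    # Codebook term pulls the code toward z; commitment term pulls z toward the code.
    return k, sq_dist, beta * sq_dist


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical toy codes
    z = [0.9, 0.1]
    print(vq_nearest(z, codebook))                    # same index on every call
    print(gumbel_softmax_sample([2.0, 0.5, 0.1], 0.5))  # varies between calls
```

With the nearest-neighbour lookup the same input always selects the same code, whereas the Gumbel-Softmax sample can pick different codes for identical logits, which is the diversity versus determinism trade-off I am asking about.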