"The CAPP (Contrastive Audio and Pose Pretraining) model should be available in a few weeks."  #21

@johndpope

[Screenshot, 2024-10-30]

Joint Optimization:

  • The primary diffusion loss drives generation quality
  • The CAPP loss adds an explicit audio-pose alignment term
  • A weighted combination of the two controls their relative importance
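
To make the weighting concrete, here is a minimal sketch of what the joint objective could look like. It assumes CAPP is trained with a CLIP-style symmetric contrastive (InfoNCE) loss over paired audio and pose embeddings, which is a guess based on the name, not something confirmed in this thread. `contrastive_loss`, `joint_loss`, `capp_weight`, and the `temperature` default are all hypothetical names and values for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(audio_embs, pose_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (audio, pose) embeddings.

    Assumption: item i of audio_embs is the positive match for item i of
    pose_embs; every other pairing in the batch is treated as a negative.
    """
    n = len(audio_embs)
    logits = [[cosine(a, p) / temperature for p in pose_embs] for a in audio_embs]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    # Average the audio->pose and pose->audio directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def joint_loss(diffusion_loss, capp_loss, capp_weight=0.1):
    """Weighted sum; capp_weight is a hypothetical hyperparameter."""
    return diffusion_loss + capp_weight * capp_loss
```

With `capp_weight=0` this reduces to the plain diffusion objective, so the weight can be swept to see how much the alignment term helps.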

Training Insights:

Without CAPP:

  • Only optimizes for motion prediction
  • No explicit audio-pose alignment objective

With CAPP:

  • Direct feedback on alignment quality
  • Better learning of natural head movements
  • Improved synchronization with speech patterns

Validation Benefits:

  • The CAPP score gives a quantitative metric for audio-pose alignment
  • It helps identify the best checkpoints
  • It gives a better criterion for model selection
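
A rough sketch of checkpoint selection along these lines. It assumes the CAPP score is the mean cosine similarity between paired audio and pose embeddings on a validation set, with higher being better; the actual metric definition is not given here. `capp_score` and `select_best_checkpoint` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def capp_score(audio_embs, pose_embs):
    """Mean cosine similarity over paired (audio, pose) embeddings.

    Assumption: higher means better audio-pose alignment.
    """
    sims = [cosine(a, p) for a, p in zip(audio_embs, pose_embs)]
    return sum(sims) / len(sims)


def select_best_checkpoint(scores_by_checkpoint):
    """Return the checkpoint name with the highest validation score."""
    return max(scores_by_checkpoint, key=scores_by_checkpoint.get)
```

In use, one would compute `capp_score` on held-out clips for each saved checkpoint, collect the results in a dict keyed by checkpoint name, and pass that dict to `select_best_checkpoint`.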
