"The CAPP (Contrastive Audio and Pose Pretraining) model should be available in a few weeks." 

<img width="950" alt="Screenshot 2024-10-30 at 11 11 40 PM" src="https://github.com/user-attachments/assets/b885a956-ef78-4884-a5e0-2a0d1f921385">



**Joint Optimization:**
- The primary diffusion loss ensures high-quality generation
- The CAPP loss ensures better audio-pose alignment
- A weighted combination controls the relative importance of the two losses
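
A weighted combination of the two losses could look roughly like the sketch below. This is only an illustration, not the repository's training code: `combined_loss` and `capp_weight` are hypothetical names, and the default weight of 0.1 is an arbitrary placeholder rather than a recommended value.

```python
def combined_loss(diffusion_loss, capp_loss, capp_weight=0.1):
    """Weighted sum of the primary diffusion loss and the CAPP alignment loss.

    Works with plain floats or with tensor-like objects that support
    addition and scalar multiplication. All names here are hypothetical.
    """
    if capp_weight < 0:
        raise ValueError("capp_weight must be non-negative")
    # capp_weight = 0 recovers training on the diffusion loss alone.
    return diffusion_loss + capp_weight * capp_loss


# Scalar stand-ins for per-batch loss values (illustrative numbers only).
total = combined_loss(diffusion_loss=0.5, capp_loss=2.0, capp_weight=0.25)
print(total)
```

Raising `capp_weight` puts more emphasis on audio-pose alignment; lowering it puts more emphasis on generation quality.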


**Training Insights:**

**Without CAPP:**
- Only optimizes for motion prediction
- No explicit audio-pose alignment objective

**With CAPP:**
- Direct feedback on alignment quality
- Better learning of natural head movements
- Improved synchronization with speech patterns

**Validation Benefits:**

- The CAPP score provides a quantitative metric
- It helps identify the best checkpoints
- It gives better model selection criteria
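
Checkpoint selection by CAPP score might be sketched as below. It assumes that a higher CAPP score means better audio-pose alignment and that one validation score per checkpoint is already available. `select_best_checkpoint` and the checkpoint names are hypothetical, and the scores are made-up placeholders, not measured results.

```python
def select_best_checkpoint(capp_scores):
    """Return (name, score) for the checkpoint with the highest CAPP score.

    Assumes higher CAPP score = better alignment (an assumption, not
    something stated in the issue). `capp_scores` maps name -> score.
    """
    if not capp_scores:
        raise ValueError("no checkpoints to compare")
    best_name = max(capp_scores, key=capp_scores.get)
    return best_name, capp_scores[best_name]


# Placeholder validation scores, for illustration only.
scores = {"step_1000": 0.41, "step_2000": 0.57, "step_3000": 0.52}
best_name, best_score = select_best_checkpoint(scores)
print(best_name, best_score)
```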



