Joint Optimization:
Primary diffusion loss ensures high-quality generation
CAPP loss ensures better audio-pose alignment
Weighting the two terms controls how much each contributes to the total loss
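The weighted combination can be sketched as a simple sum of the two terms. This is a minimal illustration, not the actual training code: the function name `joint_loss`, the parameter `capp_weight`, and its default value are assumptions made here, and the two inputs stand in for scalar loss values already computed elsewhere.

```python
def joint_loss(diffusion_loss: float, capp_loss: float, capp_weight: float = 0.25) -> float:
    """Combine the primary diffusion loss with the CAPP alignment loss.

    capp_weight is a hypothetical hyperparameter: 0 disables the CAPP term,
    larger values put more emphasis on audio-pose alignment.
    """
    if capp_weight < 0:
        raise ValueError("capp_weight must be non-negative")
    return diffusion_loss + capp_weight * capp_loss


# Illustrative values only, not measured results.
total = joint_loss(diffusion_loss=0.5, capp_loss=2.0, capp_weight=0.25)
baseline = joint_loss(diffusion_loss=0.5, capp_loss=2.0, capp_weight=0.0)
```

Setting `capp_weight` to zero recovers the diffusion-only objective, which makes it easy to compare runs with and without the alignment term.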
Training Insights:
Without CAPP:
- Only optimizes for motion prediction
- No explicit audio-pose alignment objective
With CAPP:
- Direct feedback on alignment quality
- Better learning of natural head movements
- Improved synchronization with speech patterns
Validation Benefits:
The CAPP score provides a quantitative metric of audio-pose alignment
Helps identify the best checkpoints
Gives a more principled model selection criterion than loss alone
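Checkpoint selection by CAPP score could look roughly like the sketch below. The helper `select_best_checkpoint`, the checkpoint names, and the scores are all hypothetical. Whether a higher or lower CAPP score indicates better alignment is not stated above, so the direction is left as a parameter.

```python
from typing import Dict


def select_best_checkpoint(scores: Dict[str, float], higher_is_better: bool = True) -> str:
    """Return the checkpoint name with the best validation CAPP score.

    The score direction is an assumption exposed via higher_is_better.
    """
    if not scores:
        raise ValueError("no checkpoints to choose from")
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


# Made-up validation scores for illustration only.
val_scores = {"step_1000.pt": 0.41, "step_2000.pt": 0.57, "step_3000.pt": 0.52}
best = select_best_checkpoint(val_scores)
```

In practice the scores would come from evaluating each saved checkpoint on a held-out validation set.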