This repository contains the VASA implementation separated from EMOPortraits, with all components properly configured for standalone training.
| Project | Description | Status |
|---|---|---|
| IMTalker | Built on my recreation of Microsoft's IMF paper; the most promising direction and the current focus of active development | Active |
| IMF | Training code for Implicit Motion Function (Microsoft paper recreation) | Training |
| OmniTransfer-hack | LTX2 / OmniTransfer implementation (paper) | Experimental |
Training video models requires significant GPU compute. If you find this work useful, please consider donating Vast.ai credits to help continue development.
Send Vast.ai credits to: jp@bellgeorge.com
vastai transfer credit jp@bellgeorge.com <AMOUNT>
| Tier | Suggested Amount | What It Helps With |
|---|---|---|
| Buy Me a Coffee | $5-10 | Quick experiments, bug fixes |
| Mates Rates | $25-50 | A few hours of A100 training |
| Supporter | $100-250 | Full training run (10k steps) |
| Enterprise | $500+ | Multi-stage training, new features |
Every contribution helps push this research forward. Thank you!
Live Training Dashboard: wandb.ai/snoozie/vasa-overfitting
The training visualization shows four panels demonstrating the expression transfer pipeline:
| Panel | Description |
|---|---|
| Identity (Source) | The source identity image - the person whose appearance we want to preserve |
| Target | The driving video frame - provides the expression/pose we want to transfer |
| EMO Generated | Output from the EMOPortraits volumetric avatar model (baseline) |
| VASA Generated | Output from our VASA diffusion model - learns to predict motion parameters that drive expression transfer while preserving source identity |
The green outline in the VASA output shows facial landmark detection used for loss computation. The goal is for VASA Generated to match the Target's expression while maintaining the Identity's appearance.
This visualization shows the audio-to-expression correlation during training, demonstrating how the model learns to map audio features to facial expressions for lip-sync.
This shows the target expression parameters that the model must learn to predict from audio alone. The expression embedding captures facial dynamics (mouth shape, eye openness, eyebrow position, etc.) frame-by-frame.
When the model successfully predicts the expression parameters from audio, combined with the identity image, it recreates the target expression while preserving the source identity. This demonstrates the full pipeline working end-to-end.
- Clean separation of VASA motion generation from EMOPortraits volumetric rendering
- Bridge interface for easy swapping of volumetric avatar backends
- XY/UV warping system for expression transfer and canonical view generation
- Efficient caching with single-bucket preprocessing
- Multi-mode training support (overfitting, full dataset)
- MCP Server Setup (for Claude integration):
# Add Weights & Biases MCP server for Claude
claude mcp add wandb -- uvx --from git+https://github.com/wandb/wandb-mcp-server wandb_mcp_server && uvx wandb login
- Clone the repository with submodules:
# Clone with submodules included
git clone --recurse-submodules https://github.com/johndpope/VASA-1-hack.git
cd VASA-1-hack
# Or if you already cloned without submodules:
git submodule update --init --recursive
# Install system dependencies
sudo apt-get update
sudo apt-get install -y ffmpeg git-lfs
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
chmod +x ~/miniconda.sh
~/miniconda.sh
# accept the license when prompted (type "yes")
# Create conda environment
conda create -n vasa python=3.12
conda activate vasa
# Install PyTorch (adjust for your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
# Install required packages
pip install omegaconf wandb opencv-python-headless pillow scipy matplotlib tqdm
pip install transformers diffusers accelerate einops
pip install facenet-pytorch insightface hsemotion-onnx
pip install mediapipe
pip install memory-profiler rich
pip install h5py scikit-learn seaborn python_speech_features
pip install onnxruntime-gpu lpips pytorch_msssim
# EMOPortraits
cd nemo
chmod +x ./bootstrap.sh
./bootstrap.sh
- Create necessary symlinks:
# Create symlink for repos (required for relative paths)
ln -s nemo/repos repos
# Create symlink for data directory (required for aligned keypoints)
ln -s nemo/data data
# Create symlink for losses directory (required for loss model weights)
ln -s nemo/losses losses
- Download pre-trained volumetric avatar model:
The pre-trained model should be placed in:
nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1/checkpoints/328_model.pth
- Prepare your training data:
# Create directories
mkdir -p junk cache checkpoints
# Place your training videos in the junk directory
# Videos should be .mp4 format
cp your_training_videos/*.mp4 junk/
VASA-1-hack/
├── nemo/ # Git submodule: nemo repository (base EMOPortraits code)
│ ├── models/ # Model implementations
│ ├── networks/ # Network architectures
│ ├── losses/ # Loss functions
│ ├── datasets/ # Dataset loaders
│ ├── repos/ # External repositories (face_par_off, etc.)
│ └── logs/ # Pre-trained model checkpoints
│
├── vasa_*.py # VASA-specific implementations
│ ├── vasa_trainer.py # Main training script
│ ├── vasa_model.py # VASA model architecture
│ ├── vasa_dataset.py # VASA dataset handler
│ ├── vasa_scheduler.py # Diffusion scheduler
│ └── vasa_lip_normalizer.py # Lip normalization utilities
│
├── vasa_config.yaml # Main configuration file
├── video_tracker.py # Video tracking utilities
├── syncnet.py # Sync network implementation
│
├── data/ # Data files
│ └── aligned_keypoints_3d.npy
├── losses/ # Loss model weights
│ └── loss_model_weights/
├── junk/ # Training videos directory
├── cache/ # Cache for processed data
├── checkpoints/ # Model checkpoints
└── repos/ # Symlink to nemo/repos
Edit vasa_config.yaml to configure paths and training parameters:
paths:
volumetric_model: "nemo/logs/[...]/328_model.pth" # Pre-trained model
volumetric_config: "nemo/models/stage_1/volumetric_avatar/va.yaml"
data_dir: "data"
video_folder: "junk" # Your training videos directory
cache_dir: "cache"
checkpoint_dir: "checkpoints"
train:
batch_size: 1
num_epochs: 4000
lr: 1e-3
  # ... other training parameters
Run the setup verification script:
python test_vasa_setup.py
Expected output:
✓ Config loaded successfully
✓ All paths exist
✓ All modules import correctly
✓ Setup looks good! You can now run vasa_trainer.py
Test your setup and verify model can train properly:
# Run overfitting test with optimized settings
python train_overfit.py
This uses `overfit_config.yaml` with:
- Single-bucket caching for fast data loading
- Face attribute caching (gaze, emotion, head_distance)
- Optimized batch sizes and learning rates
- WandB integration for monitoring
- Automatic checkpoint resumption
Use the standard configuration for training on your complete dataset:
# Uses vasa_config.yaml by default
python vasa_trainer.py
# Or explicitly specify the config
python vasa_trainer.py --config vasa_config.yaml
Key parameters in `vasa_config.yaml`:
- `window_size: 50` - Full 50-frame windows
- `n_layers: 8` - Full 8 transformer layers
- `num_steps: 1000` - Full 1000 diffusion steps
- `batch_size: 1` - Adjust based on GPU memory
- `num_epochs: 4000` - Full training schedule
Use the overfitting configuration via vasa_trainer:
# Use the overfitting configuration with vasa_trainer
python vasa_trainer.py --config overfit_config.yaml
Key differences in `overfit_config.yaml`:
- `window_size: 20` - Smaller windows for faster processing
- `n_layers: 2` - Reduced transformer depth (2x-4x faster)
- `num_steps: 100` - Reduced diffusion steps (10x faster)
- `batch_size: 4` - Larger batch for better GPU utilization
- `num_epochs: 100` - Shorter training for quick iteration
- `max_videos: 100` - Limited dataset size
- `num_workers: 8` - Multi-threaded data loading
- No augmentation - Pure overfitting test
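As a quick sanity check before launching a run, the two configs can be diffed programmatically. This is a minimal sketch using OmegaConf (already in the install list above); the exact nesting of these keys inside each YAML file may differ in your checkout, which is why `OmegaConf.select` is used instead of hard-coded attribute paths.

```python
from omegaconf import OmegaConf

# Sketch: compare the keys that overfit_config.yaml overrides against vasa_config.yaml.
# If a key lives under a sub-section (e.g. train.*), adjust the dotted path accordingly;
# OmegaConf.select simply returns None when nothing is found at the given path.
base = OmegaConf.load("vasa_config.yaml")
overfit = OmegaConf.load("overfit_config.yaml")

for key in ("window_size", "n_layers", "num_steps", "batch_size", "num_epochs", "max_videos"):
    print(f"{key}: {OmegaConf.select(base, key)} -> {OmegaConf.select(overfit, key)}")
```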
When to use overfitting mode:
- Testing new model architectures
- Debugging training pipeline
- Verifying data loading and caching
- Quick convergence tests
- Checking if model can overfit to small dataset (sanity check)
For faster training, preprocess all windows into a single cache file:
# Preprocess data for overfitting test (small dataset)
python preprocess_single_bucket.py --max_videos 100 --cache_dir cache_overfit
# Preprocess full dataset
python preprocess_single_bucket.py --max_videos 1000 --cache_dir cache_full
Benefits of single-bucket caching:
- 10x faster data loading - Direct index access to any window
- Face attributes cached - Gaze, emotion, head_distance pre-computed
- Better shuffling - Perfect for random sampling
- Memory efficient - One H5 file instead of many
- Self-contained windows - Context is cached, no video dependencies
The cache will be automatically used if:
- `use_single_bucket: true` is set in your config file
- The cache file exists in the specified `cache_dir`
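If you want to verify what the preprocessing step actually wrote, the H5 cache can be inspected directly with h5py. The file name below is an assumption (point it at whatever file appears in your `cache_dir`); the sketch only enumerates the stored groups and datasets rather than assuming a particular layout.

```python
import h5py

# Sketch: list everything stored in the single-bucket cache.
cache_path = "cache_overfit/single_bucket.h5"  # hypothetical file name - adjust to your cache_dir

with h5py.File(cache_path, "r") as f:
    f.visit(print)  # prints every group/dataset name, one per line
```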
Both training modes support WandB logging:
# View training progress
# Visit the URL printed at training start, e.g.:
# wandb: 🚀 View run at https://wandb.ai/your-username/vasa/runs/run-id
For overfitting mode, runs are grouped as "overfit-experiments" in WandB for easy comparison.
To use a different dataset (e.g., CelebV-HQ):
# Edit the config file or create a custom one
# Update video_folder path in the config:
# video_folder: "/path/to/your/dataset"
# For example, using CelebV-HQ:
# video_folder: "/media/12TB/Downloads/CelebV-HQ/celebvhq/35666"The trainer will:
- Load the pre-trained volumetric avatar model
- Process videos from the configured directory
- Cache processed windows for faster subsequent epochs
- Save checkpoints periodically based on `save_freq`
- Save checkpoints to `checkpoints/` (or `checkpoints_overfit/` for overfitting mode)
- Log to Weights & Biases (if enabled)
| Parameter | Vanilla Training | Overfitting Mode | Speedup |
|---|---|---|---|
| Window Size | 50 frames | 20 frames | 2.5x |
| Transformer Layers | 8 | 2 | 4x |
| Diffusion Steps | 1000 | 100 | 10x |
| Batch Size | 1 | 4 | 4x |
| Workers | 0 | 8 | Parallel loading |
| Epoch Time (RTX 5090) | ~5 min | ~1.5 min | 3.3x |
| Convergence | 1000+ epochs | 10-20 epochs | 50x+ |
The project includes several debugging pipelines for analyzing face swap and identity preservation issues:
# Test with video (uses joint extraction to prevent identity drift)
python nemo/pipeline3.py --target nemo/data/VID_1.mp4 --max-frames 10
# Test with single image
python nemo/pipeline3.py --target nemo/data/IMG_2.png
# Use custom source identity
python nemo/pipeline3.py --source path/to/source.png --target path/to/target.mp4
# Swap identity mode (use driver's identity with source's expression)
python nemo/pipeline3.py --default-video --swap-identity
# This is useful when the model is extracting the wrong identity
Features:
- Joint extraction: Processes source+first_driver_frame together to calibrate embeddings
- Identity swapping: `--swap-identity` flag to use driver's identity with source's expression
- Comprehensive tracing: Every step logged with images and tensors
- Comparison grids: Side-by-side visualization of results
- Warp visualization: XY/UV warp magnitude heatmaps
- Debug output: All intermediates saved to `debug_pipeline3/`
# The reference pipeline that produces correct results
python nemo/pipeline2.py
This is the baseline implementation that pipeline3.py was designed to match.
Various analysis scripts for specific debugging:
- `check_identity_confusion.py` - Analyze identity preservation
- `debug_identity_extraction.py` - Test identity feature extraction
- `test_polished_face_swap.py` - Test face swap quality
- `extract_and_apply_warps_properly.py` - Analyze warp field application
The volumetric avatar system uses two types of warps:
- XY Warps (Rigid + Non-rigid 3D warping)
  - Transform from posed face → canonical (neutral) space
  - Removes head pose and expression from source
  - Creates identity-preserving canonical volume
- UV Warps (Expression transfer)
  - Transform from canonical → target expression
  - Applies target's expression and pose
  - Preserves source identity while adopting target motion
- Problem: Generated face morphs away from source identity
  - Cause: Solo extraction (processing the source alone, without driver context)
  - Solution: Joint extraction - process source + first driver frame together
- Problem: Male faces (e.g., IMG_1.png) appear feminine in results
  - Cause: Identity embeddings not properly calibrated to the driver motion space
  - Solution: Joint extraction ensures embeddings are aligned with driver poses
debug_pipeline3/
├── trace_YYYYMMDD_HHMMSS.json # Complete execution trace
├── step_NNNN_*.png # Intermediate images at each step
├── step_NNNN_*.pt # Tensor checkpoints
├── frame_NNN_result.png # Final output frames
└── video_comparison.png # Grid comparison of all frames
The trace files contain detailed information about each processing step:
- Entry/exit points for all major functions
- Tensor shapes and statistics
- Mask generation and compositing steps
- Warp field generation and application
Use the trace to identify where identity drift or other issues occur in the pipeline.
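Since the trace schema isn't documented here, a safe first step is to load the most recent trace and inspect its top-level structure before digging into individual steps. This is a minimal, generic sketch; the keys you see will depend on what pipeline3.py actually logs.

```python
import glob
import json

# Sketch: open the newest trace file from debug_pipeline3/ and show its overall shape.
traces = sorted(glob.glob("debug_pipeline3/trace_*.json"))
with open(traces[-1]) as f:
    trace = json.load(f)

# A trace is typically either a list of step records or a dict of sections;
# print just enough to see which before exploring further.
print(type(trace).__name__, len(trace))
```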
The VASA model uses a sophisticated two-stage warping system to separate identity from expression, enabling clean expression transfer between faces.
- Coordinate System: XY refers to spatial coordinates (X=width, Y=height) in the 3D volume space (16×64×64 grid)
- Direction: FROM current expression → TO canonical (neutral)
- Purpose: Expression normalization - removes the current expression to get back to a neutral state
- Effect: "Undoes" expressions (e.g., moves smiling mouth corners back to neutral positions)
- Applied to: The source volume before any target expression is added
- Coordinate System: UV uses texture/surface coordinates (0-1 normalized range)
- Direction: FROM canonical → TO target expression
- Purpose: Expression application - adds the desired expression to the neutral volume
- Effect: Deforms canonical volume to create new expressions (smile, frown, surprise, etc.)
- Applied to: The volume after XY warping (canonical state)
Source Face (😊) → [XY Warp] → Canonical (😐) → [UV Warp] → Target Face (😮)
- Stage 1 (XY Warping): Normalizes any expression to canonical
- Stage 2 (UV Warping): Applies target expression to canonical
This separation enables:
- Clean expression transfer between any source and target
- Identity preservation while changing expressions
- Consistent canonical representation for all faces
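To make the two-stage idea concrete, here is a minimal sketch of applying stored warp fields with `torch.nn.functional.grid_sample`. It is an illustration only: the real model composes the rigid, XY, and UV warps inside the volumetric avatar network, and its normalization and ordering conventions may differ from this toy example.

```python
import torch
import torch.nn.functional as F

# Toy shapes matching the cached warp fields: depth 16, height/width 64.
B, C, D, H, W = 1, 96, 16, 64, 64            # C is an arbitrary channel count for the sketch
source_volume = torch.randn(B, C, D, H, W)   # posed source feature volume

# Sampling grids in [-1, 1], shaped [B, D, H, W, 3] like the cached [T, 16, 64, 64, 3] warps.
xy_warp = torch.rand(B, D, H, W, 3) * 2 - 1  # expression -> canonical
uv_warp = torch.rand(B, D, H, W, 3) * 2 - 1  # canonical -> target expression

# Stage 1: normalize the source expression into the canonical volume.
canonical_volume = F.grid_sample(source_volume, xy_warp, align_corners=True)

# Stage 2: apply the target expression to the canonical volume.
target_volume = F.grid_sample(canonical_volume, uv_warp, align_corners=True)
print(target_volume.shape)  # torch.Size([1, 96, 16, 64, 64])
```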
The warps are extracted during dataset preprocessing:
# In vasa_dataset.py - extract warps for training
motion_data = {
'xy_warps': xy_warps, # [T, 16, 64, 64, 3] - normalizes to canonical
'rigid_warps': rigid_warps, # [T, 16, 64, 64, 3] - head pose alignment
'uv_warps': uv_warps, # [T, 16, 64, 64, 3] - applies target expression
'source_theta': thetas # [T, 3, 4] - pose matrices
}
To cleanly separate VASA from the volumetric avatar implementation, we've developed a bridge interface that abstracts all EMOPortraits-specific details.
Abstract interface that any volumetric avatar backend must implement:
class VolumetricAvatarBridgeInterface:
    def extract_warps_for_window(self, frames, identity_frame_idx) -> "WindowWarpData": ...
    def extract_warps_for_frame(self, identity_frame, target_frame) -> "FrameWarpData": ...
    def generate_canonical_view(self, identity_frame) -> "canonical_image": ...
    def get_identity_embedding(self, identity_frame) -> "identity_embed": ...
Concrete implementation for EMOPortraits/MegaPortraits models:
- Handles all model-specific details internally
- Provides clean warp extraction API
- Manages caching for efficiency
- Supports batch processing for entire windows
from vasa_emo_bridge_interface import create_bridge
# Create bridge (abstracts all EMO details)
bridge = create_bridge("emoportraits", emo_model)
# Extract warps for entire window at once
window_warps = bridge.extract_warps_for_window(
frames=frames, # [T, C, H, W]
identity_frame_idx=0 # Use first frame as identity
)
# Access extracted warps
xy_warps = window_warps.xy_warps # [T, D, H, W, 3]
rigid_warps = window_warps.rigid_warps # [T, D, H, W, 3]
uv_warps = window_warps.uv_warps # [T, D, H, W, 3]
# Generate canonical view
canonical = bridge.generate_canonical_view(identity_frame)
- Clean Separation: VASA code doesn't need to know EMOPortraits internals
- Easy Swapping: Can replace volumetric backend without changing VASA
- Batch Efficiency: Process entire windows at once
- Automatic Caching: Identity embeddings cached automatically
- Type Safety: Clear data structures with type hints
The system can generate canonical (neutral, front-facing) views from any input expression:
A canonical view represents a person in a standardized state:
- Neutral expression (no smile, closed mouth)
- Front-facing pose (no head rotation)
- Consistent lighting and appearance
- Extract identity embedding from the source frame
- Create canonical pose (identity matrix = no rotation)
- Process through volumetric model to get canonical volume
- Decode with minimal warping to get neutral view
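For step 2 above, the "canonical pose" is simply the identity transform expressed in the same [3, 4] theta format the dataset stores. A minimal sketch, with the final call assuming the bridge API shown earlier:

```python
import torch

# Identity pose: no rotation, no translation, matching the [T, 3, 4] theta format.
canonical_theta = torch.eye(4)[:3].unsqueeze(0)  # [1, 3, 4]
print(canonical_theta)

# With the bridge from the previous section, the whole procedure is one call:
# canonical = bridge.generate_canonical_view(identity_frame)
```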
- Reference frame generation for consistent motion synthesis
- Expression normalization for training
- Identity preservation during expression transfer
- Quality evaluation of the volumetric model
When given different expressions as input, the canonical generation produces nearly identical neutral views:
- Average difference between canonical views: < 0.1 (excellent consistency)
- Identity fully preserved
- All expressions normalized to neutral
The project uses Python's logging module with three configurable levels defined in nemo/logger.py:28-30:
# log_level = logging.WARNING # Minimal output - only warnings and errors
log_level = logging.INFO # Standard output - informational messages (default)
# log_level = logging.DEBUG   # Verbose output - detailed debugging information
Logging Levels Explained:
- WARNING (`logging.WARNING`)
  - Shows only warnings, errors, and critical messages
  - Use when you want minimal console output during training
  - Best for production runs where you only need to know about issues
- INFO (`logging.INFO`) - Currently Active
  - Shows informational messages, warnings, and errors
  - Provides training progress, epoch updates, and key metrics
  - Default and recommended level for normal training runs
  - Balances visibility with readability
- DEBUG (`logging.DEBUG`)
  - Shows all messages including detailed debugging information
  - Includes tensor shapes, gradient information, and internal state
  - Use when troubleshooting model issues or understanding data flow
  - Can be verbose - recommended only for debugging sessions
To change the logging level:
- Edit `nemo/logger.py` line 29
- Uncomment the desired level and comment out the others
- The change takes effect on next run
Additional Features:
- Logs are saved to `project.log` for later review
- Rich formatting with color-coded output and timestamps
- Third-party library logging is suppressed to reduce noise
- TorchDebugger class available for advanced PyTorch debugging
- `ModuleNotFoundError: No module named 'logger'`
  - The logger module lives in `nemo/` and the paths are already configured; if the error persists, check that the nemo submodule was cloned properly
- `FileNotFoundError: './repos/face_par_off/res/cp/79999_iter.pth'`
  - Ensure the symlink exists: `ln -s nemo/repos repos`
- `ValueError: num_samples should be a positive integer value, but got num_samples=0`
  - No videos were found. Add videos to the `junk/` directory: `cp your_video.mp4 junk/`
- `FileNotFoundError: Config file not found at channel_config.yaml`
  - Copy the file from EMOPortraits or create a basic one
- `CUDA out of memory`
  - Reduce `batch_size` in vasa_config.yaml
  - Enable gradient checkpointing
  - Reduce `sequence_length` in the dataset config
- FFmpeg warnings
  - These can be safely ignored if not processing audio
  - To fix: `pip install ffmpeg-python`
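When chasing a CUDA out-of-memory error, it can also help to print PyTorch's allocator statistics just before the failing step. This is a generic sketch, not tied to any VASA internals:

```python
import torch

# Print a human-readable summary of the CUDA caching allocator, plus the raw
# allocated/reserved byte counts, to see what is already resident on the GPU.
if torch.cuda.is_available():
    print(torch.cuda.memory_summary(device=0, abbreviated=True))
    print(f"allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")
```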
If you're missing files, you'll need these from EMOPortraits:
- `channel_config.yaml` - Channel configuration
- `syncnet.py` - Sync network implementation
- `data/aligned_keypoints_3d.npy` - 3D keypoint alignments
- `losses/loss_model_weights/*.pth` - Pre-trained loss models
- Pre-trained volumetric avatar checkpoint
Training progress is logged to:
- Console: Real-time training metrics
- Weights & Biases: Detailed metrics and visualizations (if enabled)
- Checkpoints: Saved every N epochs to `checkpoints/`
Monitor training:
# Watch training logs
tail -f project.log
# Check W&B dashboard
# https://wandb.ai/YOUR_USERNAME/vasa/- VASA-specific code: Root directory (
vasa_*.py) - Base EMOPortraits code:
nemo/directory - Configuration:
vasa_config.yaml - Training data:
junk/directory - Model outputs:
checkpoints/directory
- Separated VASA components from EMOPortraits codebase
- Fixed all hardcoded paths to be relative or configurable
- Proper module imports with sys.path management
- Configurable paths via vasa_config.yaml
- Auto-detection of project directories in nemo code
- Clean separation between VASA-specific and base code
Update nemo to latest version:
cd nemo
git pull origin main
cd ..
git add nemo
git commit -m "Update nemo submodule to latest"Lock to specific nemo version:
cd nemo
git checkout <commit-hash>
cd ..
git add nemo
git commit -m "Lock nemo to specific version"- The volumetric model must be pre-trained (from EMOPortraits)
- Training requires at least one video in the
junk/directory - All paths in configs are relative to the project root
- The
repossymlink is required for backward compatibility
- Training requires significant GPU memory (recommended: 24GB+)
- Some imports show FFmpeg warnings (can be ignored)
- Initial dataset processing can be slow (cached afterward)
This project is licensed under the MIT License - see the LICENSE file for details.
Note: The nemo submodule and other dependencies may have their own licenses.
- EMOPortraits team for the base implementation
- VASA paper authors for the architecture design
- Contributors to the nemo repository


