A sample Python application showcasing the Audio2Face-3D NIM hosted on NVIDIA Cloud Functions (NVCF). This client demonstrates how to send audio files and receive facial animation blendshape data using NVIDIA's Audio2Face-3D API.
- Features
- Prerequisites
- Installation
- Usage
- Configuration
- Available Models
- Sample Audio Files
- Output
- Project Structure
- License
- CLI Client: Command-line interface for batch processing audio files
- Web Interface: Interactive Gradio-based web UI for real-time testing
- Multiple Character Models: Support for Mark, Claire, and James stylization models
- Emotion Control: Configurable emotion parameters for animation generation
- Blendshape Output: ARKit-compatible blendshape weights export
- Audio Streaming: Efficient gRPC-based audio streaming
- Python 3.8+
- NVIDIA API Key (from NVIDIA API Catalog)
- Function ID for the Audio2Face-3D API
```shell
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements
pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
```

Note: The `nvidia_ace-1.2.0` wheel is compatible with Audio2Face-3D NIM 1.3.
| Package | Version | Purpose |
|---|---|---|
| numpy | 1.26.4 | Numerical operations |
| scipy | 1.13.0 | Audio file I/O |
| grpcio | 1.72.0rc1 | gRPC communication |
| protobuf | 4.24.1 | Protocol buffers |
| PyYAML | 6.0.1 | Configuration parsing |
| pandas | 2.2.2 | Data manipulation |
| gradio | 6.0.1 | Web interface |
| opencv-python-headless | 4.12.0.88 | Image processing |
Run the CLI client with the following command:
```shell
python3 ./nim_a2f_3d_client.py <audio_file.wav> <config.yml> --apikey <API_KEY> --function-id <FUNCTION_ID>
```

Example:

```shell
python3 ./nim_a2f_3d_client.py \
    ../../example_audio/Claire_neutral.wav \
    config/config_claire.yml \
    --apikey nvapi-xxxxxxxxxxxx \
    --function-id 0961a6da-fb9e-4f2e-8491-247e5fd7bf8d
```

| Argument | Required | Description |
|---|---|---|
| `file` | ✅ | PCM 16-bit single-channel audio file in WAV format |
| `config` | ✅ | YAML configuration file for inference parameters |
| `--apikey` | ✅ | NGC API Key from the NVIDIA API Catalog |
| `--function-id` | ✅ | Function ID for the specific character model |
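Since the CLI takes one audio file per invocation, batch processing (mentioned under Features) can be driven by a small wrapper. A minimal sketch under the assumption of the file layout above; the `build_commands` helper is ours for illustration, not part of the sample:

```python
from pathlib import Path

def build_commands(audio_dir, config, api_key, function_id):
    """Build one CLI invocation per WAV file in audio_dir (hypothetical helper)."""
    commands = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        commands.append([
            "python3", "./nim_a2f_3d_client.py",
            str(wav), config,
            "--apikey", api_key,
            "--function-id", function_id,
        ])
    return commands
```

Each returned list can then be handed to `subprocess.run(cmd, check=True)`.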
Launch the interactive web interface:
```shell
python3 ./app.py
```

The web interface provides:
- Drag-and-drop audio upload
- Sample audio selection
- Real-time emotion parameter adjustment
- Visual blendshape output preview
- CSV export functionality
Configuration files are located in the `config/` directory:

- `config_claire.yml` - Claire character settings
- `config_james.yml` - James character settings
- `config_mark.yml` - Mark character settings
| Parameter | Description | Default |
|---|---|---|
| `upperFaceStrength` | Range of motion for the upper face | 1.0 |
| `upperFaceSmoothing` | Temporal smoothing for the upper face | 0.001 |
| `lowerFaceStrength` | Range of motion for the lower face | 1.25 |
| `lowerFaceSmoothing` | Temporal smoothing for the lower face | 0.006 |
| `faceMaskLevel` | Boundary between upper and lower face regions | 0.6 |
| `faceMaskSoftness` | Blend smoothness between regions | 0.0085 |
| `skinStrength` | Range of motion for the skin | 1.0 |
| `eyelidOpenOffset` | Default eyelid pose adjustment | 0.0 |
| `lipOpenOffset` | Default lip pose adjustment | 0.0 |
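Put together, a face-parameter block in one of these YAML files might look like the following. This is an illustrative sketch using the defaults from the table; the exact key nesting in the shipped `config_*.yml` files may differ:

```yaml
# Illustrative face-parameter block (defaults from the table above);
# the key nesting in the shipped config_*.yml files may differ.
face_parameters:
  upperFaceStrength: 1.0
  upperFaceSmoothing: 0.001
  lowerFaceStrength: 1.25
  lowerFaceSmoothing: 0.006
  faceMaskLevel: 0.6
  faceMaskSoftness: 0.0085
  skinStrength: 1.0
  eyelidOpenOffset: 0.0
  lipOpenOffset: 0.0
```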
The configuration supports ARKit-compatible blendshape multipliers and offsets. See Apple ARKit documentation for more details.
| Character | Function ID |
|---|---|
| Mark | 8efc55f5-6f00-424e-afe9-26212cd2c630 |
| Claire | 0961a6da-fb9e-4f2e-8491-247e5fd7bf8d |
| James | 9327c39f-a361-4e02-bd72-e11b4c9b7b5e |
| Character | Function ID |
|---|---|
| Mark | cf145b84-423b-4222-bfdd-15bb0142b0fd |
| Claire | 617f80a7-85e4-4bf0-9dd6-dcb61e886142 |
| James | 8082bdcb-9968-4dc5-8705-423ea98b8fc2 |
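Both `--apikey` and `--function-id` ultimately travel as request metadata on the gRPC channel. A hypothetical sketch of that mapping (the header names here are assumptions, not confirmed by this README; the actual implementation lives in `a2f_3d/client/auth.py`):

```python
def build_nvcf_metadata(api_key: str, function_id: str) -> list[tuple[str, str]]:
    """Build gRPC metadata pairs for an NVCF call.

    Header names are illustrative assumptions; see a2f_3d/client/auth.py
    for the real mapping used by this sample.
    """
    return [
        ("authorization", f"Bearer {api_key}"),
        ("function-id", function_id),
    ]
```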
Sample audio files are available in `../../example_audio/`:

| File | Description |
|---|---|
| `Claire_neutral.wav` | Claire - Neutral emotion |
| `Claire_anger.wav` | Claire - Anger emotion |
| `Claire_joy_mandarin.wav` | Claire - Joy (Mandarin) |
| `Claire_sadness.wav` | Claire - Sadness emotion |
| `Claire_outofbreath_mandarin.wav` | Claire - Out of breath (Mandarin) |
| `Mark_neutral.wav` | Mark - Neutral emotion |
| `Mark_joy.wav` | Mark - Joy emotion |
| `Mark_anger.wav` | Mark - Anger emotion |
| `Mark_sadness.wav` | Mark - Sadness emotion |
| `Mark_outofbreath.wav` | Mark - Out of breath |
The application generates the following outputs:
- Blendshapes CSV: Animation keyframes with blendshape names, values, and timecodes
- Emotions CSV: Emotion data with timecodes
- Audio WAV: Processed audio output (`out.wav`)

Supported emotion parameters:
- Amazement
- Anger
- Cheekiness
- Disgust
- Fear
- Grief
- Joy
- Out of Breath
- Pain
- Sadness
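The blendshapes CSV described under Output can be inspected with pandas, which is already a dependency. A sketch assuming hypothetical column names (`timeCode`, `blendShapeName`, `value`); the actual export's column names may differ:

```python
import io
import pandas as pd

# Hypothetical sample of the blendshapes CSV; real column names may differ.
sample = io.StringIO(
    "timeCode,blendShapeName,value\n"
    "0.033,jawOpen,0.41\n"
    "0.033,mouthSmileLeft,0.12\n"
    "0.066,jawOpen,0.38\n"
)
df = pd.read_csv(sample)

# Peak weight per blendshape across the clip.
peaks = df.groupby("blendShapeName")["value"].max()
print(peaks)
```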
```
audio2face_3d_api_client/
├── README.md                 # This file
├── nim_a2f_3d_client.py      # CLI client script
├── app.py                    # Gradio web interface
├── requirements              # Python dependencies
├── config/
│   ├── config_claire.yml     # Claire model configuration
│   ├── config_james.yml      # James model configuration
│   └── config_mark.yml       # Mark model configuration
└── a2f_3d/
    └── client/
        ├── auth.py           # Authentication utilities
        └── service.py        # gRPC service handlers
```
The client performs the following steps:

1. Read Audio: Loads audio data from a 16-bit PCM WAV file
2. Load Config: Parses emotion and face parameters from the YAML configuration
3. Stream Audio: Sends audio data via gRPC to the Audio2Face-3D API
4. Receive Animation: Gets back blendshape weights, audio, and emotion data
5. Export Results: Saves animation keyframes and emotions to CSV files
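The first two steps above can be sketched with scipy and PyYAML, both of which are dependencies. A minimal, self-contained example; a test tone is synthesized in memory so nothing is read from disk, and the YAML keys shown are illustrative, not the sample's actual schema:

```python
import io
import numpy as np
import yaml
from scipy.io import wavfile

# Step 1: read 16-bit PCM audio. A one-second 440 Hz tone is written
# to an in-memory WAV so the example is self-contained.
rate = 16000
tone = (0.3 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate) * 32767).astype(np.int16)
buf = io.BytesIO()
wavfile.write(buf, rate, tone)
buf.seek(0)

sample_rate, samples = wavfile.read(buf)  # the client expects mono int16 data

# Step 2: parse face parameters from YAML (key names are illustrative).
config = yaml.safe_load("""
post_processing_parameters:
  lowerFaceStrength: 1.25
""")
print(sample_rate, len(samples), config)
```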
SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0. See LICENSE for details.