221 changes: 198 additions & 23 deletions scripts/audio2face_3d_api_client/README.md
@@ -1,50 +1,225 @@
# Sample Application connecting to Audio2Face-3D NIM hosted on NVCF
# Audio2Face-3D NIM API Client

A sample Python application to showcase the Audio2Face-3D NIM hosted on NVIDIA Cloud Functions (NVCF).
A sample Python application to showcase the Audio2Face-3D NIM hosted on NVIDIA Cloud Functions (NVCF). This client demonstrates how to send audio files and receive facial animation blendshapes data using NVIDIA's Audio2Face-3D API.

## Getting started
## 📋 Table of Contents

Start by creating a python venv using:
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Command Line Interface](#command-line-interface)
- [Gradio Web Interface](#gradio-web-interface)
- [Configuration](#configuration)
- [Available Models](#available-models)
- [Sample Audio Files](#sample-audio-files)
- [Output](#output)
- [Project Structure](#project-structure)
- [License](#license)

## ✨ Features

- **CLI Client**: Command-line interface for batch processing audio files
- **Web Interface**: Interactive Gradio-based web UI for real-time testing
- **Multiple Character Models**: Support for Mark, Claire, and James stylization models
- **Emotion Control**: Configurable emotion parameters for animation generation
- **Blendshape Output**: ARKit-compatible blendshape weights export
- **Audio Streaming**: Efficient gRPC-based audio streaming

## 📦 Prerequisites

- Python 3.8+
- NVIDIA API Key (from [NVIDIA API Catalog](https://build.nvidia.com/))
- Function ID for the Audio2Face-3D API

## 🚀 Installation

### 1. Create a Virtual Environment

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Then install the required dependencies:
### 2. Install Dependencies

```bash
pip3 install -r requirements
pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
```

Note: This wheel is compatible with Audio2Face-3D NIM 1.3
> **Note**: The `nvidia_ace-1.2.0` wheel is compatible with Audio2Face-3D NIM 1.3

### Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| numpy | 1.26.4 | Numerical operations |
| scipy | 1.13.0 | Audio file I/O |
| grpcio | 1.72.0rc1 | gRPC communication |
| protobuf | 4.24.1 | Protocol buffers |
| PyYAML | 6.0.1 | Configuration parsing |
| pandas | 2.2.2 | Data manipulation |
| gradio | 6.0.1 | Web interface |
| opencv-python-headless | 4.12.0.88 | Image processing |

## 💻 Usage

### Command Line Interface

Run the CLI client with the following command:

```bash
python3 ./nim_a2f_3d_client.py <audio_file.wav> <config.yml> --apikey <API_KEY> --function-id <Function_ID>
python3 ./nim_a2f_3d_client.py <audio_file.wav> <config.yml> --apikey <API_KEY> --function-id <FUNCTION_ID>
```

By Default:
#### Example

```bash
python3 ./nim_a2f_3d_client.py ../../example_audio/Claire_neutral.wav config/config_claire.yml --apikey <API_KEY> --function-id <Function_ID>
python3 ./nim_a2f_3d_client.py \
../../example_audio/Claire_neutral.wav \
config/config_claire.yml \
--apikey nvapi-xxxxxxxxxxxx \
--function-id 0961a6da-fb9e-4f2e-8491-247e5fd7bf8d
```

The scripts takes four mandatory parameters, an audio file at format PCM 16 bits,
a yaml configuration file for the emotions parameters, the API Key generated by API Catalogue, and the Function ID
used to access the API function.
#### Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| `file` | ✅ | PCM 16-bit single channel audio file in WAV format |
| `config` | ✅ | YAML configuration file for inference parameters |
| `--apikey` | ✅ | NGC API Key from NVIDIA API Catalog |
| `--function-id` | ✅ | Function ID for the specific character model |
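
The argument table above maps naturally onto `argparse`. The following is a minimal sketch of such a parser, not the actual code of `nim_a2f_3d_client.py`; the function name `build_parser` and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the four arguments listed in the table."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch of the Audio2Face-3D CLI arguments."
    )
    # Two positional arguments: the audio file and the YAML configuration.
    parser.add_argument("file", help="PCM 16-bit single-channel WAV file")
    parser.add_argument("config", help="YAML file with inference parameters")
    # Two mandatory options; argparse exposes --function-id as function_id.
    parser.add_argument("--apikey", required=True, help="NVIDIA API key")
    parser.add_argument(
        "--function-id", required=True, help="Function ID of the character model"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(
    ["Claire_neutral.wav", "config_claire.yml", "--apikey", "KEY", "--function-id", "ID"]
)
print(args.file, args.config, args.function_id)
```

Marking both options as `required=True` matches the table, where all four arguments are mandatory.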

### Gradio Web Interface

Launch the interactive web interface:

```bash
python3 ./app.py
```

Comment on lines +98 to +99

Copilot AI Dec 1, 2025

The instructions for launching the web interface are incomplete. The app requires an API key to be set (hardcoded in the current implementation, but should be via environment variable). Add documentation about setting the API key before running:

Suggested change

Launch the interactive web interface:

**Prerequisites**: Set your NVIDIA API key as an environment variable:

```bash
export NVIDIA_API_KEY=your-api-key-here
```

Then launch the app:

```bash
python3 ./app.py
```

The web interface provides:
- Drag-and-drop audio upload
- Sample audio selection
- Real-time emotion parameter adjustment
- Visual blendshape output preview

Copilot AI Dec 1, 2025

Missing information about visualization video output. The web interface generates a visualization video (as shown in app.py lines 510-517), but this feature is not mentioned in the interface capabilities list. Consider adding:

- Visual blendshape animation preview video
Suggested change
- Visual blendshape output preview
- Visual blendshape output preview
- Visual blendshape animation preview video

- CSV export functionality

Copilot AI Dec 1, 2025

The description "CSV export functionality" is misleading. Based on the code in app.py (lines 502-508), the web interface exports both CSV and JSON formats. Consider updating to:

- Downloadable animation data in JSON and CSV formats
Suggested change
- CSV export functionality
- Downloadable animation data in JSON and CSV formats


## ⚙️ Configuration

Configuration files are located in the `config/` directory:

- `config_claire.yml` - Claire character settings
- `config_james.yml` - James character settings
- `config_mark.yml` - Mark character settings

### Face Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `upperFaceStrength` | Range of motion for upper face | 1.0 |
| `upperFaceSmoothing` | Temporal smoothing for upper face | 0.001 |
| `lowerFaceStrength` | Range of motion for lower face | 1.25 |
| `lowerFaceSmoothing` | Temporal smoothing for lower face | 0.006 |
| `faceMaskLevel` | Boundary between upper/lower regions | 0.6 |
| `faceMaskSoftness` | Blend smoothness between regions | 0.0085 |
| `skinStrength` | Range of motion for skin | 1.0 |
| `eyelidOpenOffset` | Default eyelid pose adjustment | 0.0 |
| `lipOpenOffset` | Default lip pose adjustment | 0.0 |
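
To make the defaults concrete, here is a small sketch that holds the table's values in a dictionary and applies per-character overrides. The parameter names and defaults come from the table; the helper `merge_face_parameters` and the rejection of unknown names are assumptions for illustration, not behavior confirmed for the client or the service.

```python
# Defaults copied from the Face Parameters table.
DEFAULT_FACE_PARAMETERS = {
    "upperFaceStrength": 1.0,
    "upperFaceSmoothing": 0.001,
    "lowerFaceStrength": 1.25,
    "lowerFaceSmoothing": 0.006,
    "faceMaskLevel": 0.6,
    "faceMaskSoftness": 0.0085,
    "skinStrength": 1.0,
    "eyelidOpenOffset": 0.0,
    "lipOpenOffset": 0.0,
}


def merge_face_parameters(overrides):
    """Return a copy of the defaults with overrides applied.

    Hypothetical helper: unknown parameter names raise KeyError so that
    typos in a configuration are caught early.
    """
    unknown = set(overrides) - set(DEFAULT_FACE_PARAMETERS)
    if unknown:
        raise KeyError(f"Unknown face parameter(s): {sorted(unknown)}")
    merged = dict(DEFAULT_FACE_PARAMETERS)
    merged.update({name: float(value) for name, value in overrides.items()})
    return merged


params = merge_face_parameters({"lowerFaceStrength": 1.4})
print(params["lowerFaceStrength"])  # 1.4
print(params["faceMaskLevel"])      # 0.6 (default kept)
```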

### Blendshape Parameters

--apikey for the API Key generated through the API Catalogue
--function-id for the Function ID provided to access the API function.
The configuration supports ARKit-compatible blendshape multipliers and offsets. See [Apple ARKit documentation](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation) for more details.

Comment on lines +134 to 135

Copilot AI Dec 1, 2025

The blendshape parameters section states "The configuration supports ARKit-compatible blendshape multipliers and offsets" but doesn't provide any examples or explain what these parameters do. Consider adding a brief explanation:

The configuration supports ARKit-compatible blendshape multipliers and offsets:
- **Multipliers**: Scale the intensity of each blendshape (0.0 = disabled, 1.0 = full strength)
- **Offsets**: Add a constant bias to blendshape values (typically 0.0)

See [Apple ARKit documentation](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation) for more details.

Suggested change

The configuration supports ARKit-compatible blendshape multipliers and offsets. See [Apple ARKit documentation](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation) for more details.

The configuration supports ARKit-compatible blendshape multipliers and offsets:
- **Multipliers**: Scale the intensity of each blendshape (e.g., `0.0` = disabled, `1.0` = full strength)
- **Offsets**: Add a constant bias to blendshape values (typically `0.0`)

**Example:**
```json
{
  "blendshapeMultipliers": {
    "jawOpen": 1.0,
    "eyeBlinkLeft": 0.8
  },
  "blendshapeOffsets": {
    "mouthSmileLeft": 0.1
  }
}
```

See [Apple ARKit documentation](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation) for more details.
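
As a rough illustration of what a multiplier and an offset would do to a blendshape weight, the sketch below assumes the simplest possible combination, `weight * multiplier + offset`. This formula is an assumption based on the descriptions in the review comment; how the service actually combines the two values is not stated here. The function name `apply_multipliers_and_offsets` is hypothetical.

```python
def apply_multipliers_and_offsets(weights, multipliers=None, offsets=None):
    """Scale each blendshape weight, then add a constant bias.

    Assumed formula: weight * multiplier + offset. Blendshapes without an
    entry use a multiplier of 1.0 and an offset of 0.0.
    """
    multipliers = multipliers or {}
    offsets = offsets or {}
    return {
        name: value * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        for name, value in weights.items()
    }


adjusted = apply_multipliers_and_offsets(
    {"jawOpen": 0.5, "eyeBlinkLeft": 0.5},
    multipliers={"eyeBlinkLeft": 0.8},
    offsets={"jawOpen": 0.1},
)
print(adjusted)
```

Under this assumption, a multiplier of `0.0` silences a blendshape entirely, while an offset shifts its resting value.
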
## What does this example do?
## 🎭 Available Models

### With Tongue Animation

| Character | Function ID |
|-----------|-------------|
| Mark | `8efc55f5-6f00-424e-afe9-26212cd2c630` |
| Claire | `0961a6da-fb9e-4f2e-8491-247e5fd7bf8d` |
| James | `9327c39f-a361-4e02-bd72-e11b4c9b7b5e` |

### Legacy (Without Tongue Animation)

| Character | Function ID |
|-----------|-------------|
| Mark | `cf145b84-423b-4222-bfdd-15bb0142b0fd` |
| Claire | `617f80a7-85e4-4bf0-9dd6-dcb61e886142` |
| James | `8082bdcb-9968-4dc5-8705-423ea98b8fc2` |
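
The Function IDs above can be kept in a small lookup table so that callers select a model by character name. This is a sketch only: the helper names are hypothetical, and the metadata keys `function-id` and `authorization` (with a `Bearer` prefix) are an assumption about how NVCF-hosted gRPC endpoints are usually addressed, not something this README states.

```python
# Function IDs copied from the two tables above.
FUNCTION_IDS = {
    "tongue": {
        "mark": "8efc55f5-6f00-424e-afe9-26212cd2c630",
        "claire": "0961a6da-fb9e-4f2e-8491-247e5fd7bf8d",
        "james": "9327c39f-a361-4e02-bd72-e11b4c9b7b5e",
    },
    "legacy": {
        "mark": "cf145b84-423b-4222-bfdd-15bb0142b0fd",
        "claire": "617f80a7-85e4-4bf0-9dd6-dcb61e886142",
        "james": "8082bdcb-9968-4dc5-8705-423ea98b8fc2",
    },
}


def function_id_for(character, with_tongue=True):
    """Look up the Function ID for a character (case-insensitive)."""
    variant = "tongue" if with_tongue else "legacy"
    try:
        return FUNCTION_IDS[variant][character.lower()]
    except KeyError:
        raise ValueError(f"Unknown character: {character!r}") from None


def build_metadata(api_key, function_id):
    """Build gRPC call metadata (assumed key names, see the note above)."""
    return [
        ("function-id", function_id),
        ("authorization", "Bearer " + api_key),
    ]


print(function_id_for("Claire"))
print(build_metadata("nvapi-xxxxxxxxxxxx", function_id_for("Mark", with_tongue=False)))
```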

## 🎵 Sample Audio Files

Sample audio files are available in `../../example_audio/`:

| File | Description |
|------|-------------|
| `Claire_neutral.wav` | Claire - Neutral emotion |
| `Claire_anger.wav` | Claire - Anger emotion |
| `Claire_joy_mandarin.wav` | Claire - Joy (Mandarin) |
| `Claire_sadness.wav` | Claire - Sadness emotion |
| `Claire_outofbreath_mandarin.wav` | Claire - Out of breath (Mandarin) |
| `Mark_neutral.wav` | Mark - Neutral emotion |
| `Mark_joy.wav` | Mark - Joy emotion |
| `Mark_anger.wav` | Mark - Anger emotion |
| `Mark_sadness.wav` | Mark - Sadness emotion |
| `Mark_outofbreath.wav` | Mark - Out of breath |

## 📤 Output

The application generates the following outputs:

1. **Blendshapes CSV**: Animation keyframes with blendshape names, values, and timecodes
2. **Emotions CSV**: Emotion data with timecodes
3. **Audio WAV**: Processed audio output (`out.wav`)
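
To show what "keyframes with blendshape names, values, and timecodes" can look like on disk, here is a minimal sketch using the standard `csv` module. The column layout (a `timeCode` column followed by one column per blendshape) and the function name `keyframes_to_csv` are assumptions for illustration; the client's actual CSV layout may differ.

```python
import csv
import io


def keyframes_to_csv(blendshape_names, keyframes):
    """Serialize (timecode, values) keyframes to CSV text.

    Assumed layout: a header row of "timeCode" plus the blendshape names,
    then one row per keyframe.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timeCode", *blendshape_names])
    for timecode, values in keyframes:
        if len(values) != len(blendshape_names):
            raise ValueError("each keyframe needs one value per blendshape")
        writer.writerow([timecode, *values])
    return buffer.getvalue()


text = keyframes_to_csv(
    ["jawOpen", "eyeBlinkLeft"],
    [(0.0, [0.1, 0.0]), (0.033, [0.25, 0.9])],
)
print(text)
```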

### Supported Emotions

- Amazement
- Anger
- Cheekiness
- Disgust
- Fear
- Grief
- Joy
- Out of Breath
- Pain
- Sadness

## 📁 Project Structure

```
audio2face_3d_api_client/
├── README.md # This file
├── nim_a2f_3d_client.py # CLI client script
├── app.py # Gradio web interface
├── requirements # Python dependencies
├── config/
│ ├── config_claire.yml # Claire model configuration
│ ├── config_james.yml # James model configuration
│ └── config_mark.yml # Mark model configuration
└── a2f_3d/
└── client/
├── auth.py # Authentication utilities
└── service.py # gRPC service handlers
```

## 🔄 How It Works

1. **Read Audio**: Loads audio data from a 16-bit PCM WAV file
2. **Load Config**: Parses emotion and face parameters from YAML configuration
3. **Stream Audio**: Sends audio data via gRPC to the Audio2Face-3D API
4. **Receive Animation**: Gets back blendshape weights, audio, and emotion data
5. **Export Results**: Saves animation keyframes and emotions to CSV files
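
The first and third steps above can be sketched with the standard library alone. The example below checks that a WAV file is 16-bit, single-channel PCM and splits the raw samples into chunks suitable for streaming. It uses `wave` rather than the `scipy` dependency listed earlier, the function names are hypothetical, and the one-second chunk size is an arbitrary assumption; the gRPC call itself is omitted.

```python
import wave


def read_pcm16_mono(path):
    """Return (sample_rate, raw_bytes) for a 16-bit PCM mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected a single channel")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def chunk_audio(raw_bytes, sample_rate, seconds=1.0):
    """Yield successive chunks holding roughly `seconds` of audio each."""
    step = int(sample_rate * seconds) * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(raw_bytes), step):
        yield raw_bytes[start:start + step]
```

A caller would read the file once with `read_pcm16_mono` and then pass each chunk from `chunk_audio` to whatever streaming request the gRPC stub expects.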

## 📄 License

```
SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
```

1. Reads the audio data from a wav 16bits PCM file
2. Reads emotions and parameters from the yaml configuration file
3. Sends emotions, parameters and audio to the A2F-3D
4. Receives back blendshapes, audio and emotions
5. Saves blendshapes as animation key frames in a csv file with their name, value
and time codes
6. Same process for the emotion data.
7. Saves the received audio as out.wav (Should be the same as input audio)
Licensed under the Apache License, Version 2.0. See [LICENSE](http://www.apache.org/licenses/LICENSE-2.0) for details.