Enhance README and add Gradio web interface for Audio2Face-3D API client #9

Open
Yash-Kavaiya wants to merge 1 commit into NVIDIA:main from Yash-Kavaiya:main

@Yash-Kavaiya

Improve the README to provide comprehensive information about the Audio2Face-3D API client, including features, installation instructions, and usage examples. Introduce a Gradio web interface for real-time interaction and testing of the API functionalities. Add necessary dependencies for the web interface and enhance the overall structure of the documentation.

Copilot AI review requested due to automatic review settings December 1, 2025 18:56
Copilot AI left a comment

Pull request overview

This pull request enhances the Audio2Face-3D API client by introducing a comprehensive Gradio-based web interface and significantly improving documentation. The PR adds interactive functionality for testing and using the Audio2Face-3D API, making it more accessible to users who prefer a visual interface over command-line tools.

Key Changes

  • New Gradio Web Interface: Adds app.py with a full-featured web UI including audio upload/recording, sample audio selection, real-time emotion parameter adjustment, and visualization video generation
  • Enhanced Documentation: Comprehensive README overhaul with detailed installation instructions, usage examples, configuration details, and well-structured sections covering all aspects of the API client
  • Additional Dependencies: Introduces Gradio and OpenCV for web interface and visualization capabilities

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 23 comments.

| File | Description |
| --- | --- |
| scripts/audio2face_3d_api_client/requirements | Adds Gradio 6.0.1 and opencv-python-headless 4.12.0.88 dependencies for web interface support |
| scripts/audio2face_3d_api_client/app.py | New 637-line Gradio application providing interactive web interface with audio processing, emotion control, blendshape visualization, and multi-format output |
| scripts/audio2face_3d_api_client/README.md | Major documentation enhancement from 50 to 225 lines, adding structured sections for features, installation, usage examples, configuration details, and comprehensive API reference |

Comment on lines +443 to +444
return None, None, f"Error reading audio: {str(e)}"

Copilot AI Dec 1, 2025

The error message for audio reading failures is too generic. Consider providing more specific guidance based on common issues:

except Exception as e:
    error_msg = f"Error reading audio file: {str(e)}\n"
    error_msg += "Ensure the file is a valid WAV format with PCM encoding."
    return None, None, error_msg
Suggested change
return None, None, f"Error reading audio: {str(e)}"
error_msg = f"Error reading audio file: {str(e)}\n"
error_msg += "Ensure the file is a valid WAV format with PCM encoding."
return None, None, error_msg
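
A minimal sketch of the more specific error handling suggested here. It assumes the standard-library `wave` module as the reader (the PR's `app.py` imports `scipy.io.wavfile`, which may behave differently), and `read_wav` is a hypothetical helper name, not a function from the PR.

```python
import wave


def read_wav(path):
    """Return (sample_rate, raw_frames, None) or (None, None, error_message)."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate(), wav.readframes(wav.getnframes()), None
    except FileNotFoundError:
        return None, None, f"Audio file not found: {path}"
    except (wave.Error, EOFError) as e:
        # wave.Error covers bad RIFF/WAVE headers and non-PCM encodings;
        # EOFError is raised for empty or truncated files.
        error_msg = f"Error reading audio file: {e}\n"
        error_msg += "Ensure the file is a valid WAV format with PCM encoding."
        return None, None, error_msg
```

Catching the narrow exception types first keeps the "file missing" case apart from the "file is not a PCM WAV" case, so each message can point at the actual problem.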

- Sample audio selection
- Real-time emotion parameter adjustment
- Visual blendshape output preview
- CSV export functionality
Copilot AI Dec 1, 2025

The description "CSV export functionality" is misleading. Based on the code in app.py (lines 502-508), the web interface exports both CSV and JSON formats. Consider updating to:

- Downloadable animation data in JSON and CSV formats
Suggested change
- CSV export functionality
- Downloadable animation data in JSON and CSV formats
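
To illustrate what a dual JSON and CSV export can look like, here is a rough sketch. The frame layout is an assumption: the `timeCode` key appears elsewhere in this PR, while the `blendShapes` mapping and the `export_frames` helper are hypothetical and may not match `app.py`.

```python
import csv
import json


def export_frames(frames, json_path, csv_path):
    """Write the same animation frames as JSON and as a flat CSV table."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(frames, f, indent=2)

    # One CSV column per blendshape name (assumed key: "blendShapes").
    names = sorted({name for frame in frames for name in frame["blendShapes"]})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["timeCode", *names])
        for frame in frames:
            values = [frame["blendShapes"].get(name, 0.0) for name in names]
            writer.writerow([frame["timeCode"], *values])
```

JSON keeps the nested structure for programmatic use, while the CSV flattens each frame into one row that opens directly in a spreadsheet.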

return output_path


async def process_audio_async(audio_path: str, model: str, emotion: str, emotion_strength: float,
Copilot AI Dec 1, 2025

The function name process_audio_async could be more descriptive. Consider renaming to process_audio_with_a2f_api or generate_animation_from_audio to better describe what the function does (it generates animation, not just processes audio).

Suggested change
async def process_audio_async(audio_path: str, model: str, emotion: str, emotion_strength: float,
async def generate_animation_from_audio(audio_path: str, model: str, emotion: str, emotion_strength: float,

if audio_path is None:
return None, None, "Please upload an audio file or select a sample audio."

return asyncio.run(process_audio_async(audio_path, model, emotion, emotion_strength, progress))
Copilot AI Dec 1, 2025

[nitpick] Using asyncio.run() inside a synchronous wrapper function is fine for simple cases, but Gradio already has async support. Consider making the wrapper function async and using Gradio's native async support for better performance:

async def process_audio(audio_path, sample_audio, model, emotion, emotion_strength, progress=gr.Progress()):
    # ... validation logic ...
    return await process_audio_async(audio_path, model, emotion, emotion_strength, progress)

This avoids creating a new event loop for each request.
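
The async-wrapper idea can be sketched without Gradio itself. In the block below, `process_audio_async` is only a stand-in that simulates awaiting I/O (the real function in `app.py` calls the Audio2Face-3D API), and the `progress` and `sample_audio` arguments are left out for brevity.

```python
import asyncio


async def process_audio_async(audio_path, model, emotion, emotion_strength):
    """Stand-in for the real API call; it only simulates awaiting I/O."""
    await asyncio.sleep(0)
    return audio_path, model, f"{emotion} at strength {emotion_strength}"


async def process_audio(audio_path, model, emotion, emotion_strength):
    """Async handler: validates input, then awaits on the caller's event loop."""
    if audio_path is None:
        return None, None, "Please upload an audio file or select a sample audio."
    return await process_audio_async(audio_path, model, emotion, emotion_strength)
```

Gradio event listeners accept coroutine functions, so `process_audio` could be passed as `fn` directly. Outside Gradio, for example in a quick script, `asyncio.run(process_audio(...))` is still the way to drive it.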

total_frames = int(duration * fps) + 1

# Create video writer
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
Copilot AI Dec 1, 2025

The 'mp4v' FourCC selects MPEG-4 Part 2 video, which many browsers cannot play back inline. Consider an H.264 FourCC such as 'avc1' or 'H264', provided the installed OpenCV build can encode H.264:

fourcc = cv2.VideoWriter_fourcc(*'avc1')  # Better compatibility
# or
fourcc = cv2.VideoWriter_fourcc('H','2','6','4')  # H.264 codec

Note: ffmpeg later re-encodes the video with libx264, so the codec of this temporary file matters less, but it is still worth changing for clarity.

Suggested change
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fourcc = cv2.VideoWriter_fourcc(*'avc1') # Use H.264/AVC for better compatibility
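
Since H.264 encoding is not available in every OpenCV build, one cautious option is to try the preferred FourCC and fall back when the writer fails to open. The sketch below assumes that approach. `open_video_writer` and its `make_writer` hook are hypothetical names introduced here so the logic can be exercised without OpenCV installed; the default path uses `cv2.VideoWriter`, `cv2.VideoWriter_fourcc`, `isOpened()` and `release()`.

```python
def open_video_writer(path, fps, size, codecs=("avc1", "mp4v"), make_writer=None):
    """Return (writer, codec) for the first FourCC the backend can open."""
    if make_writer is None:
        import cv2  # imported lazily so the fallback logic is testable without OpenCV

        def make_writer(path, code, fps, size):
            return cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*code), fps, size)

    for code in codecs:
        writer = make_writer(path, code, fps, size)
        if writer.isOpened():
            return writer, code
        writer.release()  # free the failed writer before trying the next codec
    raise RuntimeError(f"No usable codec among {codecs} for {path}")
```

A call such as `open_video_writer("out.mp4", 30.0, (640, 480))` would then return whichever codec actually opened, which is useful to log before the later ffmpeg re-encode.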

Comment on lines +98 to +99

`` ```bash ``
Copilot AI Dec 1, 2025

The instructions for launching the web interface are incomplete. The app needs an API key, which the current implementation hardcodes; it should be read from an environment variable instead. Document how to set the key before running:

Launch the interactive web interface:

**Prerequisites**: Set your NVIDIA API key as an environment variable:
```bash
export NVIDIA_API_KEY=your-api-key-here
```

Then launch the app:

```bash
python3 ./app.py
```

Suggested change: insert the **Prerequisites** paragraph, the `export` block, and the "Then launch the app:" line ahead of the existing `bash` fence that contains the launch command.

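
On the application side, reading the key from the environment could look roughly like the sketch below. The variable name `NVIDIA_API_KEY` comes from the review comment; `load_api_key` is a hypothetical helper and is not part of the PR's `app.py`.

```python
import os


def load_api_key(env_var="NVIDIA_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Run: export {env_var}=your-api-key-here"
        )
    return key
```

Failing at startup with an actionable message avoids a less obvious authentication error on the first API call.
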
import gradio as gr
import asyncio
import os
import tempfile
Copilot AI Dec 1, 2025

Import of 'tempfile' is not used.

Suggested change
import tempfile

import shutil
import numpy as np
import scipy.io.wavfile
import yaml
Copilot AI Dec 1, 2025

Import of 'yaml' is not used.

Suggested change
import yaml

# Audio2Face imports
import a2f_3d.client.auth
from nvidia_ace.services.a2f_controller.v1_pb2_grpc import A2FControllerServiceStub
from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader
Copilot AI Dec 1, 2025

Import of 'AnimationData' is not used.
Import of 'AnimationDataStreamHeader' is not used.

Suggested change
from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader
# from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader

Comment on lines +300 to +303
if animation_frames:
duration = animation_frames[-1]["timeCode"]
else:
duration = 1.0
Copilot AI Dec 1, 2025

The `else` branch (`duration = 1.0`) is unreachable, so the `if animation_frames:` check can be removed.

Suggested change
if animation_frames:
duration = animation_frames[-1]["timeCode"]
else:
duration = 1.0
duration = animation_frames[-1]["timeCode"]
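
Whether the fallback is truly unreachable depends on code earlier in the function that is not shown in this thread. If an empty frame list can in fact reach this point, a small helper makes that case explicit. Both function names below are hypothetical; the `timeCode` key and the `int(duration * fps) + 1` frame count are taken from the snippets quoted in this review.

```python
def animation_duration(animation_frames, default=1.0):
    """Duration in seconds: the last frame's timeCode, or `default` if empty."""
    if not animation_frames:
        return default
    return animation_frames[-1]["timeCode"]


def frame_count(duration, fps):
    """Number of video frames needed to cover `duration` seconds at `fps`."""
    return int(duration * fps) + 1
```

If the caller already guarantees a non-empty list, the one-line version in the suggested change is simpler and equivalent.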
