Enhance README and add Gradio web interface for Audio2Face-3D API client #9

Yash-Kavaiya wants to merge 1 commit into NVIDIA:main
Conversation
Pull request overview
This pull request enhances the Audio2Face-3D API client by introducing a comprehensive Gradio-based web interface and significantly improving documentation. The PR adds interactive functionality for testing and using the Audio2Face-3D API, making it more accessible to users who prefer a visual interface over command-line tools.
Key Changes
- **New Gradio Web Interface**: Adds `app.py` with a full-featured web UI including audio upload/recording, sample audio selection, real-time emotion parameter adjustment, and visualization video generation
- **Enhanced Documentation**: Comprehensive README overhaul with detailed installation instructions, usage examples, configuration details, and well-structured sections covering all aspects of the API client
- **Additional Dependencies**: Introduces Gradio and OpenCV for web interface and visualization capabilities
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| `scripts/audio2face_3d_api_client/requirements` | Adds Gradio 6.0.1 and opencv-python-headless 4.12.0.88 dependencies for web interface support |
| `scripts/audio2face_3d_api_client/app.py` | New 637-line Gradio application providing interactive web interface with audio processing, emotion control, blendshape visualization, and multi-format output |
| `scripts/audio2face_3d_api_client/README.md` | Major documentation enhancement from 50 to 225 lines, adding structured sections for features, installation, usage examples, configuration details, and comprehensive API reference |
> ```python
> return None, None, f"Error reading audio: {str(e)}"
> ```

The error message for audio reading failures is too generic. Consider providing more specific guidance based on common issues:

```python
except Exception as e:
    error_msg = f"Error reading audio file: {str(e)}\n"
    error_msg += "Ensure the file is a valid WAV format with PCM encoding."
    return None, None, error_msg
```
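As an illustration of the more specific guidance suggested above, here is a minimal sketch using Python's standard-library `wave` module. The function name and error messages are hypothetical, not taken from the PR's `app.py`:

```python
import wave

def read_wav_or_explain(path: str):
    """Return (sample_rate, n_frames) or raise with actionable guidance.

    Hypothetical helper illustrating specific error messages for
    common WAV-reading failures; not part of the PR's app.py.
    """
    try:
        with wave.open(path, "rb") as wf:
            if wf.getcomptype() != "NONE":
                raise ValueError(
                    f"{path}: compressed WAV ({wf.getcomptype()}); "
                    "re-encode as uncompressed PCM."
                )
            return wf.getframerate(), wf.getnframes()
    except wave.Error as e:
        raise ValueError(
            f"{path}: not a valid WAV file ({e}); "
            "ensure PCM encoding and a .wav container."
        ) from e
```

Distinguishing "not a WAV at all" from "WAV but not PCM" gives users a concrete next step instead of a bare exception string.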
> ```markdown
> - Sample audio selection
> - Real-time emotion parameter adjustment
> - Visual blendshape output preview
> - CSV export functionality
> ```

The description "CSV export functionality" is misleading. Based on the code in app.py (lines 502-508), the web interface exports both CSV and JSON formats. Consider updating to:

```suggestion
- Downloadable animation data in JSON and CSV formats
```
> ```python
> return output_path
>
> async def process_audio_async(audio_path: str, model: str, emotion: str, emotion_strength: float,
> ```

The function name `process_audio_async` could be more descriptive. Consider renaming it to `process_audio_with_a2f_api` or `generate_animation_from_audio` to better describe what the function does (it generates animation, not just processes audio).

```suggestion
async def generate_animation_from_audio(audio_path: str, model: str, emotion: str, emotion_strength: float,
```
> ```python
> if audio_path is None:
>     return None, None, "Please upload an audio file or select a sample audio."
>
> return asyncio.run(process_audio_async(audio_path, model, emotion, emotion_strength, progress))
> ```

[nitpick] Using `asyncio.run()` inside a synchronous wrapper function is fine for simple cases, but Gradio already has async support. Consider making the wrapper function async and using Gradio's native async support for better performance:

```python
async def process_audio(audio_path, sample_audio, model, emotion, emotion_strength, progress=gr.Progress()):
    # ... validation logic ...
    return await process_audio_async(audio_path, model, emotion, emotion_strength, progress)
```

This avoids creating a new event loop for each request.
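A framework-free sketch of the pattern this comment recommends: an async handler that the framework awaits inside one running event loop, instead of spinning up a fresh loop with `asyncio.run()` per request. All names here are illustrative stand-ins, and `gradio` itself is not imported:

```python
import asyncio
from typing import Optional

async def process_audio_async(audio_path: str) -> str:
    # Stand-in for the real gRPC call in app.py (hypothetical body).
    await asyncio.sleep(0)
    return f"animation for {audio_path}"

async def handle_request(audio_path: Optional[str]) -> str:
    # Async handler that a framework with native async support
    # (e.g. Gradio) can await directly -- no per-request asyncio.run().
    if audio_path is None:
        return "Please upload an audio file."
    return await process_audio_async(audio_path)

async def main() -> None:
    # Both requests share one event loop.
    results = await asyncio.gather(
        handle_request("sample.wav"),
        handle_request(None),
    )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```

Because the handler is a coroutine, concurrent requests can interleave on the same loop while each awaits I/O, rather than each request paying the cost of loop creation and teardown.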
> ```python
> total_frames = int(duration * fps) + 1
>
> # Create video writer
> fourcc = cv2.VideoWriter_fourcc(*'mp4v')
> ```

Using the deprecated 'mp4v' codec, which has compatibility issues on many platforms. Consider using 'avc1', letting OpenCV choose a platform-appropriate codec, or using H.264 directly:

```python
fourcc = cv2.VideoWriter_fourcc(*'avc1')  # Better compatibility
# or
fourcc = cv2.VideoWriter_fourcc('H', '2', '6', '4')  # H.264 codec
```

Note: The video is later re-encoded with libx264 by ffmpeg, so this temporary codec choice is less critical, but still worth fixing for clarity.

```suggestion
fourcc = cv2.VideoWriter_fourcc(*'avc1')  # Use H.264/AVC for better compatibility
```
|
|
The instructions for launching the web interface are incomplete. The app requires an API key to be set (hardcoded in the current implementation, but it should come from an environment variable). Add documentation about setting the API key before running:

````suggestion
Launch the interactive web interface:

**Prerequisites**: Set your NVIDIA API key as an environment variable:

```bash
export NVIDIA_API_KEY=your-api-key-here
```

Then launch the app:

```bash
python3 ./app.py
```
````
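On the app side, a sketch of how `app.py` could read the key from the environment instead of hardcoding it. The variable name `NVIDIA_API_KEY` follows the comment above; the helper name is illustrative:

```python
import os

def get_api_key() -> str:
    """Fetch the NVIDIA API key from the environment, failing fast
    with a clear message instead of relying on a hardcoded value.

    Hypothetical helper; not part of the PR's app.py.
    """
    key = os.environ.get("NVIDIA_API_KEY")
    if not key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set. Export it before launching:\n"
            "  export NVIDIA_API_KEY=your-api-key-here"
        )
    return key
```

Failing fast at startup with the exact `export` command to run is friendlier than a gRPC authentication error deep inside a request.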
> ```python
> import gradio as gr
> import asyncio
> import os
> import tempfile
> ```

Import of `tempfile` is not used.

```suggestion
```
> ```python
> import shutil
> import numpy as np
> import scipy.io.wavfile
> import yaml
> ```

Import of `yaml` is not used.

```suggestion
```
> ```python
> # Audio2Face imports
> import a2f_3d.client.auth
> from nvidia_ace.services.a2f_controller.v1_pb2_grpc import A2FControllerServiceStub
> from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader
> ```

Import of `AnimationData` is not used.
Import of `AnimationDataStreamHeader` is not used.

```suggestion
# from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader
```
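Unused-import findings like the three above are exactly what linters (flake8's F401, or ruff) catch automatically. As an illustration of the underlying check, a minimal AST-based sketch; the function name is illustrative, and real projects should use a proper linter:

```python
import ast

def unused_imports(source: str) -> list:
    """Report imported names never referenced as a bare name in the module.

    Minimal sketch: collects names bound by import statements, then
    checks which are never loaded via an ast.Name node. Linters such
    as flake8 (rule F401) or ruff do this far more robustly.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b.c" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```

Running it over a toy module with an unused `tempfile` import would report just that name, mirroring the Copilot findings here.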
> ```python
> if animation_frames:
>     duration = animation_frames[-1]["timeCode"]
> else:
>     duration = 1.0
> ```

The `else` branch (`duration = 1.0`) is unreachable, since `animation_frames` is guaranteed to be non-empty at this point, so the conditional can be simplified:

```suggestion
duration = animation_frames[-1]["timeCode"]
```
Improve the README to provide comprehensive information about the Audio2Face-3D API client, including features, installation instructions, and usage examples. Introduce a Gradio web interface for real-time interaction and testing of the API functionalities. Add necessary dependencies for the web interface and enhance the overall structure of the documentation.