diff --git a/scripts/audio2face_3d_api_client/README.md b/scripts/audio2face_3d_api_client/README.md
index c080b25..9c4fe21 100644
--- a/scripts/audio2face_3d_api_client/README.md
+++ b/scripts/audio2face_3d_api_client/README.md
@@ -1,50 +1,225 @@
-# Sample Application connecting to Audio2Face-3D NIM hosted on NVCF
+# Audio2Face-3D NIM API Client
 
-A sample Python application to showcase the Audio2Face-3D NIM hosted on NVIDIA Cloud Functions (NVCF).
+A sample Python application that showcases the Audio2Face-3D NIM hosted on NVIDIA Cloud Functions (NVCF). The client sends audio files and receives facial-animation blendshape data through NVIDIA's Audio2Face-3D API.
 
-## Getting started
+## 📋 Table of Contents
 
-Start by creating a python venv using:
+- [Features](#features)
+- [Prerequisites](#prerequisites)
+- [Installation](#installation)
+- [Usage](#usage)
+  - [Command Line Interface](#command-line-interface)
+  - [Gradio Web Interface](#gradio-web-interface)
+- [Configuration](#configuration)
+- [Available Models](#available-models)
+- [Sample Audio Files](#sample-audio-files)
+- [Output](#output)
+- [Project Structure](#project-structure)
+- [How It Works](#how-it-works)
+- [License](#license)
+
+## ✨ Features
+
+- **CLI Client**: Command-line interface for batch processing audio files
+- **Web Interface**: Interactive Gradio-based web UI for real-time testing
+- **Multiple Character Models**: Support for Mark, Claire, and James stylization models
+- **Emotion Control**: Configurable emotion parameters for animation generation
+- **Blendshape Output**: ARKit-compatible blendshape weights export
+- **Audio Streaming**: Efficient gRPC-based audio streaming
+
+## 📦 Prerequisites
+
+- Python 3.8+
+- NVIDIA API Key (from [NVIDIA API Catalog](https://build.nvidia.com/))
+- Function ID for the Audio2Face-3D API
+
+## 🚀 Installation
+
+### 1. Create a Virtual Environment
 
 ```bash
 python3 -m venv .venv
 source .venv/bin/activate
 ```
 
-Then install the required dependencies:
+### 2. Install Dependencies
 
 ```bash
 pip3 install -r requirements
 pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
 ```
 
-Note: This wheel is compatible with Audio2Face-3D NIM 1.3
+> **Note**: The `nvidia_ace-1.2.0` wheel is compatible with Audio2Face-3D NIM 1.3.
 
+### Dependencies
+
+| Package | Version | Purpose |
+|---------|---------|---------|
+| numpy | 1.26.4 | Numerical operations |
+| scipy | 1.13.0 | Audio file I/O |
+| grpcio | 1.72.0rc1 | gRPC communication |
+| protobuf | 4.24.1 | Protocol buffers |
+| PyYAML | 6.0.1 | Configuration parsing |
+| pandas | 2.2.2 | Data manipulation |
+| gradio | 6.0.1 | Web interface |
+| opencv-python-headless | 4.12.0.88 | Image processing |
+
+## 💻 Usage
+
+### Command Line Interface
+
+Run the CLI client with the following command:
 
 ```bash
-python3 ./nim_a2f_3d_client.py <audio_file.wav> <config.yml> --apikey <API_KEY> --function-id <Function_ID>
+python3 ./nim_a2f_3d_client.py <audio_file.wav> <config.yml> --apikey <API_KEY> --function-id <FUNCTION_ID>
 ```
 
-By Default:
+#### Example
 
 ```bash
-python3 ./nim_a2f_3d_client.py ../../example_audio/Claire_neutral.wav config/config_claire.yml --apikey <API_KEY> --function-id <Function_ID>
+python3 ./nim_a2f_3d_client.py \
+    ../../example_audio/Claire_neutral.wav \
+    config/config_claire.yml \
+    --apikey nvapi-xxxxxxxxxxxx \
+    --function-id 0961a6da-fb9e-4f2e-8491-247e5fd7bf8d
 ```
 
-The scripts takes four mandatory parameters, an audio file at format PCM 16 bits,
- a yaml configuration file for the emotions parameters, the API Key generated by API Catalogue, and the Function ID
- used to access the API function.
+#### Arguments
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `file` | ✅ | 16-bit PCM, single-channel audio file in WAV format |
+| `config` | ✅ | YAML configuration file for inference parameters |
+| `--apikey` | ✅ | NVIDIA API key generated through the NVIDIA API Catalog |
+| `--function-id` | ✅ | Function ID for the specific character model |
+
+### Gradio Web Interface
+
+Launch the interactive web interface:
+
+```bash
+python3 ./app.py
+```
+
+The web interface provides:
+- Drag-and-drop audio upload
+- Sample audio selection
+- Real-time emotion parameter adjustment
+- Visual blendshape output preview
+- CSV export functionality
+
+## ⚙️ Configuration
+
+Configuration files are located in the `config/` directory:
+
+- `config_claire.yml` - Claire character settings
+- `config_james.yml` - James character settings
+- `config_mark.yml` - Mark character settings
+
+### Face Parameters
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `upperFaceStrength` | Range of motion for upper face | 1.0 |
+| `upperFaceSmoothing` | Temporal smoothing for upper face | 0.001 |
+| `lowerFaceStrength` | Range of motion for lower face | 1.25 |
+| `lowerFaceSmoothing` | Temporal smoothing for lower face | 0.006 |
+| `faceMaskLevel` | Boundary between upper/lower regions | 0.6 |
+| `faceMaskSoftness` | Blend smoothness between regions | 0.0085 |
+| `skinStrength` | Range of motion for skin | 1.0 |
+| `eyelidOpenOffset` | Default eyelid pose adjustment | 0.0 |
+| `lipOpenOffset` | Default lip pose adjustment | 0.0 |
+
+### Blendshape Parameters
 
---apikey for the API Key generated through the API Catalogue
---function-id for the Function ID provided to access the API function.
+The configuration supports ARKit-compatible blendshape multipliers and offsets. See [Apple ARKit documentation](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation) for more details.
 
-## What does this example do?
+## 🎭 Available Models
+
+### With Tongue Animation
+
+| Character | Function ID |
+|-----------|-------------|
+| Mark | `8efc55f5-6f00-424e-afe9-26212cd2c630` |
+| Claire | `0961a6da-fb9e-4f2e-8491-247e5fd7bf8d` |
+| James | `9327c39f-a361-4e02-bd72-e11b4c9b7b5e` |
+
+### Legacy (Without Tongue Animation)
+
+| Character | Function ID |
+|-----------|-------------|
+| Mark | `cf145b84-423b-4222-bfdd-15bb0142b0fd` |
+| Claire | `617f80a7-85e4-4bf0-9dd6-dcb61e886142` |
+| James | `8082bdcb-9968-4dc5-8705-423ea98b8fc2` |
+
+## 🎵 Sample Audio Files
+
+Sample audio files are available in `../../example_audio/`:
+
+| File | Description |
+|------|-------------|
+| `Claire_neutral.wav` | Claire - Neutral emotion |
+| `Claire_anger.wav` | Claire - Anger emotion |
+| `Claire_joy_mandarin.wav` | Claire - Joy (Mandarin) |
+| `Claire_sadness.wav` | Claire - Sadness emotion |
+| `Claire_outofbreath_mandarin.wav` | Claire - Out of breath (Mandarin) |
+| `Mark_neutral.wav` | Mark - Neutral emotion |
+| `Mark_joy.wav` | Mark - Joy emotion |
+| `Mark_anger.wav` | Mark - Anger emotion |
+| `Mark_sadness.wav` | Mark - Sadness emotion |
+| `Mark_outofbreath.wav` | Mark - Out of breath |
+
+## 📤 Output
+
+The application generates the following outputs:
+
+1. **Blendshapes CSV**: Animation keyframes with blendshape names, values, and timecodes
+2. **Emotions CSV**: Emotion data with timecodes
+3. **Audio WAV**: Processed audio output (`out.wav`)
+
+### Supported Emotions
+
+- Amazement
+- Anger
+- Cheekiness
+- Disgust
+- Fear
+- Grief
+- Joy
+- Out of Breath
+- Pain
+- Sadness
+
+## 📁 Project Structure
+
+```
+audio2face_3d_api_client/
+├── README.md                 # This file
+├── nim_a2f_3d_client.py      # CLI client script
+├── app.py                    # Gradio web interface
+├── requirements              # Python dependencies
+├── config/
+│   ├── config_claire.yml     # Claire model configuration
+│   ├── config_james.yml      # James model configuration
+│   └── config_mark.yml       # Mark model configuration
+└── a2f_3d/
+    └── client/
+        ├── auth.py           # Authentication utilities
+        └── service.py        # gRPC service handlers
+```
+
+## 🔄 How It Works
+
+1. **Read Audio**: Loads audio data from a 16-bit PCM WAV file
+2. **Load Config**: Parses emotion and face parameters from YAML configuration
+3. **Stream Audio**: Sends audio data via gRPC to the Audio2Face-3D API
+4. **Receive Animation**: Gets back blendshape weights, audio, and emotion data
+5. **Export Results**: Saves animation keyframes and emotions to CSV files
+
+## 📄 License
+
+```
+SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+```
 
-1. Reads the audio data from a wav 16bits PCM file
-2. Reads emotions and parameters from the yaml configuration file
-3. Sends emotions, parameters and audio to the A2F-3D
-4. Receives back blendshapes, audio and emotions
-5. Saves blendshapes as animation key frames in a csv file with their name, value
-and time codes
-6. Same process for the emotion data.
-7. Saves the received audio as out.wav (Should be the same as input audio)
+Licensed under the Apache License, Version 2.0. See [LICENSE](http://www.apache.org/licenses/LICENSE-2.0) for details.
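
The README's Arguments table requires a 16-bit PCM, single-channel WAV file. A minimal pre-flight check for that requirement, assuming only the standard-library `wave` module, might look like the sketch below. `describe_wav` and `is_supported_input` are hypothetical helper names used for illustration; they are not part of the client in this patch.

```python
import wave


def describe_wav(path):
    """Return (sample_rate, channel_count, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() reports bytes per sample, so convert to bits.
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def is_supported_input(path):
    """True when the file is 16-bit, single-channel PCM, as the README asks."""
    _, channels, bits = describe_wav(path)
    return channels == 1 and bits == 16
```

Calling `is_supported_input` on a file before handing it to `nim_a2f_3d_client.py` would catch stereo or 8-bit input early. Note that `wave.open` raises `wave.Error` for compressed (non-PCM) WAV files instead of returning, so a caller may want to catch that exception as well.
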
diff --git a/scripts/audio2face_3d_api_client/app.py b/scripts/audio2face_3d_api_client/app.py
new file mode 100644
index 0000000..5798af3
--- /dev/null
+++ b/scripts/audio2face_3d_api_client/app.py
@@ -0,0 +1,637 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+import gradio as gr
+import asyncio
+import os
+import tempfile
+import shutil
+import numpy as np
+import scipy.io.wavfile
+import yaml
+import pandas as pd
+from datetime import datetime
+from pathlib import Path
+import cv2
+import json
+
+# Audio2Face imports
+import a2f_3d.client.auth
+from nvidia_ace.services.a2f_controller.v1_pb2_grpc import A2FControllerServiceStub
+from nvidia_ace.animation_data.v1_pb2 import AnimationData, AnimationDataStreamHeader
+from nvidia_ace.a2f.v1_pb2 import AudioWithEmotion, EmotionPostProcessingParameters, FaceParameters, BlendShapeParameters
+from nvidia_ace.audio.v1_pb2 import AudioHeader
+from nvidia_ace.controller.v1_pb2 import AudioStream, AudioStreamHeader
+from nvidia_ace.emotion_with_timecode.v1_pb2 import EmotionWithTimeCode
+from nvidia_ace.emotion_aggregate.v1_pb2 import EmotionAggregate
+import grpc
+
+# Constants
+BITS_PER_SAMPLE = 16
+CHANNEL_COUNT = 1
+AUDIO_FORMAT = AudioHeader.AUDIO_FORMAT_PCM
+
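+# API Configuration
+# Read the key from the environment so that no secret is committed to the repository.
+# NVIDIA_API_KEY is an assumed variable name; adjust it to match your deployment.
+API_KEY = os.environ.get("NVIDIA_API_KEY", "")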
+
+# Function IDs for different models (with tongue animation)
+FUNCTION_IDS = {
+    "Mark": "8efc55f5-6f00-424e-afe9-26212cd2c630",
+    "Claire": "0961a6da-fb9e-4f2e-8491-247e5fd7bf8d",
+    "James": "9327c39f-a361-4e02-bd72-e11b4c9b7b5e",
+}
+
+# Function IDs without tongue animation (legacy)
+FUNCTION_IDS_LEGACY = {
+    "Mark (Legacy)": "cf145b84-423b-4222-bfdd-15bb0142b0fd",
+    "Claire (Legacy)": "617f80a7-85e4-4bf0-9dd6-dcb61e886142",
+    "James (Legacy)": "8082bdcb-9968-4dc5-8705-423ea98b8fc2",
+}
+
+# Emotion list
+EMOTIONS = ["amazement", "anger", "cheekiness", "disgust", "fear", "grief", "joy", "outofbreath", "pain", "sadness"]
+
+# Sample audio files
+SAMPLE_AUDIO_DIR = Path(__file__).parent.parent.parent / "example_audio"
+SAMPLE_AUDIOS = {
+    "-- Select Sample Audio --": None,
+    "Claire - Neutral": "Claire_neutral.wav",
+    "Claire - Joy (Mandarin)": "Claire_joy_mandarin.wav",
+    "Claire - Anger": "Claire_anger.wav",
+    "Claire - Sadness": "Claire_sadness.wav",
+    "Claire - Out of Breath (Mandarin)": "Claire_outofbreath_mandarin.wav",
+    "Claire - Sadness (5 sec, 16kHz)": "Claire_sadness_16khz_5_sec.wav",
+    "Claire - Sadness (10 sec, 16kHz)": "Claire_sadness_16khz_10_sec.wav",
+    "Claire - Sadness (20 sec, 16kHz)": "Claire_sadness_16khz_20_sec.wav",
+    "Mark - Neutral": "Mark_neutral.wav",
+    "Mark - Joy": "Mark_joy.wav",
+    "Mark - Anger": "Mark_anger.wav",
+    "Mark - Sadness": "Mark_sadness.wav",
+    "Mark - Out of Breath": "Mark_outofbreath.wav",
+}
+
+
+def get_default_config():
+    """Returns default configuration for Audio2Face"""
+    return {
+        "face_parameters": {
+            "upperFaceStrength": 1.0,
+            "upperFaceSmoothing": 0.001,
+            "lowerFaceStrength": 1.25,
+            "lowerFaceSmoothing": 0.006,
+            "faceMaskLevel": 0.6,
+            "faceMaskSoftness": 0.0085,
+            "skinStrength": 1.0,
+            "eyelidOpenOffset": 0.0,
+            "lipOpenOffset": 0.0,
+        },
+        "blendshape_parameters": {
+            "enable_clamping_bs_weight": False,
+            "multipliers": {
+                "EyeBlinkLeft": 1.0, "EyeLookDownLeft": 0.0, "EyeLookInLeft": 0.0,
+                "EyeLookOutLeft": 0.0, "EyeLookUpLeft": 0.0, "EyeSquintLeft": 1.0,
+                "EyeWideLeft": 1.0, "EyeBlinkRight": 1.0, "EyeLookDownRight": 0.0,
+                "EyeLookInRight": 0.0, "EyeLookOutRight": 0.0, "EyeLookUpRight": 0.0,
+                "EyeSquintRight": 1.0, "EyeWideRight": 1.0, "JawForward": 0.7,
+                "JawLeft": 0.2, "JawRight": 0.2, "JawOpen": 1.0, "MouthClose": 1.0,
+                "MouthFunnel": 1.2, "MouthPucker": 1.2, "MouthLeft": 0.2,
+                "MouthRight": 0.2, "MouthSmileLeft": 0.8, "MouthSmileRight": 0.8,
+                "MouthFrownLeft": 0.4, "MouthFrownRight": 0.4, "MouthDimpleLeft": 0.7,
+                "MouthDimpleRight": 0.7, "MouthStretchLeft": 0.1, "MouthStretchRight": 0.1,
+                "MouthRollLower": 0.9, "MouthRollUpper": 0.5, "MouthShrugLower": 0.9,
+                "MouthShrugUpper": 0.4, "MouthPressLeft": 0.8, "MouthPressRight": 0.8,
+                "MouthLowerDownLeft": 0.8, "MouthLowerDownRight": 0.8,
+                "MouthUpperUpLeft": 0.8, "MouthUpperUpRight": 0.8, "BrowDownLeft": 1.0,
+                "BrowDownRight": 1.0, "BrowInnerUp": 1.0, "BrowOuterUpLeft": 1.0,
+                "BrowOuterUpRight": 1.0, "CheekPuff": 0.2, "CheekSquintLeft": 1.0,
+                "CheekSquintRight": 1.0, "NoseSneerLeft": 0.8, "NoseSneerRight": 0.8,
+                "TongueOut": 0.0,
+            },
+            "offsets": {k: 0.0 for k in [
+                "EyeBlinkLeft", "EyeLookDownLeft", "EyeLookInLeft", "EyeLookOutLeft",
+                "EyeLookUpLeft", "EyeSquintLeft", "EyeWideLeft", "EyeBlinkRight",
+                "EyeLookDownRight", "EyeLookInRight", "EyeLookOutRight", "EyeLookUpRight",
+                "EyeSquintRight", "EyeWideRight", "JawForward", "JawLeft", "JawRight",
+                "JawOpen", "MouthClose", "MouthFunnel", "MouthPucker", "MouthLeft",
+                "MouthRight", "MouthSmileLeft", "MouthSmileRight", "MouthFrownLeft",
+                "MouthFrownRight", "MouthDimpleLeft", "MouthDimpleRight", "MouthStretchLeft",
+                "MouthStretchRight", "MouthRollLower", "MouthRollUpper", "MouthShrugLower",
+                "MouthShrugUpper", "MouthPressLeft", "MouthPressRight", "MouthLowerDownLeft",
+                "MouthLowerDownRight", "MouthUpperUpLeft", "MouthUpperUpRight",
+                "BrowDownLeft", "BrowDownRight", "BrowInnerUp", "BrowOuterUpLeft",
+                "BrowOuterUpRight", "CheekPuff", "CheekSquintLeft", "CheekSquintRight",
+                "NoseSneerLeft", "NoseSneerRight", "TongueOut",
+            ]},
+        },
+        "post_processing_parameters": {
+            "emotion_contrast": 1.0,
+            "live_blend_coef": 0.7,
+            "enable_preferred_emotion": False,
+            "preferred_emotion_strength": 0.5,
+            "emotion_strength": 0.6,
+            "max_emotions": 3,
+        },
+        "emotion_with_timecode_list": {
+            "emotion_with_timecode1": {
+                "time_code": 0.0,
+                "emotions": {e: 0.0 for e in EMOTIONS}
+            }
+        }
+    }
+
+
+def create_config_with_emotion(primary_emotion: str, emotion_strength: float = 1.0):
+    """Create config with specified primary emotion"""
+    config = get_default_config()
+    emotions = {e: 0.0 for e in EMOTIONS}
+    if primary_emotion.lower() in emotions:
+        emotions[primary_emotion.lower()] = emotion_strength
+    config["emotion_with_timecode_list"]["emotion_with_timecode1"]["emotions"] = emotions
+    return config
+
+
+class A2FProcessor:
+    """Audio2Face-3D Processor"""
+    
+    def __init__(self):
+        self.reset()
+    
+    def reset(self):
+        """Clear all state accumulated from a previous stream."""
+        self.bs_names = []
+        self.animation_key_frames = []
+        self.audio_buffer = b''
+        self.audio_header = None
+        self.emotion_key_frames = {"input": [], "a2f_smoothed_output": []}
+    
+    async def read_from_stream(self, stream, progress_callback=None):
+        """Read animation data from gRPC stream"""
+        frame_count = 0
+        while True:
+            message = await stream.read()
+            if message == grpc.aio.EOF:
+                break
+            
+            if message.HasField("animation_data_stream_header"):
+                header = message.animation_data_stream_header
+                self.bs_names = list(header.skel_animation_header.blend_shapes)
+                self.audio_header = header.audio_header
+            
+            elif message.HasField("animation_data"):
+                animation_data = message.animation_data
+                
+                # Parse emotion data
+                emotion_aggregate = EmotionAggregate()
+                if animation_data.metadata.get("emotion_aggregate") and \
+                   animation_data.metadata["emotion_aggregate"].Unpack(emotion_aggregate):
+                    for ewt in emotion_aggregate.input_emotions:
+                        self.emotion_key_frames["input"].append({
+                            "time_code": ewt.time_code,
+                            "emotion_values": dict(ewt.emotion),
+                        })
+                    for ewt in emotion_aggregate.a2f_smoothed_output:
+                        self.emotion_key_frames["a2f_smoothed_output"].append({
+                            "time_code": ewt.time_code,
+                            "emotion_values": dict(ewt.emotion),
+                        })
+                
+                # Parse blendshape data
+                for blendshapes in animation_data.skel_animation.blend_shape_weights:
+                    bs_values_dict = dict(zip(self.bs_names, blendshapes.values))
+                    self.animation_key_frames.append({
+                        "timeCode": blendshapes.time_code,
+                        "blendShapes": bs_values_dict
+                    })
+                    frame_count += 1
+                
+                self.audio_buffer += animation_data.audio.audio_buffer
+                
+                if progress_callback:
+                    progress_callback(frame_count)
+            
+            elif message.HasField("status"):
+                status = message.status
+                return status.code == 0, status.message
+        
+        return True, "Stream completed"
+    
+    async def write_to_stream(self, stream, config: dict, audio_data: np.ndarray, sample_rate: int):
+        """Write audio data to gRPC stream"""
+        # Send header
+        audio_stream_header = AudioStream(
+            audio_stream_header=AudioStreamHeader(
+                audio_header=AudioHeader(
+                    samples_per_second=sample_rate,
+                    bits_per_sample=BITS_PER_SAMPLE,
+                    channel_count=CHANNEL_COUNT,
+                    audio_format=AUDIO_FORMAT
+                ),
+                emotion_post_processing_params=EmotionPostProcessingParameters(
+                    **config["post_processing_parameters"]
+                ),
+                face_params=FaceParameters(float_params=config["face_parameters"]),
+                blendshape_params=BlendShapeParameters(
+                    bs_weight_multipliers=config["blendshape_parameters"]["multipliers"],
+                    bs_weight_offsets=config["blendshape_parameters"]["offsets"]
+                )
+            )
+        )
+        await stream.write(audio_stream_header)
+        
+        # Send audio in chunks
+        chunk_size = sample_rate  # 1 second chunks
+        for i in range(len(audio_data) // chunk_size + 1):
+            chunk = audio_data[i * chunk_size: (i + 1) * chunk_size]
+            if len(chunk) == 0:
+                continue
+            
+            if i == 0:
+                # First chunk includes emotions
+                list_emotion_tc = [
+                    EmotionWithTimeCode(
+                        emotion={**v["emotions"]},
+                        time_code=v["time_code"]
+                    ) for v in config["emotion_with_timecode_list"].values()
+                ]
+                await stream.write(
+                    AudioStream(
+                        audio_with_emotion=AudioWithEmotion(
+                            audio_buffer=chunk.astype(np.int16).tobytes(),
+                            emotions=list_emotion_tc
+                        )
+                    )
+                )
+            else:
+                await stream.write(
+                    AudioStream(
+                        audio_with_emotion=AudioWithEmotion(
+                            audio_buffer=chunk.astype(np.int16).tobytes()
+                        )
+                    )
+                )
+        
+        # Signal end of audio
+        await stream.write(AudioStream(end_of_audio=AudioStream.EndOfAudio()))
+
+
+def create_visualization_video(animation_frames: list, audio_path: str, output_path: str, fps: int = 30):
+    """
+    Create a visualization video showing blendshape values as animated bars
+    with the audio track.
+    """
+    if not animation_frames:
+        return None
+    
+    # Video dimensions
+    width, height = 1280, 720
+    
+    # Key blendshapes to visualize
+    key_blendshapes = [
+        "JawOpen", "MouthSmileLeft", "MouthSmileRight", "MouthFrownLeft", "MouthFrownRight",
+        "MouthPucker", "MouthFunnel", "BrowInnerUp", "BrowDownLeft", "BrowDownRight",
+        "EyeBlinkLeft", "EyeBlinkRight", "EyeWideLeft", "EyeWideRight",
+        "CheekSquintLeft", "CheekSquintRight", "NoseSneerLeft", "NoseSneerRight",
+    ]
+    
+    # Duration is the time code of the last frame (the list is non-empty here)
+    duration = animation_frames[-1]["timeCode"]
+    
+    total_frames = int(duration * fps) + 1
+    
+    # Create video writer
+    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+    temp_video = output_path.replace('.mp4', '_temp.mp4')
+    out = cv2.VideoWriter(temp_video, fourcc, fps, (width, height))
+    
+    # Color scheme
+    bg_color = (30, 30, 40)  # Dark background
+    bar_color = (0, 200, 100)  # Green bars
+    text_color = (255, 255, 255)  # White text
+    accent_color = (100, 150, 255)  # Blue accent
+    
+    # Animation frame index
+    anim_idx = 0
+    
+    for frame_num in range(total_frames):
+        current_time = frame_num / fps
+        
+        # Find the closest animation frame
+        while anim_idx < len(animation_frames) - 1 and \
+              animation_frames[anim_idx + 1]["timeCode"] <= current_time:
+            anim_idx += 1
+        
+        # Create frame
+        frame = np.full((height, width, 3), bg_color, dtype=np.uint8)
+        
+        # Title
+        cv2.putText(frame, "NVIDIA Audio2Face-3D Blendshape Visualization",
+                    (40, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, accent_color, 2)
+        
+        # Time display
+        cv2.putText(frame, f"Time: {current_time:.2f}s / {duration:.2f}s",
+                    (40, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.7, text_color, 1)
+        
+        # Progress bar
+        progress = current_time / duration if duration > 0 else 0
+        cv2.rectangle(frame, (40, 105), (width - 40, 115), (60, 60, 70), -1)
+        cv2.rectangle(frame, (40, 105), (int(40 + (width - 80) * progress), 115), accent_color, -1)
+        
+        # Draw blendshape bars
+        if animation_frames and anim_idx < len(animation_frames):
+            blendshapes = animation_frames[anim_idx].get("blendShapes", {})
+            
+            bar_height = 25
+            bar_max_width = 400
+            start_y = 150
+            col1_x = 50
+            col2_x = 650
+            
+            for i, bs_name in enumerate(key_blendshapes):
+                value = blendshapes.get(bs_name, 0.0)
+                value = max(0, min(1, value))  # Clamp to 0-1
+                
+                # Determine column
+                if i < len(key_blendshapes) // 2:
+                    x = col1_x
+                    y = start_y + i * (bar_height + 15)
+                else:
+                    x = col2_x
+                    y = start_y + (i - len(key_blendshapes) // 2) * (bar_height + 15)
+                
+                # Draw label
+                cv2.putText(frame, bs_name, (x, y + 18),
+                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, text_color, 1)
+                
+                # Draw bar background
+                bar_x = x + 180
+                cv2.rectangle(frame, (bar_x, y), (bar_x + bar_max_width, y + bar_height),
+                              (60, 60, 70), -1)
+                
+                # Draw bar value
+                bar_width = int(bar_max_width * value)
+                if bar_width > 0:
+                    # Color gradient based on value
+                    color = (int(bar_color[0] * (1 - value * 0.5)),
+                             int(bar_color[1]),
+                             int(bar_color[2] * (1 - value * 0.3)))
+                    cv2.rectangle(frame, (bar_x, y), (bar_x + bar_width, y + bar_height),
+                                  color, -1)
+                
+                # Draw value text
+                cv2.putText(frame, f"{value:.2f}", (bar_x + bar_max_width + 10, y + 18),
+                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, text_color, 1)
+        
+        # Footer
+        cv2.putText(frame, "Generated with NVIDIA Audio2Face-3D API",
+                    (40, height - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (150, 150, 150), 1)
+        
+        out.write(frame)
+    
+    out.release()
+    
+    # Combine video with audio using ffmpeg
+    try:
+        import subprocess
+        cmd = [
+            'ffmpeg', '-y',
+            '-i', temp_video,
+            '-i', audio_path,
+            '-c:v', 'libx264',
+            '-c:a', 'aac',
+            '-shortest',
+            output_path
+        ]
+        subprocess.run(cmd, capture_output=True, check=True)
+        os.remove(temp_video)
+    except Exception as e:
+        # If ffmpeg fails, just use the video without audio
+        shutil.move(temp_video, output_path)
+        print(f"Warning: Could not add audio to video: {e}")
+    
+    return output_path
+
+
+async def process_audio_async(audio_path: str, model: str, emotion: str, emotion_strength: float, 
+                               progress=gr.Progress()):
+    """Process audio through Audio2Face-3D API"""
+    
+    # Get function ID
+    all_models = {**FUNCTION_IDS, **FUNCTION_IDS_LEGACY}
+    function_id = all_models.get(model)
+    if not function_id:
+        return None, None, f"Unknown model: {model}"
+    
+    # Read audio file
+    try:
+        sample_rate, audio_data = scipy.io.wavfile.read(audio_path)
+        # Remember the source dtype: averaging channels below always yields float64
+        is_float_source = np.issubdtype(audio_data.dtype, np.floating)
+        # Convert to mono if stereo
+        if audio_data.ndim > 1:
+            audio_data = audio_data.mean(axis=1)
+        # Convert to int16 if needed
+        if audio_data.dtype != np.int16:
+            if is_float_source:
+                # Float WAVs are normalized to [-1, 1]; clip before scaling
+                audio_data = np.clip(audio_data, -1.0, 1.0) * 32767
+            audio_data = audio_data.astype(np.int16)
+    except Exception as e:
+        return None, None, f"Error reading audio: {str(e)}"
+    
+    # Create config with emotion
+    config = create_config_with_emotion(emotion, emotion_strength)
+    
+    # Setup gRPC connection
+    metadata_args = [
+        ("function-id", function_id),
+        ("authorization", f"Bearer {API_KEY}")
+    ]
+    
+    try:
+        channel = a2f_3d.client.auth.create_channel(
+            uri="grpc.nvcf.nvidia.com:443",
+            use_ssl=True,
+            metadata=metadata_args
+        )
+        stub = A2FControllerServiceStub(channel)
+        stream = stub.ProcessAudioStream()
+        
+        # Process
+        processor = A2FProcessor()
+        
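+        def update_progress(frame_count):
+            # The total frame count is unknown while streaming, so cap the estimate
+            # below the 0.8 mark reported when video rendering starts.
+            progress(min(frame_count / 100, 0.79), desc=f"Processing frame {frame_count}...")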
+        
+        # Run write and read concurrently
+        write_task = asyncio.create_task(
+            processor.write_to_stream(stream, config, audio_data, sample_rate)
+        )
+        read_task = asyncio.create_task(
+            processor.read_from_stream(stream, update_progress)
+        )
+        
+        await write_task
+        success, message = await read_task
+        
+        if not success:
+            return None, None, f"API Error: {message}"
+        
+        # Create output directory
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+        output_dir = Path(f"outputs/{timestamp}")
+        output_dir.mkdir(parents=True, exist_ok=True)
+        
+        # Save audio
+        output_audio = output_dir / "output.wav"
+        if processor.audio_buffer and processor.audio_header:
+            audio_out = np.frombuffer(processor.audio_buffer, dtype=np.int16)
+            scipy.io.wavfile.write(str(output_audio), processor.audio_header.samples_per_second, audio_out)
+        else:
+            shutil.copy(audio_path, output_audio)
+        
+        # Save animation data as CSV
+        df = pd.json_normalize(processor.animation_key_frames)
+        csv_path = output_dir / "animation_frames.csv"
+        df.to_csv(csv_path, index=False)
+        
+        # Save as JSON for download
+        json_path = output_dir / "animation_data.json"
+        with open(json_path, 'w') as f:
+            json.dump({
+                "blendshape_names": processor.bs_names,
+                "frames": processor.animation_key_frames,
+                "emotions": processor.emotion_key_frames
+            }, f, indent=2)
+        
+        # Create visualization video
+        progress(0.8, desc="Creating visualization video...")
+        video_path = output_dir / "visualization.mp4"
+        create_visualization_video(
+            processor.animation_key_frames,
+            str(output_audio),
+            str(video_path)
+        )
+        
+        progress(1.0, desc="Complete!")
+        
+        return str(video_path), str(json_path), f"✅ Success! Generated {len(processor.animation_key_frames)} animation frames."
+        
+    except Exception as e:
+        return None, None, f"Error: {str(e)}"
+
+
+def process_audio(audio_path, sample_audio, model, emotion, emotion_strength, progress=gr.Progress()):
+    """Wrapper to run async function"""
+    # Use sample audio if selected, otherwise use uploaded audio
+    if sample_audio and sample_audio != "-- Select Sample Audio --":
+        audio_file = SAMPLE_AUDIOS.get(sample_audio)
+        if audio_file:
+            audio_path = str(SAMPLE_AUDIO_DIR / audio_file)
+    
+    if audio_path is None:
+        return None, None, "Please upload an audio file or select a sample audio."
+    
+    return asyncio.run(process_audio_async(audio_path, model, emotion, emotion_strength, progress))
+
+
+# Create Gradio Interface
+def create_ui():
+    with gr.Blocks(title="NVIDIA Audio2Face-3D") as demo:
+        gr.Markdown("""
+        # 🎭 NVIDIA Audio2Face-3D
+        ### Convert Audio to Facial Animation
+        
+        Upload your audio file and generate facial animation blendshapes with emotion control.
+        The output video shows a visualization of the generated blendshapes synced with your audio.
+        """)
+        
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 📤 Input")
+                
+                with gr.Tab("📁 Upload Audio"):
+                    audio_input = gr.Audio(
+                        label="Upload or Record Audio (WAV format, 16-bit PCM recommended)",
+                        type="filepath",
+                        sources=["upload", "microphone"]
+                    )
+                
+                with gr.Tab("🎵 Sample Audio"):
+                    sample_dropdown = gr.Dropdown(
+                        choices=list(SAMPLE_AUDIOS.keys()),
+                        value="-- Select Sample Audio --",
+                        label="Select Sample Audio",
+                        info="Choose from pre-loaded sample audio files"
+                    )
+                    gr.Markdown("*Tip: Use Claire samples with the Claire model and Mark samples with the Mark model.*")
+                
+                model_dropdown = gr.Dropdown(
+                    choices=list(FUNCTION_IDS.keys()) + list(FUNCTION_IDS_LEGACY.keys()),
+                    value="Claire",
+                    label="🎭 Character Model",
+                    info="Select the face model to use"
+                )
+                
+                with gr.Accordion("🎨 Emotion Settings", open=True):
+                    emotion_dropdown = gr.Dropdown(
+                        choices=EMOTIONS,
+                        value="joy",
+                        label="Primary Emotion"
+                    )
+                    emotion_strength = gr.Slider(
+                        minimum=0.0,
+                        maximum=1.0,
+                        value=0.7,
+                        step=0.1,
+                        label="Emotion Strength"
+                    )
+                
+                process_btn = gr.Button("🚀 Generate Animation", variant="primary", size="lg")
+            
+            with gr.Column(scale=1):
+                gr.Markdown("### 📥 Output")
+                
+                status_text = gr.Textbox(label="Status", interactive=False)
+                
+                video_output = gr.Video(label="Animation Visualization")
+                
+                json_output = gr.File(label="📄 Download Animation Data (JSON)")
+        
+        gr.Markdown("""
+        ---
+        ### 📋 Instructions
+        1. **Select Audio**: Upload a WAV file, record from the microphone, or select a sample audio file
+        2. **Select Model**: Choose from Mark, Claire, or James characters
+        3. **Set Emotion**: Pick the primary emotion and adjust strength
+        4. **Generate**: Click the button and wait for processing
+        
+        ### 📊 Output Files
+        - **Video**: Visualization of blendshape values over time with audio
+        - **JSON**: Complete animation data including all 52 ARKit blendshapes
+        
+        ### 🔗 Resources
+        - [NVIDIA Audio2Face Documentation](https://docs.nvidia.com/ace/latest/modules/a2f-docs/index.html)
+        - [ARKit Blendshape Reference](https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation)
+        """)
+        
+        # Event handlers
+        process_btn.click(
+            fn=process_audio,
+            inputs=[audio_input, sample_dropdown, model_dropdown, emotion_dropdown, emotion_strength],
+            outputs=[video_output, json_output, status_text]
+        )
+    
+    return demo
+
+
+if __name__ == "__main__":
+    # Create outputs directory
+    os.makedirs("outputs", exist_ok=True)
+    
+    # Launch on all network interfaces (port 7860); share=True also creates a public Gradio share link
+    demo = create_ui()
+    demo.launch(server_name="0.0.0.0", server_port=7860, share=True)
diff --git a/scripts/audio2face_3d_api_client/requirements b/scripts/audio2face_3d_api_client/requirements
index e947386..caef084 100644
--- a/scripts/audio2face_3d_api_client/requirements
+++ b/scripts/audio2face_3d_api_client/requirements
@@ -4,3 +4,5 @@ grpcio==1.72.0rc1
 protobuf==4.24.1
 PyYAML==6.0.1
 pandas==2.2.2
+gradio==6.0.1
+opencv-python-headless==4.12.0.88
\ No newline at end of file