Commit

[Talking Avatar][Live] update sample code to close WS connection in-time, when user closes/refreshes web page, or auto-reconnection is applied (#2713)

* [TalkingAvatar] Add sample code for TTS talking avatar real-time API

* sample codes for batch avatar synthesis

* Address repository check failure

* update

* [Avatar] Update real time avatar sample code to support multi-lingual

* [avatar] update real time avatar chat sample to receive GPT response streamingly

* [Live Avatar] update chat sample to make some refinements

* [TTS Avatar] Update real-time sample to support 1. non-continuous recognition mode 2. a button to stop speaking 3. user can type query without speech

* [TTS Avatar] Update real time avatar sample to support auto-reconnect

* Don't reset message history when re-connecting

* [talking avatar] update real time sample to support using cached local video for idle status, to help save customer cost

* Update chat.html and README.md

* Update batch avatar sample to use mp4 as default format, to avoid slow speed with vp9 by default

* A minor refinement

* Some refinement

* Some bug fixing

* Refine the response receiving logic for AOAI streaming mode, to make it more robust

* [Talking Avatar] update real-time sample code to log result id (turn id) for ease of debugging

* [Talking Avatar] Update avatar live chat sample, to upgrade AOAI API version from 2023-03-15-preview to 2023-12-01-preview

* [Talking Avatar][Live Chat] Update AOAI API to be long term support version 2023-06-01-preview

* [Talking Avatar] Add real time avatar sample code for server/client hybrid web app, with server code written in python

* Some refinements

* Add README.md

* Fix repo check failure: files that are neither marked as binary nor text, please extend .gitattributes

* [Python][TTS Avatar] Add chat sample

* [Python][TTS Avatar] Add chat sample - continue

* Support multiple clients management

* Update README.md

* [Python][TTS Avatar] Support customized ICE server

* [Talking Avatar][Python] Support stop speaking

* Tolerate Speech SDK versions that do not support sending messages with connection

* [Python][TTS Avatar] Send local SDP as post body instead of header, to avoid header size over limit

* [python][avatar] update requirements.txt to add the missing dependencies

* [python][avatar] update real-time sample to make auto-connection smoother

* [Python][Avatar] Fix some small bugs

* [python][avatar] Support AAD authorization on private endpoint

* [Java][Android][Avatar] Add Android sample code for real time avatar

* Code refinement

* More refinement

* More refinement

* Update README.md

* [Java][Android][Avatar] Remove AddStream method, which is not available with Unified Plan SDP semantics, and use AddTrack per suggestion

* [Python][Avatar][Live] Get speaking status from WebRTC event, and remove the checkSpeakingStatus API from backend code

* [Java][Android][Live Avatar] Update the sample to demonstrate switching audio output device to loud speaker

* [Python][Avatar][Live] Switch from REST API to SDK for calling AOAI

* [Python][Avatar][Live] Trigger barging at first recognizing event which is earlier

* [Python][Avatar][Live] Enable continuous conversation by default

* [Python][Avatar][Live] Disable multi-lingual by default for better latency

* [Python][Avatar][Live] Configure shorter segmentation silence timeout for quicker SR

* [Live Avatar][Python, CSharp] Add logging for latency

* [TTS Avatar][Live][Python, CSharp, JS] Fix a bug to correctly clean up audio player

* [TTS Avatar][Live][JavaScript] Output display text with a slower rate, to follow the avatar speaking progress

* Make the display text / speech alignment switchable on/off

* [TTS Avatar][Live][CSharp] Output display text with a slower rate, to follow the avatar speaking progress

* Create an auto-deploy file

* Unlink the containerApp yinhew-avatar-app from this repo

* Delete unnecessary file

* [talking avatar][python] Update real time sample to add option to connect with server through WebSocket, and do STT on server side

* [TTS Avatar][Live][js] update sample code for support of setting of background image and remote TURN server URL

* [talking avatar][live][python] make sure host can still start up without AOAI resource

* [Talking Avatar][Live] update sample code to close WS connection in-time, when user closes/refreshes web page, or auto-reconnection is applied

* Some refinement to connection object

* Update csharp sample as well

---------

Co-authored-by: Yulin Li <[email protected]>
yinhew and Yulin Li authored Jan 7, 2025
1 parent cb2e265 commit 4c02fe0
Showing 10 changed files with 178 additions and 58 deletions.
115 changes: 83 additions & 32 deletions samples/csharp/web/avatar/Controllers/AvatarController.cs
@@ -44,7 +44,7 @@ public IActionResult BasicView()
}

[HttpGet("chat")]
-public ActionResult ChatView()
+public IActionResult ChatView()
{
var clientId = _clientService.InitializeClient();
if (chatClient == null)
@@ -124,9 +124,11 @@ public async Task<IActionResult> ConnectAvatar()
try
{
var clientId = new Guid(Request.Headers["ClientId"]!);

var clientContext = _clientService.GetClientContext(clientId);

// disconnect avatar if already connected
await DisconnectAvatarInternal(clientId);

// Override default values with client provided values
clientContext.AzureOpenAIDeploymentName = Request.Headers["AoaiDeploymentName"].FirstOrDefault() ?? _clientSettings.AzureOpenAIDeploymentName;
clientContext.CognitiveSearchIndexName = Request.Headers["CognitiveSearchIndexName"].FirstOrDefault() ?? _clientSettings.CognitiveSearchIndexName;
@@ -262,6 +264,18 @@ public async Task<IActionResult> ConnectAvatar()

var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
connection.SetMessageProperty("speech.config", "context", JsonConvert.SerializeObject(avatarConfig));
connection.Connected += (sender, args) =>
{
Console.WriteLine("TTS Avatar service connected.");
};

connection.Disconnected += (sender, args) =>
{
Console.WriteLine("TTS Avatar service disconnected.");
clientContext.SpeechSynthesizerConnection = null;
};

clientContext.SpeechSynthesizerConnection = connection;

var speechSynthesisResult = speechSynthesizer.SpeakTextAsync("").Result;
Console.WriteLine($"Result ID: {speechSynthesisResult.ResultId}");
@@ -285,7 +299,7 @@ public async Task<IActionResult> ConnectAvatar()
}

[HttpPost("api/speak")]
-public async Task<ActionResult> Speak()
+public async Task<IActionResult> Speak()
{
try
{
@@ -314,7 +328,7 @@ public async Task<ActionResult> Speak()
}

[HttpPost("api/stopSpeaking")]
-public async Task<ActionResult> StopSpeaking()
+public async Task<IActionResult> StopSpeaking()
{
try
{
@@ -342,7 +356,7 @@ public async Task<ActionResult> StopSpeaking()
}

[HttpPost("api/chat")]
-public async Task<ActionResult> Chat()
+public async Task<IActionResult> Chat()
{
// Retrieve and parse the ClientId from headers
var clientIdHeaderValues = Request.Headers["ClientId"];
@@ -392,7 +406,7 @@ public IActionResult ClearChatHistory()
// Retrieve the client context and clear chat history
var clientContext = _clientService.GetClientContext(clientId);
var systemPrompt = Request.Headers["SystemPrompt"].FirstOrDefault() ?? string.Empty;
-InitializeChatContext(systemPrompt, clientId);
+InitializeChatContext(systemPrompt, clientId);
clientContext.ChatInitiated = true;

return Ok("Chat history cleared.");
@@ -404,7 +418,7 @@ public IActionResult ClearChatHistory()
}

[HttpPost("api/disconnectAvatar")]
-public IActionResult DisconnectAvatar()
+public async Task<IActionResult> DisconnectAvatar()
{
try
{
@@ -415,21 +429,7 @@ public IActionResult DisconnectAvatar()
return BadRequest("Invalid ClientId");
}

-// Retrieve the client context
-var clientContext = _clientService.GetClientContext(clientId);
-
-if (clientContext == null)
-{
-return StatusCode(StatusCodes.Status204NoContent, "Client context not found");
-}
-
-var speechSynthesizer = clientContext.SpeechSynthesizer as SpeechSynthesizer;
-if (speechSynthesizer != null)
-{
-var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
-connection.Close();
-}
-
+await DisconnectAvatarInternal(clientId);
return Ok("Disconnected avatar");
}
catch (Exception ex)
@@ -439,7 +439,7 @@ public IActionResult DisconnectAvatar()
}

[HttpGet("api/initializeClient")]
-public ActionResult InitializeClient()
+public IActionResult InitializeClient()
{
try
{
@@ -452,7 +452,36 @@ public ActionResult InitializeClient()
}
}

-public async Task HandleUserQuery(string userQuery, Guid clientId, HttpResponse httpResponse)
[HttpPost("api/releaseClient")]
public async Task<IActionResult> ReleaseClient()
{
// Extract the client ID from the request body
var clientIdString = string.Empty;
using (var reader = new StreamReader(Request.Body, Encoding.UTF8))
{
clientIdString = JObject.Parse(await reader.ReadToEndAsync()).Value<string>("clientId");
}

if (!Guid.TryParse(clientIdString, out Guid clientId))
{
return BadRequest("Invalid ClientId");
}

try
{
await DisconnectAvatarInternal(clientId);
await Task.Delay(2000); // Wait some time for the connection to close
_clientService.RemoveClient(clientId);
Console.WriteLine($"Client context released for client id {clientId}.");
return Ok($"Client context released for client id {clientId}.");
}
catch (Exception ex)
{
return BadRequest($"Client context release failed for client id {clientId}. Error message: {ex.Message}");
}
}

+private async Task HandleUserQuery(string userQuery, Guid clientId, HttpResponse httpResponse)
{
var clientContext = _clientService.GetClientContext(clientId);
var azureOpenaiDeploymentName = clientContext.AzureOpenAIDeploymentName;
@@ -594,7 +623,7 @@ public void InitializeChatContext(string systemPrompt, Guid clientId)
}

// Speak the given text. If there is already a speaking in progress, add the text to the queue. For chat scenario.
-public Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, HttpResponse httpResponse)
+private Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, HttpResponse httpResponse)
{
var clientContext = _clientService.GetClientContext(clientId);

@@ -636,7 +665,7 @@ public Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, Http
return Task.CompletedTask;
}

-public async Task<string> SpeakText(string text, string voice, string speakerProfileId, int endingSilenceMs, Guid clientId)
+private async Task<string> SpeakText(string text, string voice, string speakerProfileId, int endingSilenceMs, Guid clientId)
{
var escapedText = HttpUtility.HtmlEncode(text);
string ssml;
@@ -668,7 +697,7 @@ public async Task<string> SpeakText(string text, string voice, string speakerPro
return await SpeakSsml(ssml, clientId);
}

-public async Task<string> SpeakSsml(string ssml, Guid clientId)
+private async Task<string> SpeakSsml(string ssml, Guid clientId)
{
var clientContext = _clientService.GetClientContext(clientId);

@@ -695,23 +724,45 @@ public async Task<string> SpeakSsml(string ssml, Guid clientId)
return speechSynthesisResult.ResultId;
}

-public async Task StopSpeakingInternal(Guid clientId)
+private async Task StopSpeakingInternal(Guid clientId)
{
var clientContext = _clientService.GetClientContext(clientId);

var speechSynthesizer = clientContext.SpeechSynthesizer as SpeechSynthesizer;
var spokenTextQueue = clientContext.SpokenTextQueue;
spokenTextQueue.Clear();

try
{
-var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
-await connection.SendMessageAsync("synthesis.control", "{\"action\":\"stop\"}");
+var connection = clientContext.SpeechSynthesizerConnection as Connection;
+if (connection != null)
+{
+await connection.SendMessageAsync("synthesis.control", "{\"action\":\"stop\"}");
+Console.WriteLine("Stop speaking message sent.");
+}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}

private async Task DisconnectAvatarInternal(Guid clientId)
{
// Retrieve the client context
var clientContext = _clientService.GetClientContext(clientId);

if (clientContext == null)
{
throw new Exception("Client context not found");
}

await StopSpeakingInternal(clientId);
await Task.Delay(2000); // Wait for the last speech to finish

var connection = clientContext.SpeechSynthesizerConnection as Connection;
if (connection != null)
{
connection.Close();
}
}
}
}
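The teardown order that DisconnectAvatarInternal follows (clear the queue and send a stop message, wait, then close the tracked connection if one still exists) can be sketched in JavaScript for brevity. This is a minimal illustration, not the sample's C# code: `context.connection`, `sendMessage`, `sleep`, and the `delayMs` parameter are hypothetical stand-ins for SpeechSynthesizerConnection, SendMessageAsync, Task.Delay, and the fixed 2000 ms wait.

```javascript
// Hypothetical helper standing in for Task.Delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Mirrors StopSpeakingInternal: drop queued text, then ask the service to stop.
async function stopSpeaking(context) {
    context.spokenTextQueue.length = 0
    if (context.connection) {
        try {
            await context.connection.sendMessage('synthesis.control', '{"action":"stop"}')
        } catch (err) {
            // A failed stop message is logged and does not block the close.
            console.log(err.message)
        }
    }
}

// Mirrors DisconnectAvatarInternal: stop, wait, then close the tracked connection.
async function disconnectAvatar(context, delayMs = 2000) {
    if (!context) {
        throw new Error('Client context not found')
    }
    await stopSpeaking(context)
    await sleep(delayMs)
    // Re-read the connection after the wait; it may already have been cleared.
    if (context.connection) {
        context.connection.close()
        // Simplification: the C# sample clears the reference in its Disconnected handler.
        context.connection = null
    }
}
```

Keeping the connection on the client context means a second disconnect call finds nothing to close and returns without side effects.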
2 changes: 2 additions & 0 deletions samples/csharp/web/avatar/Models/ClientContext.cs
@@ -28,6 +28,8 @@ public class ClientContext

public object? SpeechSynthesizer { get; set; }

public object? SpeechSynthesizerConnection { get; set; }

public string? SpeechToken { get; set; }

public string? IceToken { get; set; }
5 changes: 5 additions & 0 deletions samples/csharp/web/avatar/Services/ClientService.cs
@@ -52,5 +52,10 @@ public ClientContext GetClientContext(Guid clientId)

return context;
}

public void RemoveClient(Guid clientId)
{
_clientContexts.TryRemove(clientId, out _);
}
}
}
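The per-client bookkeeping behind RemoveClient can be pictured as a keyed registry whose remove operation is safe to call more than once. The sketch below uses JavaScript and a plain Map; the class name, the counter-based IDs, and the shape of the stored context are assumptions made for illustration and do not reproduce the sample's ClientService.

```javascript
// Hypothetical registry; the C# sample uses Guid keys and a ClientContext class.
class ClientRegistry {
    constructor() {
        this.clientContexts = new Map()
        this.nextId = 1 // stand-in for Guid generation
    }

    initializeClient() {
        const clientId = `client-${this.nextId++}`
        this.clientContexts.set(clientId, { connection: null, spokenTextQueue: [] })
        return clientId
    }

    getClientContext(clientId) {
        // Returns undefined when the client is unknown.
        return this.clientContexts.get(clientId)
    }

    removeClient(clientId) {
        // Like TryRemove, removing an unknown client does not throw.
        return this.clientContexts.delete(clientId)
    }
}
```

For example, `registry.removeClient(id)` returns `true` the first time and `false` on a repeated call, so a late or duplicate release request does no harm.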
2 changes: 2 additions & 0 deletions samples/csharp/web/avatar/Services/IClientService.cs
@@ -12,5 +12,7 @@ public interface IClientService
Guid InitializeClient();

ClientContext GetClientContext(Guid clientId);

void RemoveClient(Guid clientId);
}
}
4 changes: 4 additions & 0 deletions samples/csharp/web/avatar/wwwroot/js/basic.js
@@ -322,3 +322,7 @@ window.updataTransparentBackground = () => {
document.getElementById('backgroundImageUrl').disabled = false
}
}

window.onbeforeunload = () => {
navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
}
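The unload hook above passes a JSON string to navigator.sendBeacon. A string payload is sent as text/plain, which fits the ReleaseClient action reading the raw request body and parsing it itself. A small testable variant might look like the following sketch; `releaseClientOnUnload` and its injected `sendBeacon` parameter are hypothetical names introduced so the logic can run outside a browser.

```javascript
// Hypothetical wrapper: the beacon function is injected so it can be faked in tests.
function releaseClientOnUnload(clientId, sendBeacon) {
    if (clientId === undefined || clientId === null) {
        return false // nothing to release if the client was never initialized
    }
    const payload = JSON.stringify({ clientId: clientId })
    // sendBeacon returns true when the browser accepted the request for delivery.
    return sendBeacon('/api/releaseClient', payload)
}

// Browser wiring (not executed here). sendBeacon must stay bound to navigator:
// window.addEventListener('beforeunload', () => {
//     releaseClientOnUnload(clientId, navigator.sendBeacon.bind(navigator))
// })
```

The early return avoids posting a body without a clientId, which the server would reject as an invalid ClientId.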
4 changes: 4 additions & 0 deletions samples/csharp/web/avatar/wwwroot/js/chat.js
@@ -602,3 +602,7 @@ window.updateLocalVideoForIdle = () => {
document.getElementById('showTypeMessageCheckbox').hidden = false
}
}

window.onbeforeunload = () => {
navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
}
6 changes: 6 additions & 0 deletions samples/js/browser/avatar/js/chat.js
@@ -594,6 +594,12 @@ function checkHung() {
sessionActive = false
if (document.getElementById('autoReconnectAvatar').checked) {
console.log(`[${(new Date()).toISOString()}] The video stream got disconnected, need reconnect.`)
// Release the existing avatar connection
if (avatarSynthesizer !== undefined) {
avatarSynthesizer.close()
}

// Setup a new avatar connection
connectAvatar()
}
}
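The close-before-reconnect step in this hunk can be isolated into a small function for testing. The sketch below is an assumption-laden illustration: `reconnectAvatar`, the `state` holder, and the injected `connect` callback are hypothetical, whereas the sample works on the global `avatarSynthesizer` and calls `connectAvatar()` directly.

```javascript
// Hypothetical helper: closes the stale synthesizer before creating a new one.
function reconnectAvatar(state, connect) {
    if (state.avatarSynthesizer !== undefined) {
        // Release the existing avatar connection so it is not left open.
        state.avatarSynthesizer.close()
        state.avatarSynthesizer = undefined
    }
    // Set up a new avatar connection.
    state.avatarSynthesizer = connect()
    return state.avatarSynthesizer
}
```

Without the close call, each automatic reconnection would leave the previous connection open until it timed out.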