Commit 4c02fe0

yinhew and Yulin Li authored
[Talking Avatar][Live] update sample code to close WS connection in-time, when user closes/refreshes web page, or auto-reconnection is applied (#2713)
* [TalkingAvatar] Add sample code for TTS talking avatar real-time API
* sample codes for batch avatar synthesis
* Address repository check failure
* update
* [Avatar] Update real time avatar sample code to support multi-lingual
* [avatar] update real time avatar chat sample to receive GPT response streamingly
* [Live Avatar] update chat sample to make some refinements
* [TTS Avatar] Update real-time sample to support 1. non-continuous recognition mode 2. a button to stop speaking 3. user can type query without speech
* [TTS Avatar] Update real time avatar sample to support auto-reconnect
* Don't reset message history when re-connecting
* [talking avatar] update real time sample to support using cached local video for idle status, to help save customer cost
* Update chat.html and README.md
* Update batch avatar sample to use mp4 as default format, to avoid defaultly showing slow speed with vp9
* A minor refinement
* Some refinement
* Some bug fixing
* Refine the reponse receiving logic for AOAI streaming mode, to make it more robust
* [Talking Avatar] update real-time sample code to log result id (turn id) for ease of debugging
* [Talking Avatar] Update avatar live chat sample, to upgrade AOAI API version from 2023-03-15-preview to 2023-12-01-preview
* [Talking Avatar][Live Chat] Update AOAI API to be long term support version 2023-06-01-preview
* [Talking Avatar] Add real time avatar sample code for server/client hybrid web app, with server code written in python
* Some refinements
* Add README.md
* Fix repo check failure: files that are neither marked as binary nor text, please extend .gitattributes
* [Python][TTS Avatar] Add chat sample
* [Python][TTS Avatar] Add chat sample - continue
* Support multiple clients management
* Update README.md
* [Python][TTS Avatar] Support customized ICE server
* [Talking Avatar][Python] Support stop speaking
* Tolerat speech sdk to unsupport sending message with connection
* [Python][TTS Avatar] Send local SDP as post body instead of header, to avoid header size over limit
* [python][avatar] update requirements.txt to add the missing dependencies
* [python][avatar] update real-time sample to make auto-connection more smoothy
* [Python][Avatar] Fix some small bugs
* [python][avatar] Support AAD authorization on private endpoint
* [Java][Android][Avatar] Add Android sample code for real time avatar
* Code refinement
* More refinement
* More refinement
* Update README.md
* [Java][Android][Avatar] Remove AddStream method, which is not available with Unified Plan SDP semantics, and use AddTrack per suggestion
* [Python][Avatar][Live] Get speaking status from WebRTC event, and remove the checkSpeakingStatus API from backend code
* [Java][Android][Live Avatar] Update the sample to demonstrate switching audio output device to loud speaker
* [Python][Avatar][Live] Switch from REST API to SDK for calling AOAI
* [Python][Avatar][Live] Trigger barging at first recognizing event which is earlier
* [Python][Avatar][Live] Enable continuous conversation by default
* [Python][Avatar][Live] Disable multi-lingual by default for better latency
* [Python][Avatar][Live] Configure shorter segmentation silence timeout for quicker SR
* [Live Avatar][Python, CSharp] Add logging for latency
* [TTS Avatar][Live][Python, CSharp, JS] Fix a bug to correctly clean up audio player
* [TTS Avatar][Live][JavaScript] Output display text with a slower rate, to follow the avatar speaking progress
* Make the display text / speech alignment able for on/off
* [TTS Avatar][Live][CSharp] Output display text with a slower rate, to follow the avatar speaking progress
* Create an auto-deploy file
* Unlink the containerApp yinhew-avatar-app from this repo
* Delete unnecessary file
* [talking avatar][python] Update real time sample to add option to connect with server through WebSocket, and do STT on server side
* [TTS Avatar][Live][js] update sample code for support of setting of background image and remote TURN server URL
* [talking avatar][live][python] make sure host can still start up without AOAI resource
* [Talking Avatar][Live] update sample code to close WS connection in-time, when user closes/refreshes web page, or auto-reconnection is applied
* Some refinement to connection object
* Update csharp sample as well

---------

Co-authored-by: Yulin Li <[email protected]>
1 parent cb2e265 commit 4c02fe0

10 files changed: +178 -58 lines changed

samples/csharp/web/avatar/Controllers/AvatarController.cs

Lines changed: 83 additions & 32 deletions
@@ -44,7 +44,7 @@ public IActionResult BasicView()
         }

         [HttpGet("chat")]
-        public ActionResult ChatView()
+        public IActionResult ChatView()
         {
             var clientId = _clientService.InitializeClient();
             if (chatClient == null)
@@ -124,9 +124,11 @@ public async Task<IActionResult> ConnectAvatar()
             try
             {
                 var clientId = new Guid(Request.Headers["ClientId"]!);
-
                 var clientContext = _clientService.GetClientContext(clientId);

+                // disconnect avatar if already connected
+                await DisconnectAvatarInternal(clientId);
+
                 // Override default values with client provided values
                 clientContext.AzureOpenAIDeploymentName = Request.Headers["AoaiDeploymentName"].FirstOrDefault() ?? _clientSettings.AzureOpenAIDeploymentName;
                 clientContext.CognitiveSearchIndexName = Request.Headers["CognitiveSearchIndexName"].FirstOrDefault() ?? _clientSettings.CognitiveSearchIndexName;
@@ -262,6 +264,18 @@ public async Task<IActionResult> ConnectAvatar()

                 var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
                 connection.SetMessageProperty("speech.config", "context", JsonConvert.SerializeObject(avatarConfig));
+                connection.Connected += (sender, args) =>
+                {
+                    Console.WriteLine("TTS Avatar service connected.");
+                };
+
+                connection.Disconnected += (sender, args) =>
+                {
+                    Console.WriteLine("TTS Avatar service disconnected.");
+                    clientContext.SpeechSynthesizerConnection = null;
+                };
+
+                clientContext.SpeechSynthesizerConnection = connection;

                 var speechSynthesisResult = speechSynthesizer.SpeakTextAsync("").Result;
                 Console.WriteLine($"Result ID: {speechSynthesisResult.ResultId}");
@@ -285,7 +299,7 @@ public async Task<IActionResult> ConnectAvatar()
         }

         [HttpPost("api/speak")]
-        public async Task<ActionResult> Speak()
+        public async Task<IActionResult> Speak()
         {
             try
             {
@@ -314,7 +328,7 @@ public async Task<ActionResult> Speak()
         }

         [HttpPost("api/stopSpeaking")]
-        public async Task<ActionResult> StopSpeaking()
+        public async Task<IActionResult> StopSpeaking()
         {
             try
             {
@@ -342,7 +356,7 @@ public async Task<ActionResult> StopSpeaking()
         }

         [HttpPost("api/chat")]
-        public async Task<ActionResult> Chat()
+        public async Task<IActionResult> Chat()
         {
             // Retrieve and parse the ClientId from headers
             var clientIdHeaderValues = Request.Headers["ClientId"];
@@ -392,7 +406,7 @@ public IActionResult ClearChatHistory()
                 // Retrieve the client context and clear chat history
                 var clientContext = _clientService.GetClientContext(clientId);
                 var systemPrompt = Request.Headers["SystemPrompt"].FirstOrDefault() ?? string.Empty;
-                InitializeChatContext(systemPrompt, clientId);
+                InitializeChatContext(systemPrompt, clientId);
                 clientContext.ChatInitiated = true;

                 return Ok("Chat history cleared.");
@@ -404,7 +418,7 @@ public IActionResult ClearChatHistory()
         }

         [HttpPost("api/disconnectAvatar")]
-        public IActionResult DisconnectAvatar()
+        public async Task<IActionResult> DisconnectAvatar()
         {
             try
             {
@@ -415,21 +429,7 @@ public IActionResult DisconnectAvatar()
                     return BadRequest("Invalid ClientId");
                 }

-                // Retrieve the client context
-                var clientContext = _clientService.GetClientContext(clientId);
-
-                if (clientContext == null)
-                {
-                    return StatusCode(StatusCodes.Status204NoContent, "Client context not found");
-                }
-
-                var speechSynthesizer = clientContext.SpeechSynthesizer as SpeechSynthesizer;
-                if (speechSynthesizer != null)
-                {
-                    var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
-                    connection.Close();
-                }
-
+                await DisconnectAvatarInternal(clientId);
                 return Ok("Disconnected avatar");
             }
             catch (Exception ex)
@@ -439,7 +439,7 @@ public IActionResult DisconnectAvatar()
         }

         [HttpGet("api/initializeClient")]
-        public ActionResult InitializeClient()
+        public IActionResult InitializeClient()
         {
             try
             {
@@ -452,7 +452,36 @@ public ActionResult InitializeClient()
             }
         }

-        public async Task HandleUserQuery(string userQuery, Guid clientId, HttpResponse httpResponse)
+        [HttpPost("api/releaseClient")]
+        public async Task<IActionResult> ReleaseClient()
+        {
+            // Extract the client ID from the request body
+            var clientIdString = string.Empty;
+            using (var reader = new StreamReader(Request.Body, Encoding.UTF8))
+            {
+                clientIdString = JObject.Parse(await reader.ReadToEndAsync()).Value<string>("clientId");
+            }
+
+            if (!Guid.TryParse(clientIdString, out Guid clientId))
+            {
+                return BadRequest("Invalid ClientId");
+            }
+
+            try
+            {
+                await DisconnectAvatarInternal(clientId);
+                await Task.Delay(2000); // Wait some time for the connection to close
+                _clientService.RemoveClient(clientId);
+                Console.WriteLine($"Client context released for client id {clientId}.");
+                return Ok($"Client context released for client id {clientId}.");
+            }
+            catch (Exception ex)
+            {
+                return BadRequest($"Client context release failed for client id {clientId}. Error message: {ex.Message}");
+            }
+        }
+
+        private async Task HandleUserQuery(string userQuery, Guid clientId, HttpResponse httpResponse)
         {
             var clientContext = _clientService.GetClientContext(clientId);
             var azureOpenaiDeploymentName = clientContext.AzureOpenAIDeploymentName;
@@ -594,7 +623,7 @@ public void InitializeChatContext(string systemPrompt, Guid clientId)
         }

         // Speak the given text. If there is already a speaking in progress, add the text to the queue. For chat scenario.
-        public Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, HttpResponse httpResponse)
+        private Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, HttpResponse httpResponse)
         {
             var clientContext = _clientService.GetClientContext(clientId);

@@ -636,7 +665,7 @@ public Task SpeakWithQueue(string text, int endingSilenceMs, Guid clientId, Http
             return Task.CompletedTask;
         }

-        public async Task<string> SpeakText(string text, string voice, string speakerProfileId, int endingSilenceMs, Guid clientId)
+        private async Task<string> SpeakText(string text, string voice, string speakerProfileId, int endingSilenceMs, Guid clientId)
         {
             var escapedText = HttpUtility.HtmlEncode(text);
             string ssml;
@@ -668,7 +697,7 @@ public async Task<string> SpeakText(string text, string voice, string speakerPro
             return await SpeakSsml(ssml, clientId);
         }

-        public async Task<string> SpeakSsml(string ssml, Guid clientId)
+        private async Task<string> SpeakSsml(string ssml, Guid clientId)
         {
             var clientContext = _clientService.GetClientContext(clientId);

@@ -695,23 +724,45 @@ public async Task<string> SpeakSsml(string ssml, Guid clientId)
             return speechSynthesisResult.ResultId;
         }

-        public async Task StopSpeakingInternal(Guid clientId)
+        private async Task StopSpeakingInternal(Guid clientId)
         {
             var clientContext = _clientService.GetClientContext(clientId);
-
-            var speechSynthesizer = clientContext.SpeechSynthesizer as SpeechSynthesizer;
             var spokenTextQueue = clientContext.SpokenTextQueue;
             spokenTextQueue.Clear();

             try
             {
-                var connection = Connection.FromSpeechSynthesizer(speechSynthesizer);
-                await connection.SendMessageAsync("synthesis.control", "{\"action\":\"stop\"}");
+                var connection = clientContext.SpeechSynthesizerConnection as Connection;
+                if (connection != null)
+                {
+                    await connection.SendMessageAsync("synthesis.control", "{\"action\":\"stop\"}");
+                    Console.WriteLine("Stop speaking message sent.");
+                }
             }
             catch (Exception ex)
             {
                 Console.WriteLine(ex.Message);
             }
         }
+
+        private async Task DisconnectAvatarInternal(Guid clientId)
+        {
+            // Retrieve the client context
+            var clientContext = _clientService.GetClientContext(clientId);
+
+            if (clientContext == null)
+            {
+                throw new Exception("Client context not found");
+            }
+
+            await StopSpeakingInternal(clientId);
+            await Task.Delay(2000); // Wait for the last speech to finish
+
+            var connection = clientContext.SpeechSynthesizerConnection as Connection;
+            if (connection != null)
+            {
+                connection.Close();
+            }
+        }
     }
 }
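
The connection handling added to this controller follows one pattern: cache the connection on the client context, clear the cached reference when a disconnect is reported, and call close only if a cached connection exists. The sketch below restates that pattern in JavaScript (the language of the browser samples in this commit), not in C#. `FakeConnection`, `trackConnection`, and `disconnect` are hypothetical names used only for illustration; they are not part of the Speech SDK, and the two-second waits from the C# code are left out.

```javascript
// Hypothetical stand-in for an SDK connection that reports when it disconnects.
class FakeConnection {
  constructor() {
    this.handlers = []
    this.closed = false
  }

  // Register a callback that runs once the connection is closed.
  onDisconnected(handler) {
    this.handlers.push(handler)
  }

  // Closing twice is a no-op, so callers do not need to track state themselves.
  close() {
    if (this.closed) return
    this.closed = true
    this.handlers.forEach((handler) => handler())
  }
}

// Cache the connection on the context and clear it again on disconnect,
// mirroring the Disconnected handler in the diff above.
function trackConnection(clientContext, connection) {
  connection.onDisconnected(() => {
    clientContext.connection = null
  })
  clientContext.connection = connection
}

// Close the cached connection if there is one. Returns true when a close was issued.
function disconnect(clientContext) {
  const connection = clientContext.connection
  if (connection !== null && connection !== undefined) {
    connection.close()
    return true
  }
  return false
}
```

Because the cached reference is cleared by the disconnect callback, a second call to `disconnect` finds nothing to close, which is what makes it safe to call before every new connect.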

samples/csharp/web/avatar/Models/ClientContext.cs

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ public class ClientContext

         public object? SpeechSynthesizer { get; set; }

+        public object? SpeechSynthesizerConnection { get; set; }
+
         public string? SpeechToken { get; set; }

         public string? IceToken { get; set; }

samples/csharp/web/avatar/Services/ClientService.cs

Lines changed: 5 additions & 0 deletions
@@ -52,5 +52,10 @@ public ClientContext GetClientContext(Guid clientId)

             return context;
         }
+
+        public void RemoveClient(Guid clientId)
+        {
+            _clientContexts.TryRemove(clientId, out _);
+        }
     }
 }

samples/csharp/web/avatar/Services/IClientService.cs

Lines changed: 2 additions & 0 deletions
@@ -12,5 +12,7 @@ public interface IClientService
         Guid InitializeClient();

         ClientContext GetClientContext(Guid clientId);
+
+        void RemoveClient(Guid clientId);
     }
 }

samples/csharp/web/avatar/wwwroot/js/basic.js

Lines changed: 4 additions & 0 deletions
@@ -322,3 +322,7 @@ window.updataTransparentBackground = () => {
         document.getElementById('backgroundImageUrl').disabled = false
     }
 }
+
+window.onbeforeunload = () => {
+    navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
+}

samples/csharp/web/avatar/wwwroot/js/chat.js

Lines changed: 4 additions & 0 deletions
@@ -602,3 +602,7 @@ window.updateLocalVideoForIdle = () => {
         document.getElementById('showTypeMessageCheckbox').hidden = false
     }
 }
+
+window.onbeforeunload = () => {
+    navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
+}
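
Both `basic.js` and `chat.js` release the server-side client context with `navigator.sendBeacon` when the page unloads. A string body is sent as `text/plain`, which appears compatible with the `ReleaseClient` action above because that action reads the raw request body and parses the JSON itself. The sketch below wraps the same call in a helper. `buildReleasePayload`, `releaseClient`, and the injectable `beacon` parameter are hypothetical names added so the logic can be exercised outside a browser; they are not part of the sample.

```javascript
// Hypothetical helper: builds the JSON body that the releaseClient endpoint parses.
function buildReleasePayload(clientId) {
  return JSON.stringify({ clientId: clientId })
}

// Hypothetical helper: queues the release request.
// `beacon` is an assumption for testability; in a browser it falls back to
// navigator.sendBeacon, which returns true when the request was queued.
function releaseClient(clientId, beacon) {
  if (!clientId) {
    // Nothing to release if the client was never initialized.
    return false
  }
  const send = beacon || ((url, body) => navigator.sendBeacon(url, body))
  return send('/api/releaseClient', buildReleasePayload(clientId))
}
```

In the page this would be wired up as `window.onbeforeunload = () => { releaseClient(clientId) }`, which matches the behavior of the handler in the diff when `clientId` is set.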

samples/js/browser/avatar/js/chat.js

Lines changed: 6 additions & 0 deletions
@@ -594,6 +594,12 @@ function checkHung() {
             sessionActive = false
             if (document.getElementById('autoReconnectAvatar').checked) {
                 console.log(`[${(new Date()).toISOString()}] The video stream got disconnected, need reconnect.`)
+                // Release the existing avatar connection
+                if (avatarSynthesizer !== undefined) {
+                    avatarSynthesizer.close()
+                }
+
+                // Setup a new avatar connection
                 connectAvatar()
             }
         }
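
The auto-reconnect change closes the existing synthesizer before opening a new session, so the old WebSocket is not left open. A minimal sketch of that ordering follows. `reconnectAvatar`, the `state` holder, and the `connect` callback are hypothetical stand-ins for the global `avatarSynthesizer` and `connectAvatar()` used in the sample.

```javascript
// Hypothetical helper: release the previous synthesizer, then create a new one.
// `state.avatarSynthesizer` plays the role of the sample's global variable and
// `connect` plays the role of connectAvatar(); both are assumptions.
function reconnectAvatar(state, connect) {
  if (state.avatarSynthesizer !== undefined) {
    // Close first so the old connection is released before a new one exists.
    state.avatarSynthesizer.close()
    state.avatarSynthesizer = undefined
  }
  state.avatarSynthesizer = connect()
  return state.avatarSynthesizer
}
```

Clearing the reference after `close()` is an addition in this sketch; the sample relies on `connectAvatar()` to overwrite the variable.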
