Commit 83fc7ae

Merge pull request #124 from yulin-li/yulin/voice-live-update
update voice live agent sample for FDP projects
2 parents 5460a66 + 652871a commit 83fc7ae

5 files changed: 69 additions and 188 deletions

Lines changed: 35 additions & 35 deletions
Original file line number | Diff line number | Diff line change
@@ -1,18 +1,18 @@
11
# Voice Live Agent
22

3-
This sample showcases how to voice-enable any agents built with Azure AI Foundry Agent Service, utilizing Azure AI Voice Live API.
3+
This sample showcases how to voice-enable any agents built with Azure AI Foundry Agent Service, utilizing Azure AI Voice Live API.
44

5-
**IMPORTANT NOTE:** Starter templates, instructions, code samples and resources in this msft-agent-samples file (“samples”) are designed to assist in accelerating development of agents for specific scenarios. It is important that you review all provided resources and carefully test Agent behavior in the context of your use case: ([Learn More](https://learn.microsoft.com/en-us/legal/cognitive-services/agents/transparency-note?context=%2Fazure%2Fai-services%2Fagents%2Fcontext%2Fcontext)).
5+
**IMPORTANT NOTE:** Starter templates, instructions, code samples and resources in this msft-agent-samples file (“samples”) are designed to assist in accelerating development of agents for specific scenarios. It is important that you review all provided resources and carefully test Agent behavior in the context of your use case: ([Learn More](https://learn.microsoft.com/en-us/legal/cognitive-services/agents/transparency-note?context=%2Fazure%2Fai-services%2Fagents%2Fcontext%2Fcontext)).
66

7-
Certain Agent offerings may be subject to legal and regulatory requirements, may require licenses, or may not be suitable for all industries, scenarios, or use cases. By using any sample, you are acknowledging that Agents or other output created using that sample are solely your responsibility, and that you will comply with all applicable laws, regulations, and relevant safety standards, terms of service, and codes of conduct.
7+
Certain Agent offerings may be subject to legal and regulatory requirements, may require licenses, or may not be suitable for all industries, scenarios, or use cases. By using any sample, you are acknowledging that Agents or other output created using that sample are solely your responsibility, and that you will comply with all applicable laws, regulations, and relevant safety standards, terms of service, and codes of conduct.
88

99
## Use cases
1010

11-
Voice-enabled agents are high in demand, now more than ever. Voice agents are agents that users can interact with naturally and conversationally using just their voice. From an end-user perspective, voice is becoming the preferred mode of interaction, as it enables speed, accessibility, and multitasking.
11+
Voice-enabled agents are high in demand, now more than ever. Voice agents are agents that users can interact with naturally and conversationally using just their voice. From an end-user perspective, voice is becoming the preferred mode of interaction, as it enables speed, accessibility, and multitasking.
1212

1313
We see increasing demand across several key use-cases including:
1414

15-
**Customer service** – think about getting support from your favorite department store, your bank, your travel agency, or even your government;
15+
**Customer service** – think about getting support from your favorite department store, your bank, your travel agency, or even your government;
1616

1717
**Automotive** – think about in-car assistants with hands-free interaction;
1818

@@ -26,28 +26,28 @@ The system consists of:
2626

2727
- An AI Agent created with Azure AI Agent Service. You can create an agent using any of the templates provided in agent-catalog/azure-ai-agent-service-blueprints at main · microsoft/agent-catalog
2828

29-
- An Azure Voice Live API request. You can set up your Voice Live API request following the instructions in this document and code sample.
29+
- An Azure Voice Live API request. You can set up your Voice Live API request following the instructions in this document and code sample.
3030

3131
```text
32-
+-----------------+
33-
| User Query |
34-
| (Speech Input) |
35-
+-------+---------+
36-
|
37-
v
38-
+---------------------+ invokes +----------------------------+
39-
| Voice Live Agent | ------------------------> | Azure AI Agent |
40-
| | <------------------------ | + Knowledge + Actions |
41-
+---------------------+ results +----------------------------+
42-
|
43-
v
44-
+-------------------+
45-
| Agent Response |
46-
| (Speech Output) |
47-
+-------------------+
32+
+-----------------+
33+
| User Query |
34+
| (Speech Input) |
35+
+-------+---------+
36+
|
37+
v
38+
+---------------------+ invokes +----------------------------+
39+
| Voice Live Agent | ------------------------> | Azure AI Agent |
40+
| | <------------------------ | + Knowledge + Actions |
41+
+---------------------+ results +----------------------------+
42+
|
43+
v
44+
+-------------------+
45+
| Agent Response |
46+
| (Speech Output) |
47+
+-------------------+
4848
```
4949

50-
## Voice Live API introduction
50+
## Voice Live API introduction
5151

5252
Azure AI Voice Live API (preview) is an innovative, unified single API that enables streaming interactions with the foundation model of your choice, for both speech input and output. It includes advanced features such as customizable speech recognition, diverse text-to-speech options, brand voices, avatars, audio enhancement, among other functionalities. With Voice Live API, you can add real-time speech interaction capabilities to any agent built with the Azure AI Agent Service.
5353

@@ -61,35 +61,35 @@ A live demo (<https://aka.ms/voice-agent/demo>) is also available to experience
6161

6262
### Prerequisites
6363

64-
**Set up an agent**. Follow the templates provided in agent-catalog/azure-ai-agent-service-blueprints at main · microsoft/agent-catalog to create your agent.
64+
**Set up an agent**. Follow [the templates](../) to create an agent using the Azure AI Agent Service.
6565

6666
**Resource and authentication**. An Azure AI Foundry resource is required to access the Voice Live API. To learn how to create an Azure AI Foundry resource, please see: <https://learn.microsoft.com/azure/ai-services/multi-service-resource>.
6767

6868
Note: The resource must be in the `eastus2` or `swedencentral` regions at this time. Other regions are not supported.
6969

70-
@@ -31,25 +87,24 @@ We support two authentication methods for the Voice Live API:
71-
7270
For the recommended keyless authentication with Microsoft Entra ID, you need to:
7371

74-
- Assign the `Azure AI User` role to your user account or a managed identity. You can assign roles in the Azure portal under **Access control (IAM)**
75-
76-
**Add role assignment**.
72+
- Assign the `Azure AI User` role to your user account or a managed identity. You can assign roles in the Azure portal under **Access control (IAM)** > **Add role assignment**.
7773
- Generate a token using the Azure CLI or Azure SDKs. The token must be generated with the `https://ai.azure.com/.default` scope.
7874
- Use the token in the `Authorization` header of the WebSocket connection request, with the format `Bearer <token>`.
7975

8076
## Set Agent Info
8177

8278
You are supposed to specify the agent info in the WebSocket endpoint URL.
8379

84-
| Parameter | Description |
85-
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
86-
| `agent-project-name` | The Azure AI project name which the agent belongs to. |
87-
| `agent-id` | The ID of the agent to use. |
80+
| Parameter | Description |
81+
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
82+
| `agent-project-name` | The Azure AI project name which the agent belongs to. |
83+
| `agent-id` | The ID of the agent to use. |
8884
| `agent-access-token` | The Entra access token to access the agent. Make sure the identity has access to Azure AI Project, You can grant the built-in role `Azure AI User` to the identity. The scope should be `https://ai.azure.com/.default`. |
8985

9086
> Note: The token must be generated with the `https://ai.azure.com/.default` scope. e.g., `az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv`.
91-
A sample endpoint is `wss://<custom-domain>.cognitiveservices.azure.com/voice-agent/realtime?api-version=2025-05-01-preview&agent-project-name=<agent-project-name>&agent-id=<agent-id>&agent-access-token=<access-token>`.
87+
A sample endpoint is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview&agent-project-name=<agent-project-name>&agent-id=<agent-id>&agent-access-token=<access-token>`.
9288
9389
## Interact with the Voice Live API
9490

95-
Refer to the [full documentation of Voice Live API](https://learn.microsoft.com/azure/ai-services/speech-service/voice-live) for more details on how to interact with the Voice Live API.
91+
Refer to the [full documentation of Voice Live API](https://learn.microsoft.com/azure/ai-services/<placeholder>) for more details on how to interact with the Voice Live API.
92+
93+
## Getting started
94+
95+
Follow the instructions [here](./samples/react/README.md) to get started with this sample.
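
The sample endpoint in the README diff above combines the resource name with three agent query parameters. A minimal TypeScript sketch of assembling that URL follows. The helper name `buildVoiceLiveAgentUrl` and the `VoiceLiveAgentParams` interface are hypothetical and not part of this commit; only the host pattern, path, API version, and query parameter names come from the README text.

```typescript
// Hypothetical helper (not part of the commit). Field names are illustrative;
// the query parameter names mirror the README's parameter table.
interface VoiceLiveAgentParams {
  resourceName: string; // <your-ai-foundry-resource-name>
  projectName: string;  // agent-project-name
  agentId: string;      // agent-id
  accessToken: string;  // agent-access-token (Entra token, scope https://ai.azure.com/.default)
  apiVersion?: string;  // defaults to the version shown in the README
}

function buildVoiceLiveAgentUrl(p: VoiceLiveAgentParams): string {
  const url = new URL(
    `wss://${p.resourceName}.cognitiveservices.azure.com/voice-live/realtime`
  );
  // searchParams.set percent-encodes values, so tokens with reserved
  // characters survive the round trip.
  url.searchParams.set("api-version", p.apiVersion ?? "2025-05-01-preview");
  url.searchParams.set("agent-project-name", p.projectName);
  url.searchParams.set("agent-id", p.agentId);
  url.searchParams.set("agent-access-token", p.accessToken);
  return url.toString();
}
```

Using `URL` and `searchParams` rather than string concatenation keeps the encoding of the token and project name out of the caller's hands.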

samples/agent-catalog/msft-agent-samples/foundry-agent-service-sdk/voice-live-agent/samples/react/package-lock.json

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

samples/agent-catalog/msft-agent-samples/foundry-agent-service-sdk/voice-live-agent/samples/react/src/app/chat-interface.tsx

Lines changed: 31 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -184,12 +184,16 @@ const readme = `
184184
- The endpoint can be the regional endpoint (e.g., \`https://<region>.api.cognitive.microsoft.com/\`) or a custom domain endpoint (e.g., \`https://<custom-domain>.cognitiveservices.azure.com/\`).
185185
- The resource must be in the \`eastus2\` or \`swedencentral\` region. Other regions are not supported.
186186
187+
2. **(Optional) Set the Agent**
188+
- Set the project name and agent ID to connect to a specific agent.
189+
- Entra ID auth is required for agent mode, use \`az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv\` to get the token.
190+
187191
2. **Select noise suppression or echo cancellation**
188192
- Enable noise suppression and/or echo cancellation to improve audio quality.
189193
190194
3. **Select the Turn Detection**
191195
- Choose the desired turn detection method. The default is \`Server VAD\`, which uses server-side voice activity detection.
192-
- The \`Server SD (reduced false alarms)\` option is also available for better performance.
196+
- The \`Azure Semantic VAD\` option is also available for better performance.
193197
194198
4. **Select the Voice**
195199
- Choose the desired voice from the list.
@@ -410,9 +414,9 @@ const ChatInterface = () => {
410414

411415
// Add mode state and agent fields
412416
const [mode, setMode] = useState<"model" | "agent">("model");
413-
const [agentConnectionString, setAgentConnectionString] = useState("");
417+
const [agentProjectName, setAgentProjectName] = useState("");
414418
const [agentId, setAgentId] = useState("");
415-
const [agentAccessToken, setAgentAccessToken] = useState("");
419+
// const [agentAccessToken, setAgentAccessToken] = useState("");
416420
const [agents, setAgents] = useState<{ id: string; name: string }[]>([]);
417421
const [isMobile, setIsMobile] = useState(false);
418422

@@ -448,9 +452,9 @@ const ChatInterface = () => {
448452
setPredefinedScenarios(config.pre_defined_scenarios);
449453
}
450454
// Parse agent configs from /config
451-
if (config.agent && config.agent.connection_string) {
452-
setAgentAccessToken(config.agent.access_token);
453-
setAgentConnectionString(config.agent.connection_string);
455+
if (config.agent && config.agent.project_name) {
456+
// setAgentAccessToken(config.agent.access_token);
457+
setAgentProjectName(config.agent.project_name);
454458
if (Array.isArray(config.agent.agents)) {
455459
setAgents(config.agent.agents);
456460
// If only one agent, auto-select it
@@ -520,8 +524,8 @@ const ChatInterface = () => {
520524
? {
521525
modelOrAgent: {
522526
agentId,
523-
agentConnectionString,
524-
agentAccessToken,
527+
projectName: agentProjectName,
528+
agentAccessToken: entraToken,
525529
},
526530
apiVersion: "2025-05-01-preview",
527531
}
@@ -1174,6 +1178,7 @@ const ChatInterface = () => {
11741178
"gpt-4.1",
11751179
"gpt-4.1-mini",
11761180
"gpt-4.1-nano",
1181+
"phi4-mini",
11771182
];
11781183
return cascadedModels.includes(model);
11791184
}
@@ -1253,24 +1258,33 @@ const ChatInterface = () => {
12531258
onChange={(e) => setEndpoint(e.target.value)}
12541259
disabled={isConnected || configLoaded}
12551260
/>
1256-
{!configLoaded && (
1261+
{(!configLoaded && mode === "model") && (
12571262
<Input
12581263
placeholder="Subscription Key"
12591264
value={apiKey}
12601265
onChange={(e) => setApiKey(e.target.value)}
12611266
disabled={isConnected}
12621267
/>
12631268
)}
1269+
{ mode === "agent" && (
1270+
<Input
1271+
placeholder="Entra Token"
1272+
value={entraToken}
1273+
onChange={(e) => setEntraToken(e.target.value)}
1274+
disabled={isConnected}
1275+
/>
1276+
)}
1277+
{/* Entra token input */}
12641278
{/* Show agent fields if agent mode */}
12651279
{mode === "agent" ? (
12661280
<>
12671281
<div className="space-y-2">
12681282
<label className="text-sm font-medium">Agent</label>
12691283
</div>
12701284
<Input
1271-
placeholder="Agent Connection String"
1272-
value={agentConnectionString}
1273-
onChange={(e) => setAgentConnectionString(e.target.value)}
1285+
placeholder="Agent Project Name"
1286+
value={agentProjectName}
1287+
onChange={(e) => setAgentProjectName(e.target.value)}
12741288
disabled={isConnected}
12751289
/>
12761290
{/* Agent ID as Select if agents available, else Input */}
@@ -1299,12 +1313,12 @@ const ChatInterface = () => {
12991313
disabled={isConnected}
13001314
/>
13011315
)}
1302-
<Input
1316+
{/* <Input
13031317
placeholder="Agent Access Token"
13041318
value={agentAccessToken}
13051319
onChange={(e) => setAgentAccessToken(e.target.value)}
13061320
disabled={isConnected}
1307-
/>
1321+
/> */}
13081322
</>
13091323
) : (
13101324
<>
@@ -1343,6 +1357,9 @@ const ChatInterface = () => {
13431357
<SelectItem value="phi4-mm">
13441358
Phi4-MM Realtime
13451359
</SelectItem>
1360+
<SelectItem value="phi4-mini">
1361+
Phi4 Mini (Cascaded)
1362+
</SelectItem>
13461363
</SelectContent>
13471364
</Select>
13481365
</div>
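
The `chat-interface.tsx` changes above replace the agent connection string with a project name and route the Entra token through a separate input that is shown only in agent mode. The following sketch restates that logic as plain functions, outside React, to make the conditions easier to check. The function names `visibleCredentialInputs` and `buildAgentConnectOptions` are hypothetical; the conditions, field names, and API version are taken from the diff.

```typescript
type Mode = "model" | "agent";

// Mirrors the JSX conditions in the diff: the subscription key input appears
// only in model mode when no server config was loaded; the Entra token input
// appears whenever agent mode is selected.
function visibleCredentialInputs(mode: Mode, configLoaded: boolean): string[] {
  const inputs: string[] = [];
  if (!configLoaded && mode === "model") inputs.push("Subscription Key");
  if (mode === "agent") inputs.push("Entra Token");
  return inputs;
}

interface AgentConnectOptions {
  modelOrAgent: {
    agentId: string;
    projectName: string;
    agentAccessToken: string;
  };
  apiVersion: string;
}

// Mirrors the object literal in the diff: the project name replaces the old
// connection string, and the Entra token doubles as the agent access token.
function buildAgentConnectOptions(
  agentId: string,
  agentProjectName: string,
  entraToken: string
): AgentConnectOptions {
  return {
    modelOrAgent: {
      agentId,
      projectName: agentProjectName,
      agentAccessToken: entraToken,
    },
    apiVersion: "2025-05-01-preview",
  };
}
```

Keeping these rules in pure functions is one possible design; the commit itself inlines them in the component.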
