A developer-ready sample demonstrating Microsoft Agent Framework with Foundry Hosted Agents and Model Router. Four specialist agents run concurrently to triage incident signals (alerts, logs, metrics, runbook excerpts) and produce structured JSON output — including root-cause analysis, immediate actions, communications drafts, and a post-incident report.
Live demo video: docs/demo_ui.mp4
Get the sample running locally in under 5 minutes.
| Tool | Required | Install |
|---|---|---|
| Python 3.10+ | Yes | python.org |
| Azure CLI (`az`) | Yes | Install Azure CLI |
| Microsoft Foundry project | Yes | Create a Foundry project |
| Microsoft Foundry Model Router deployment | Yes | Deploy Model Router |
| Docker Desktop | For deployment only | docker.com |
```shell
git clone https://github.com/Azure-Samples/On-Call-Copilot-Multi-Agent.git
cd On-Call-Copilot-Multi-Agent
python -m venv .venv
```

Activate the virtual environment:
```shell
# Windows PowerShell
.venv\Scripts\Activate.ps1

# Windows cmd
.venv\Scripts\activate.bat

# Linux / macOS
source .venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Copy the template and fill in your Azure values:
```shell
# Linux / macOS
cp .env.example .env

# Windows
copy .env.example .env
```

Open `.env` and set the hosted-agent project plus the Model Router project. These can be the same Foundry project, but they are separate variables so the hosted agent can run in one project while inference uses a Model Router deployment in another.
```
AZURE_AI_PROJECT_ENDPOINT=https://<hosted-account>.services.ai.azure.com/api/projects/<hosted-project>
AZURE_MODEL_PROJECT_ENDPOINT=https://<model-account>.services.ai.azure.com/api/projects/<model-project>
AZURE_OPENAI_ENDPOINT=https://<model-account>.cognitiveservices.azure.com/
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router
MODEL_ROUTER_DEPLOYMENT=model-router
```

Where to find these values:
| Variable | Location in Azure Portal |
|---|---|
| `AZURE_AI_PROJECT_ENDPOINT` | Microsoft Foundry hosted-agent project → Overview → Endpoint |
| `AZURE_MODEL_PROJECT_ENDPOINT` | Microsoft Foundry model project → Overview → Endpoint |
| `AZURE_OPENAI_ENDPOINT` | Azure AI Services resource → Keys and Endpoint |
| `AZURE_OPENAI_API_KEY` | Microsoft Foundry Model → Keys and Endpoint |
| `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME` | Microsoft Foundry → Deployments (usually `model-router`) |
| `MODEL_ROUTER_DEPLOYMENT` | Same Model Router deployment name used by scripts and mock telemetry |
`.env` is in `.gitignore`, so your credentials will not be committed.
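As a sanity check before starting the servers, a small fail-fast validation of these settings might look like this. It is a sketch, not code from the repo; it assumes the variable names above and the python-dotenv package from requirements.txt:

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, already in requirements.txt
except ImportError:  # fall back to plain environment variables
    def load_dotenv() -> bool:
        return False

REQUIRED_VARS = [
    "AZURE_AI_PROJECT_ENDPOINT",
    "AZURE_MODEL_PROJECT_ENDPOINT",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "MODEL_ROUTER_DEPLOYMENT",
]

def missing_settings() -> list[str]:
    """Return the required variables that are unset or blank."""
    load_dotenv()  # reads .env from the current directory, if present
    return [name for name in REQUIRED_VARS if not os.getenv(name, "").strip()]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing required settings: {', '.join(missing)}")
    print("All required settings present.")
```

Running it from the repo root surfaces a typo in `.env` immediately instead of as an authentication error at request time.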
```shell
az login
```

If you have multiple subscriptions, select the correct one:

```shell
az account set --subscription "<your-subscription-name-or-id>"
```
The startup script handles venv creation, dependency installation, Azure login, and launches both the agent server and the browser UI:
```shell
# Windows PowerShell
.\scripts\start.ps1

# Linux / macOS
bash scripts/start.sh
```

This starts:

- Agent server on http://localhost:8088
- Browser UI on http://localhost:7860
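To confirm both servers actually came up, a dependency-free TCP probe of the two ports is enough (an illustrative helper, not part of the repo):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in [("agent server", 8088), ("browser UI", 7860)]:
        status = "up" if port_open("localhost", port) else "down"
        print(f"{name} (port {port}): {status}")
```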
Startup script options:
| Flag | PowerShell | Bash | Description |
|---|---|---|---|
| Agent server only | `.\scripts\start.ps1 -SkipUI` | `bash scripts/start.sh --skip-ui` | Skip the UI server |
| Mock mode | `.\scripts\start.ps1 -MockMode` | `bash scripts/start.sh --mock` | No Azure credentials needed |
| Skip install | `.\scripts\start.ps1 -SkipInstall` | `bash scripts/start.sh --skip-install` | Skip the pip install step |
What the script does:
- Checks prerequisites (Python, Azure CLI)
- Creates and activates a `.venv` virtual environment
- Installs dependencies from `requirements.txt`
- Creates `.env` from `.env.example` if missing
- Checks Azure login (reads `AZURE_TENANT_ID` from `.env` if set)
- Starts the agent server and UI server together
Press Ctrl+C to stop both servers.
If you prefer to start the servers separately:
```shell
# Terminal 1 — agent server
python main.py
# Listening on http://localhost:8088

# Terminal 2 — browser UI (with venv activated)
python ui/server.py
# Opens at http://localhost:7860
```

The UI lets you load sample incidents, send them to the agent, and view results across all four agent panels.
```shell
# PowerShell
.\scripts\test_local.ps1 -Demo 1

# Bash
bash scripts/test_local.sh 1
```

Or use scripts/test_local.http with the VS Code REST Client extension.
```mermaid
flowchart TD
    Client["Client<br/>CLI / curl / Foundry UI"]
    subgraph Foundry["Foundry Agent Service (Hosted Container)"]
        Orchestrator["Agent Framework workflow<br/>(ResponsesHostServer)"]
        subgraph Concurrent["Concurrent execution - asyncio.gather()"]
            direction LR
            Triage["Triage Agent<br/>suspected_root_causes<br/>immediate_actions<br/>missing_information<br/>runbook_alignment"]
            Summary["Summary Agent<br/>summary"]
            Comms["Comms Agent<br/>comms<br/>(Slack + stakeholder)"]
            PIR["PIR Agent<br/>post_incident_report<br/>(timeline, impact, prevention)"]
        end
        Merge["Merge JSON fragments<br/>+ inject telemetry block"]
    end
    ModelRouter["Microsoft Foundry Model Router<br/>(single deployment -<br/>routes to best model<br/>per request complexity)"]
    Client -->|"POST /responses (Responses input envelope)"| Orchestrator
    Orchestrator --> Concurrent
    Triage -->|JSON fragment| Merge
    Summary -->|JSON fragment| Merge
    Comms -->|JSON fragment| Merge
    PIR -->|JSON fragment| Merge
    Merge -->|"Structured JSON response"| Client
    Triage -->|Azure OpenAI API calls| ModelRouter
    ModelRouter -->|response| Triage
    Summary -->|Azure OpenAI API calls| ModelRouter
    ModelRouter -->|response| Summary
    Comms -->|Azure OpenAI API calls| ModelRouter
    ModelRouter -->|response| Comms
    PIR -->|Azure OpenAI API calls| ModelRouter
    ModelRouter -->|response| PIR
    style Foundry fill:#e8f4fd,stroke:#0078d4,stroke-width:2px
    style Concurrent fill:#f0f8e8,stroke:#107c10,stroke-width:1px,stroke-dasharray:5 5
    style ModelRouter fill:#fff4e5,stroke:#f7630c,stroke-width:2px
    style Orchestrator fill:#dce9f5,stroke:#0078d4
    style Merge fill:#dce9f5,stroke:#0078d4
```
- Request arrives via the Responses API protocol (port 8088)
- The hosted `WorkflowAgent` receives the user message
- Four specialist `Agent` instances are created with dedicated instructions from `app/agents/`
- `ConcurrentBuilder` invokes all four specialists concurrently against Model Router
- Each specialist returns a JSON fragment covering its output keys
- The response text contains the specialist JSON fragments for downstream parsing
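The fan-out and merge can be pictured with plain asyncio. This is a sketch of the pattern only; the real workflow uses the Agent Framework's `ConcurrentBuilder`, and the stub coroutines below stand in for Model Router calls:

```python
import asyncio
import json

# Stub specialists -- each returns the JSON fragment its real counterpart owns.
async def triage(incident: dict) -> dict:
    return {"suspected_root_causes": [], "immediate_actions": [],
            "missing_information": [], "runbook_alignment": {}}

async def summary(incident: dict) -> dict:
    return {"summary": {"what_happened": incident.get("title", "UNKNOWN")}}

async def comms(incident: dict) -> dict:
    return {"comms": {"slack_update": "", "stakeholder_update": ""}}

async def pir(incident: dict) -> dict:
    return {"post_incident_report": {"timeline": [], "prevention_actions": []}}

async def run_workflow(incident: dict) -> dict:
    # All four specialists run concurrently; their fragments are merged
    # into one response object, mirroring the diagram above.
    fragments = await asyncio.gather(
        triage(incident), summary(incident), comms(incident), pir(incident)
    )
    merged: dict = {}
    for fragment in fragments:
        merged.update(fragment)
    return merged

if __name__ == "__main__":
    result = asyncio.run(run_workflow({"title": "API Gateway 5xx spike"}))
    print(json.dumps(result, indent=2))
```

Because each specialist owns a disjoint set of top-level keys, a plain `dict.update` merge is collision-free.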
| Agent | Responsibility | Output Keys |
|---|---|---|
| Triage | Root cause analysis, immediate actions, missing info, runbook alignment | suspected_root_causes, immediate_actions, missing_information, runbook_alignment |
| Summary | Concise incident narrative | summary |
| Comms | Slack update, stakeholder briefing | comms |
| PIR | Post-incident timeline, customer impact, prevention actions | post_incident_report |
Model Router automatically routes each request to the best model based on complexity — no model-selection logic needed in your code:
| Scenario | Complexity | Routing |
|---|---|---|
| Simple alert triage | Low | Faster, cheaper model |
| Multi-signal correlation | High | More capable model |
| Post-incident synthesis | High | High-capability model |
Uses the Agent Framework with four concurrent agents. Requires Azure credentials.
```shell
python main.py
# http://localhost:8088
```

Uses the FastAPI server with golden outputs for local schema validation:
```shell
# Windows PowerShell
$env:MOCK_MODE="true"; python -m app.main

# Linux / macOS
MOCK_MODE=true python -m app.main
```

```shell
python scripts/validate.py               # all 5 scenarios
python scripts/validate.py --scenario 2  # single scenario
```

| # | File | Description |
|---|---|---|
| 1 | scripts/demos/demo_1_simple_alert.json | Single 5xx alert — quick triage |
| 2 | scripts/demos/demo_2_multi_signal.json | 3 alerts + logs + metrics — multi-signal correlation |
| 3 | scripts/demos/demo_3_post_incident.json | Resolved SEV1 TLS cert expiry — full PIR synthesis |
| # | File | Severity | Description |
|---|---|---|---|
| 1 | scenario_1_redis_outage.json | SEV2 | Redis cache cluster unresponsive |
| 2 | scenario_2_aks_scaling.json | SEV1 | Kubernetes node pool scaling failure |
| 3 | scenario_3_dns_cascade.json | SEV1 | DNS resolution failures cascading |
| 4 | scenario_4_minimal_alert.json | SEV4 | Minimal CPU alert on staging |
| 5 | scenario_5_storage_throttle_pir.json | SEV2 | Storage throttling — post-incident review |
```shell
python scripts/invoke.py                 # default prompt
python scripts/invoke.py --demo 1        # built-in demo
python scripts/invoke.py --scenario 2    # built-in scenario
python scripts/invoke.py --prompt "db connection pool exhausted"  # custom prompt
```

```shell
python scripts/run_scenarios.py               # all scenarios
python scripts/run_scenarios.py --list        # list available
python scripts/run_scenarios.py --scenario 3  # single scenario
```

The Agent Framework server accepts the Responses protocol. Put the incident payload into the `content` field as text.
```json
{
  "input": [
    {
      "role": "user",
      "content": "{\"incident_id\":\"INC-20260217-001\",\"title\":\"API Gateway 5xx spike\",\"severity\":\"SEV1\"}"
    }
  ]
}
```

Example incident payload:

```json
{
  "incident_id": "INC-20260217-001",
  "title": "API Gateway 5xx spike",
  "severity": "SEV1",
  "timeframe": { "start": "2026-02-17T03:42:00Z", "end": null },
  "alerts": [
    { "name": "HighErrorRate", "description": "...", "timestamp": "..." }
  ],
  "logs": [
    { "source": "order-service", "lines": ["ERROR ...", "WARN ..."] }
  ],
  "metrics": [
    { "name": "http_5xx_rate", "window": "5m", "values_summary": "..." }
  ],
  "runbook_excerpt": "Step 1: Check dashboard. Step 2: ...",
  "constraints": {
    "max_time_minutes": 15,
    "environment": "production",
    "region": "eastus2"
  }
}
```

Merged output shape:

```json
{
  "summary": { "what_happened": "...", "current_status": "..." },
  "suspected_root_causes": [{ "hypothesis": "...", "evidence": [], "confidence": 0.0 }],
  "immediate_actions": [{ "step": "...", "owner_role": "...", "priority": "P0" }],
  "missing_information": [{ "question": "...", "why_it_matters": "..." }],
  "runbook_alignment": { "matched_steps": [], "gaps": [] },
  "comms": { "slack_update": "...", "stakeholder_update": "..." },
  "post_incident_report": { "timeline": [], "customer_impact": "...", "prevention_actions": [] },
  "telemetry": { "correlation_id": "...", "model_router_deployment": "...", "selected_model_if_available": null, "tokens_if_available": null }
}
```

For a comprehensive step-by-step deployment, hosting, and migration guide, see docs/HOSTED_AGENT_GUIDE.md.
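A hypothetical client helper that wraps an incident in the Responses envelope and checks a merged response for the expected top-level keys. The HTTP call itself (e.g. `requests.post("http://localhost:8088/responses", json=envelope)` against the local server from the quickstart) is left out so the sketch stays offline:

```python
import json

# Top-level keys the four specialists contribute, per the output shape above.
EXPECTED_KEYS = {
    "summary", "suspected_root_causes", "immediate_actions",
    "missing_information", "runbook_alignment", "comms",
    "post_incident_report", "telemetry",
}

def build_envelope(incident: dict) -> dict:
    """Wrap an incident payload as the Responses input envelope."""
    return {"input": [{"role": "user", "content": json.dumps(incident)}]}

def missing_output_keys(output: dict) -> set:
    """Top-level keys the merged response should contain but does not."""
    return EXPECTED_KEYS - set(output)

if __name__ == "__main__":
    envelope = build_envelope({"incident_id": "INC-20260217-001", "severity": "SEV1"})
    print(json.dumps(envelope, indent=2))
```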
- Azure CLI 2.80+ and Azure Developer CLI (`azd`) 1.23.0+
- Docker Desktop
- A Microsoft Foundry project with required permissions (details)
```shell
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic
azd ai agent init -m agent.yaml
azd up
```

Verify:

```shell
az cognitiveservices agent show \
  --account-name <your-account-name> \
  --project-name <your-project-name> \
  --name oncall-copilot
```

Clean up: `azd down`
```shell
# 1. Build and push the container image
docker build --platform linux/amd64 -t oncall-copilot:v1 .
az acr login --name <your-registry>
docker tag oncall-copilot:v1 <your-registry>.azurecr.io/oncall-copilot:v1
docker push <your-registry>.azurecr.io/oncall-copilot:v1

# 2. Grant the project managed identity "Container Registry Repository Reader" on your ACR

# 3. Deploy
export ACR_IMAGE="<your-registry>.azurecr.io/oncall-copilot:v1"
python scripts/deploy_sdk.py

# 4. Verify
python scripts/verify_agent.py

# Clean up
python scripts/deploy_sdk.py --delete
```

- Install the Microsoft Foundry extension (Extensions view → search "Microsoft Foundry" → Install)
- Open the Command Palette (`Ctrl+Shift+P`) → Microsoft Foundry: Set Default Project
- Sign in and select your subscription, resource group, and Foundry project
- Open any demo/scenario JSON file, copy the contents, and paste into the Foundry Agent Playground chat
| Variable | Required | Description |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Yes | Microsoft Foundry Model / AI Services endpoint |
| `AZURE_OPENAI_API_KEY` | No | Optional API key for the UI server; the Agent Framework server uses Azure identity |
| `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME` | Yes | Model Router deployment name (e.g. `model-router`) |
| `MODEL_ROUTER_DEPLOYMENT` | Yes | Model Router deployment name used by scripts and telemetry |
| `AZURE_AI_PROJECT_ENDPOINT` | Yes | Foundry project endpoint for the hosted agent |
| `AZURE_MODEL_PROJECT_ENDPOINT` | Yes | Foundry project endpoint that contains the Model Router deployment |
| `AZURE_TENANT_ID` | Recommended | Tenant used by local CLI helper scripts |
| `AZURE_SUBSCRIPTION_ID` | Recommended | Subscription used for deployment metadata |
| `AGENT_NAME` | No | Agent name for SDK scripts (default: `oncall-copilot`) |
| `AGENT_VERSION` | No | Agent version for SDK scripts (default: `latest`) |
| `ACR_IMAGE` | No | ACR image URI for `deploy_sdk.py` |
| `MOCK_MODE` | No | Set to `true` for mock validation without Azure |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |
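Flags like `MOCK_MODE` are string-valued; a small helper (illustrative, not from the repo) makes the accepted truthy forms explicit:

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable like MOCK_MODE as a boolean."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

if __name__ == "__main__":
    print("mock mode:", env_flag("MOCK_MODE"))
```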
```
On-Call-Copilot-Multi-Agent/
├── main.py                  # Agent Framework entrypoint (hosted agent)
├── agent.yaml               # Hosted Agent definition
├── azure.yaml               # azd configuration
├── Dockerfile               # linux/amd64 container for Foundry
├── requirements.txt
├── .env.example             # Environment variable template → copy to .env
├── app/
│   ├── agents/
│   │   ├── triage.py        # Triage Agent instructions
│   │   ├── summary.py       # Summary Agent instructions
│   │   ├── comms.py         # Comms Agent instructions
│   │   └── pir.py           # PIR Agent instructions
│   ├── main.py              # FastAPI server (mock mode)
│   ├── mock_router.py       # Mock model router for validation
│   ├── schemas.py           # Input/output JSON schemas
│   └── telemetry.py         # OpenTelemetry + structured logging
├── scripts/
│   ├── demos/               # 3 demo payloads
│   ├── scenarios/           # 5 incident scenarios
│   ├── golden_outputs/      # Expected outputs for schema validation
│   ├── validate.py          # Schema validation (mock mode)
│   ├── deploy_sdk.py        # Deploy agent via Python SDK
│   ├── invoke.py            # Invoke deployed agent
│   ├── run_scenarios.py     # Batch scenario runner
│   ├── verify_agent.py      # Deployment health check
│   └── test_local.*         # Local test scripts (http/sh/ps1)
├── ui/
│   ├── index.html           # Browser UI
│   └── server.py            # UI server (port 7860)
├── infra/
│   └── main.bicep           # Azure infrastructure (Bicep)
└── docs/                    # Architecture diagrams, screenshots, blog post
```
| Package | Purpose |
|---|---|
| `agent-framework` | Core `Agent` abstraction and workflow integration |
| `agent-framework-foundry-hosting` | `ResponsesHostServer` for the Foundry hosted-agent protocol |
| `agent-framework-orchestrations` | `ConcurrentBuilder` orchestration for the four specialist agents |
| `agent-framework-foundry` | `FoundryChatClient` integration with Foundry Model Router |
| `azure-identity` | `DefaultAzureCredential` for Azure OpenAI bearer tokens |
| `python-dotenv` | Auto-load the `.env` file at startup |
| Signal | Implementation |
|---|---|
| Structured logs | JSON via Python logging; each request logs correlation_id, incident_id, severity |
| Correlation IDs | UUID per request, in X-Correlation-ID header and output telemetry block |
| OTel spans | Spans around handle_responses, validate_input, call_model_router, validate_output |
| OTLP export | Set OTEL_EXPORTER_OTLP_ENDPOINT to ship traces to Jaeger / Azure Monitor |
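The correlation pattern above can be sketched in a few lines. Field names are taken from the table; the logger wiring is illustrative, not the repo's `app/telemetry.py`:

```python
import json
import logging
import uuid

logger = logging.getLogger("oncall")

def new_correlation_id() -> str:
    """One UUID per request; also returned in the X-Correlation-ID header."""
    return str(uuid.uuid4())

def log_request(incident_id: str, severity: str, correlation_id: str) -> str:
    """Emit one structured JSON log line per request and return it."""
    record = {
        "correlation_id": correlation_id,
        "incident_id": incident_id,
        "severity": severity,
    }
    line = json.dumps(record)
    logger.info(line)
    return line

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_request("INC-20260217-001", "SEV1", new_correlation_id())
```

Keeping the log line as one JSON object lets log pipelines filter on `correlation_id` without regex parsing.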
- Secret redaction — regex-based scrubbing of credential patterns before they reach the model
- No hallucination — the system prompt sets `confidence: 0` and populates `missing_information` when data is insufficient
- JSON-only output — `response_format: json_object` with schema validation and fallback
- Unknowns marked — literal `"UNKNOWN"` for undeterminable fields
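A minimal sketch of regex-based secret scrubbing; the patterns here are illustrative examples, not the repo's actual rule set:

```python
import re

# Example credential patterns -- extend for your environment.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\s*[=:]\s*\S+"),
    re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*"),
]

def redact(text: str) -> str:
    """Replace credential-looking substrings before text reaches the model."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

if __name__ == "__main__":
    print(redact("retry with api_key=sk-123 and header Bearer abc.def"))
```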
| Error | HTTP | Fix |
|---|---|---|
| `SubscriptionIsNotRegistered` | 400 | Register the subscription provider |
| `InvalidAcrPullCredentials` | 401 | Fix managed identity or registry RBAC |
| `UnauthorizedAcrPull` | 403 | Assign Container Registry Repository Reader to the project identity |
| `AcrImageNotFound` | 404 | Correct the image name/tag or push the image to ACR |
| `RegistryNotFound` | 400/404 | Fix registry DNS or network reachability |
| Gateway 400 "ID cannot be null" | 400 | Avoid the "Title: CapName." pattern in prompts |
For local validation issues, run python scripts/validate.py with MOCK_MODE=true.
See CONTRIBUTING.md for development setup, code style guidelines, and the PR checklist.
See SECURITY.md for the security policy and how to report a vulnerability.
MIT — see LICENSE.











