This step-by-step guide walks you through adding OpenTelemetry (OTEL) distributed tracing to the Children's Story Studio backend and viewing traces in the .NET Aspire Dashboard, a Docker-based trace viewer. By the end, you'll be able to see the full lifecycle of every story generation request, across all five agents, in a visual trace waterfall.
Prefer staying in VS Code? The AI Toolkit variant of this guide shows the same OTEL instrumentation but uses the AI Toolkit for VS Code to view traces directly in your editor, with no Docker required.
Note: Unlike the Activity Page Agents and TTS guides, this guide does not use GitHub Copilot to generate the implementation. Every code change is provided directly: you'll copy the code, understand what it does, and wire it in yourself.
- What You'll Build
- Why This Matters
- Before You Start
- Step 1: Start the Aspire Dashboard
- Step 2: Add OTEL Python Packages
- Step 3: Add OTEL Settings
- Step 4: Create the Telemetry Module
- Step 5: Wire Telemetry into the App
- Step 6: Generate a Story and View Traces
- What to Look For
- Alternative: Exporting to Azure Application Insights
- Troubleshooting
After completing this guide, every story generation request will emit distributed traces that flow through the entire multi-agent workflow:
POST /api/generate-story                    ← FastAPI auto-instrumented span
│
├── Workflow: run                           ← Agent Framework workflow span
│   ├── Executor: orchestrator              ← Agent Framework executor spans
│   │   └── LLM: chat/completions
│   ├── Executor: story_architect
│   │   └── LLM: chat/completions
│   ├── Executor: art_director
│   │   ├── LLM: chat/completions
│   │   ├── Image: generate (page 1)        ← Parallel image generation
│   │   ├── Image: generate (page 2)
│   │   └── ...
│   ├── Executor: story_reviewer
│   │   └── LLM: chat/completions
│   └── Executor: decision
│
└── Response streamed via SSE
These traces are exported via the OpenTelemetry Protocol (OTLP) to the .NET Aspire Dashboard, a lightweight, local trace viewer that runs in Docker. No Azure resources are required.
| Concept | What It Shows |
|---|---|
| Distributed Tracing for Agents | How OTEL traces flow through a multi-agent workflow, giving visibility into every executor invocation |
| Per-Agent Latency Analysis | Which agents are the bottleneck, e.g., image generation in ArtDirector vs. LLM calls in StoryArchitect |
| Debugging Revision Loops | When the StoryReviewer rejects a draft, the revision loop back to Orchestrator is clearly visible as repeated spans |
| LLM Call Correlation | Each LLM call is nested under its parent executor, showing token usage and response times in context |
| Production Observability Patterns | The same OTEL setup works with any OTLP-compatible backend: Aspire today, Application Insights or Jaeger tomorrow |
| Zero-Code Auto-Instrumentation | Agent Framework and FastAPI both emit spans automatically when a TracerProvider is configured, with no manual span creation needed |
NOTE: If you do not want to implement OTEL observability in the app manually, feel free to experiment with prompting GitHub Copilot to follow the documentation in this file to perform the implementation.
Ensure you've completed all steps in Prerequisites & Setup and that the base application is working (Running the Demo).
The Aspire Dashboard runs as a Docker container. If you don't already have Docker Desktop installed:
- macOS / Windows: Download from docker.com/products/docker-desktop
- Linux: Follow the Docker Engine install guide
Verify Docker is running:
docker --version
Create a working branch for your changes:
git checkout main
git pull origin main
git checkout -b my-otel-observability
The .NET Aspire Dashboard is a lightweight OpenTelemetry trace viewer that runs locally in Docker. It accepts OTLP data over gRPC and provides a browser-based UI for exploring traces, logs, and metrics.
Run the following command to start it:
docker run --rm -it -d \
  -p 18888:18888 \
  -p 4317:18889 \
  --name aspire-dashboard \
  mcr.microsoft.com/dotnet/aspire-dashboard:latest
Port mapping:
| Host Port | Container Port | Purpose |
|---|---|---|
| 18888 | 18888 | Aspire Dashboard web UI |
| 4317 | 18889 | OTLP gRPC receiver (where the backend sends traces) |
Open the dashboard in your browser:
http://localhost:18888
The dashboard requires a login token on first access. Retrieve it from the container logs:
docker logs aspire-dashboard
Look for a line like:
Login to the dashboard at http://localhost:18888/login?t=<YOUR_TOKEN>
Copy that full URL into your browser to authenticate.
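If you'd rather not scan the log output by eye, the lookup can be scripted. The sketch below is a minimal, hypothetical helper (find_login_url is not part of the project) and assumes the log line follows the format shown above; the real log text may differ between dashboard versions.

```python
import re
from typing import Optional

# Assumption: the login line contains a URL of the form .../login?t=<token>,
# as in the sample line shown in this guide.
LOGIN_URL = re.compile(r"https?://\S+/login\?t=\S+")


def find_login_url(log_text: str) -> Optional[str]:
    """Return the first dashboard login URL found in the log text, or None."""
    match = LOGIN_URL.search(log_text)
    return match.group(0) if match else None


sample = "Login to the dashboard at http://localhost:18888/login?t=<YOUR_TOKEN>"
print(find_login_url(sample))
```

You could feed it the captured output of docker logs aspire-dashboard instead of the sample string.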
Tip: The Aspire Dashboard will show "No resources found" until the backend starts sending traces. That's expected; we'll configure the backend in the next steps.
Good news: agent-framework-core already bundles opentelemetry-api, opentelemetry-sdk, and opentelemetry-semantic-conventions-ai as transitive dependencies. You only need to add the OTLP exporter (to send data to Aspire) and the FastAPI instrumentor (to auto-trace HTTP requests).
Open backend/requirements.txt and append the following lines at the end of the file:
# OpenTelemetry (api + sdk are already included via agent-framework-core)
opentelemetry-exporter-otlp-proto-grpc>=1.28.0
opentelemetry-instrumentation-fastapi>=0.49b0
Then install the new dependencies:
cd backend
source .venv/bin/activate
pip install -r requirements.txt
Or run the Backend: Install Python deps VS Code task from the Command Palette.
What these packages do:
| Package | Purpose |
|---|---|
| opentelemetry-exporter-otlp-proto-grpc | Exports spans to any OTLP-compatible receiver (Aspire, Jaeger, etc.) over gRPC |
| opentelemetry-instrumentation-fastapi | Auto-instruments FastAPI, creating spans for every incoming HTTP request automatically |
Agent Framework uses a combination of standard OpenTelemetry environment variables and Agent Framework-specific environment variables to control observability. Add the following to your backend/.env file:
# ── Agent Framework Observability ──────────────────────────────────────────
# Activates Agent Framework's built-in instrumentation (spans for agent
# invocations, LLM chat calls, and tool executions).
ENABLE_INSTRUMENTATION=true
# Includes prompts, completions, function arguments and results in span
# attributes. See the WARNING below before enabling.
ENABLE_SENSITIVE_DATA=true
# ── Standard OpenTelemetry ─────────────────────────────────────────────────
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=children-story-studio
⚠️ WARNING (Sensitive Data): When ENABLE_SENSITIVE_DATA=true, Agent Framework records full prompt text, LLM responses, function call arguments, and function results as span attributes. This is extremely useful for debugging during development, but may expose personally identifiable information (PII), API keys embedded in prompts, or other confidential data in your trace viewer. Only enable this in development or test environments. Set ENABLE_SENSITIVE_DATA=false (or remove the variable entirely) before deploying to any shared or production environment.
What these variables do:
| Variable | Default | Description |
|---|---|---|
| ENABLE_INSTRUMENTATION | false | Activates Agent Framework's OpenTelemetry instrumentation code paths. Without this, the framework will not emit invoke_agent, chat, or execute_tool spans |
| ENABLE_SENSITIVE_DATA | false | When true, includes prompt/response content and function arguments in span attributes. Development only; see the warning above |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | The OTLP gRPC endpoint. Points to the Aspire Dashboard's receiver on port 4317 |
| OTEL_SERVICE_NAME | agent_framework | The service name that appears in the trace viewer |
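A quick pre-flight check can catch a missing or mistyped variable before you start the backend. The following is a minimal sketch: check_otel_env is a hypothetical helper, and comparing against the lowercase string "true" is an assumption made for illustration, not a statement of how Agent Framework parses these flags.

```python
import os
from typing import List, Mapping


def check_otel_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable warnings about the OTEL-related variables."""
    problems = []
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT", ""):
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    # Assumption: the flag is considered enabled only when it equals "true".
    if env.get("ENABLE_INSTRUMENTATION", "false").strip().lower() != "true":
        problems.append("ENABLE_INSTRUMENTATION is not 'true'")
    if env.get("ENABLE_SENSITIVE_DATA", "false").strip().lower() == "true":
        problems.append("ENABLE_SENSITIVE_DATA is 'true' (development only)")
    return problems


if __name__ == "__main__":
    for problem in check_otel_env(os.environ):
        print(problem)
```

Because it reads os.environ, run it in a shell where the variables are already exported (or after loading .env yourself).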
Next, open backend/app/config.py and add a single new field to the Settings class: a master switch that lets you disable OTEL entirely without removing environment variables.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="ignore",
    )

    foundry_project_endpoint: str = ""
    foundry_model_deployment_name: str = "gpt-4o"
    foundry_image_model_deployment_name: str = "gpt-image-1"

    # CORS origin for the React dev server
    cors_origin: str = "http://localhost:5173"

    # OpenTelemetry master switch (set to False to disable without removing env vars)
    otel_enabled: bool = True


settings = Settings()
The only new field is otel_enabled. All other OTEL configuration is handled by the standard environment variables above, which Agent Framework's configure_otel_providers() reads automatically.
Create a new file at backend/app/telemetry.py with the following contents:
"""
telemetry.py - OpenTelemetry bootstrap for the story-generation backend.

Uses Agent Framework's built-in ``configure_otel_providers()`` to set up
the TracerProvider, exporters, and Agent Framework instrumentation from
environment variables. Also auto-instruments FastAPI so every incoming
HTTP request gets its own trace span automatically.

See: https://learn.microsoft.com/en-us/agent-framework/agents/observability
"""

import logging

from agent_framework.observability import configure_otel_providers
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

from .config import settings

logger = logging.getLogger(__name__)


def configure_telemetry(app: FastAPI) -> None:
    """Set up OTEL providers via Agent Framework and instrument FastAPI.

    This must be called **before** any agent-framework imports that create
    workflows or executors, so the framework can pick up the active
    TracerProvider and emit its own spans.

    Does nothing if ``settings.otel_enabled`` is False.
    """
    if not settings.otel_enabled:
        logger.info("OpenTelemetry is disabled (OTEL_ENABLED=false)")
        return

    # 1. Configure OTEL providers: reads OTEL_EXPORTER_OTLP_ENDPOINT,
    #    OTEL_SERVICE_NAME, ENABLE_INSTRUMENTATION, and ENABLE_SENSITIVE_DATA
    #    from environment variables automatically.
    configure_otel_providers()

    # 2. Auto-instrument FastAPI (creates a parent span for every HTTP request).
    #    This is NOT covered by configure_otel_providers(), so we add it here.
    FastAPIInstrumentor.instrument_app(app)

    logger.info("OpenTelemetry configured via Agent Framework")
What this does:
- configure_otel_providers(): Agent Framework's built-in bootstrap function. It:
  - Reads OTEL_EXPORTER_OTLP_ENDPOINT and creates an OTLP gRPC exporter targeting the Aspire Dashboard
  - Creates a TracerProvider (plus log and metric providers) with OTEL_SERVICE_NAME as the service resource
  - Reads ENABLE_INSTRUMENTATION: when true, activates the framework's instrumentation code paths so it emits invoke_agent, chat, and execute_tool spans automatically
  - Reads ENABLE_SENSITIVE_DATA: when true, includes prompt text, LLM responses, and function arguments/results as span attributes
  - Registers everything as the global OTEL providers
- FastAPIInstrumentor.instrument_app(app): Adds tracing middleware to the FastAPI app so that every incoming HTTP request gets a parent span. This is separate from Agent Framework's instrumentation and must be added explicitly.
Now modify backend/app/main.py to call configure_telemetry() at startup.
Important (Import Ordering): The story_workflow object in workflow.py is a module-level singleton: it's created the moment workflow.py is imported. That import chain starts when main.py imports StoryGenerator (which imports workflow.py). For Agent Framework to emit its own spans, the global TracerProvider must be active before the workflow is built. This means we need to configure telemetry before importing StoryGenerator.
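To see why the ordering matters, here is a toy model of the reasoning above. Everything in it is hypothetical (set_provider, ToyWorkflow, and the "otlp" placeholder are illustration-only names), and it does not reproduce Agent Framework's actual internals; it only shows how an object built at import time captures whatever global state exists at that moment.

```python
from typing import Optional

# Toy stand-in for the global TracerProvider (hypothetical, illustration only).
_global_provider: Optional[str] = None


def set_provider(provider: str) -> None:
    """Register the toy provider globally, like configuring telemetry."""
    global _global_provider
    _global_provider = provider


class ToyWorkflow:
    """Captures whichever provider is registered when the object is built."""

    def __init__(self) -> None:
        self.provider = _global_provider

    def emits_spans(self) -> bool:
        return self.provider is not None


# Wrong order: the "singleton" is built before telemetry is configured.
early_workflow = ToyWorkflow()

set_provider("otlp")

# Right order: telemetry is configured first, then the workflow is built.
late_workflow = ToyWorkflow()

print(early_workflow.emits_spans())  # False
print(late_workflow.emits_spans())   # True
```

In this model, the early object never notices the provider that was registered afterwards, which is the situation the import ordering in main.py is meant to avoid.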
Replace the contents of backend/app/main.py with:
"""
main.py - FastAPI application entry point.

Endpoints:
    GET  /api/health          - health check
    POST /api/generate-story  - runs the story workflow; streams SSE progress events
"""

import logging

from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from sse_starlette.sse import EventSourceResponse

load_dotenv()  # Agent Framework reads env vars directly, so ensure .env is loaded early

from .config import settings  # noqa: E402
from .models import StoryRequest  # noqa: E402

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)-8s %(name)s - %(message)s",
)
logger = logging.getLogger(__name__)

# ─── App ──────────────────────────────────────────────────────────────────────
app = FastAPI(
    title="Children's Story Multi-Agent API",
    description=(
        "Multi-agent orchestration for generating illustrated children's stories "
        "using Microsoft Agent Framework."
    ),
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=[settings.cors_origin, "http://localhost:5173", "http://localhost:5174"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# ─── Telemetry (must be configured BEFORE importing StoryGenerator) ───────────
from .telemetry import configure_telemetry  # noqa: E402

configure_telemetry(app)

# ─── Service instances ────────────────────────────────────────────────────────
from .story_generator import StoryGenerator  # noqa: E402

_story_generator = StoryGenerator()


# ─── Health check ─────────────────────────────────────────────────────────────
@app.get("/api/health")
async def health() -> dict:
    return {"status": "ok", "service": "children-story-multi-agent"}


# ─── Story generation (SSE) ───────────────────────────────────────────────────
@app.post("/api/generate-story")
async def generate_story(request: StoryRequest) -> EventSourceResponse:
    """Accepts story parameters and streams back SSE events as the multi-agent
    workflow progresses. The final event (type: 'complete') contains the
    full illustrated StoryResponse.
    """
    return _story_generator.event_source_response(request)
What changed (compared to the original main.py):
- load_dotenv() was added near the top of the file. Agent Framework's configure_otel_providers() reads environment variables directly (via os.environ), not through Pydantic settings, so we must ensure the .env file is loaded into the process environment before calling it.
- The from .story_generator import StoryGenerator import was moved down: it now appears after configure_telemetry(app) is called.
- Two new lines were added between the CORS middleware and the service instances:
  from .telemetry import configure_telemetry
  configure_telemetry(app)
- The # noqa: E402 comments suppress the linter warning about imports not being at the top of the file. This is intentional: the import ordering is required for correct OTEL initialization.
Everything else (the endpoints, CORS config, logging) is unchanged.
Start the backend:
cd backend
source .venv/bin/activate
uvicorn app.main:app --reload --port 8000
You should see a log line confirming telemetry is active:
OpenTelemetry configured via Agent Framework
In a separate terminal, start the frontend:
cd frontend
npm run dev
Open the app at http://localhost:5173, fill in the story form, and click Generate Story. Wait for the story to complete.
Open the Aspire Dashboard at http://localhost:18888 and navigate to the Traces tab.
You should see a trace for the POST /api/generate-story request. Click on it to open the trace waterfall view, which shows a timeline of all spans in the request.
When examining a trace in the Aspire Dashboard, look for these patterns:
The top-level span is the FastAPI HTTP request (POST /api/generate-story). Beneath it, you should see spans emitted by Agent Framework for the workflow and each agent. The framework uses OpenTelemetry GenAI Semantic Conventions for span naming:
| Span Name Pattern | What It Represents |
|---|---|
| POST /api/generate-story | The full HTTP request lifecycle (auto-instrumented by FastAPI) |
| invoke_agent <agent_name> | Each agent invocation: the top-level span for an agent's work within an executor |
| chat <model_name> | An LLM chat completion call. When ENABLE_SENSITIVE_DATA=true, the prompt and response text appear as span attributes |
| execute_tool <function_name> | A function tool execution (if your agents use tools). Includes arguments and results when sensitive data is enabled |
The waterfall view shows each executor's duration as a horizontal bar. You'll typically see:
- Orchestrator: Fast (single LLM call to create an outline)
- StoryArchitect: Moderate (one LLM call to write the full narrative)
- ArtDirector: Longest (LLM call for image prompts + parallel image generation calls)
- StoryReviewer: Moderate (one LLM call to review the draft)
- Decision: Near-instant (routing logic only)
If the StoryReviewer rejects a draft, you'll see the workflow loop back to Orchestrator. In the trace waterfall, this appears as repeated executor spans: a second pass through Orchestrator → StoryArchitect → ArtDirector → StoryReviewer → Decision. The revision count (max 2) is visible as repeated cycles.
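If you copy span timings out of the waterfall, the per-executor latency and the number of revision loops can be tallied with a few lines of code. This is a sketch only: the span tuples below are invented for illustration (they are not measurements from the app), and summarize is a hypothetical helper, not something the dashboard or Agent Framework provides.

```python
from collections import defaultdict

# Hypothetical span records: (executor name, start in ms, end in ms).
# The timings are made up purely to illustrate the calculation.
SPANS = [
    ("orchestrator", 0, 1000),
    ("story_architect", 1000, 4000),
    ("art_director", 4000, 12000),
    ("story_reviewer", 12000, 14000),
    ("decision", 14000, 14010),
    # Second pass after a rejected draft
    ("orchestrator", 14010, 15000),
    ("story_architect", 15000, 18000),
    ("art_director", 18000, 26000),
    ("story_reviewer", 26000, 28000),
    ("decision", 28000, 28010),
]


def summarize(spans):
    """Return (total ms per executor, slowest executor, revision loop count)."""
    totals = defaultdict(int)
    passes = 0
    for name, start_ms, end_ms in spans:
        totals[name] += end_ms - start_ms
        if name == "orchestrator":
            passes += 1  # each Orchestrator span marks one pass through the workflow
    slowest = max(totals, key=totals.get) if totals else None
    revisions = max(passes - 1, 0)
    return dict(totals), slowest, revisions


totals, slowest, revisions = summarize(SPANS)
print(slowest, totals[slowest], revisions)  # art_director 16000 1
```

Counting Orchestrator spans works here because each revision cycle starts with a new pass through Orchestrator.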
Inside the ArtDirector executor span, look for multiple image generation spans running concurrently (up to 5 in parallel, controlled by the semaphore in the existing code). These will appear as overlapping horizontal bars in the waterfall.
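The overlapping bars come from bounded concurrency. The sketch below shows the general pattern with asyncio.Semaphore; it is not the project's actual ArtDirector code, and generate_pages and fake_generate_image are hypothetical names, with asyncio.sleep standing in for the image API call.

```python
import asyncio

MAX_PARALLEL = 5  # mirrors the "up to 5 in parallel" limit described above


async def generate_pages(page_count: int) -> int:
    """Run fake image jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(MAX_PARALLEL)
    running = 0
    peak = 0

    async def fake_generate_image(page: int) -> None:
        nonlocal running, peak
        async with semaphore:  # at most MAX_PARALLEL jobs pass this point
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for the image API call
            running -= 1

    await asyncio.gather(
        *(fake_generate_image(page) for page in range(1, page_count + 1))
    )
    return peak


peak = asyncio.run(generate_pages(8))
print(peak)  # 5
```

With eight pages, five jobs start immediately and the remaining three wait for a free slot, which is why a trace would show at most five image spans overlapping at any moment.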
If you set ENABLE_SENSITIVE_DATA=true, expand any chat span's attributes to see:
- gen_ai.request.instructions: The system prompt sent to the LLM
- gen_ai.usage.input_tokens / gen_ai.usage.output_tokens: Token counts for cost tracking
- gen_ai.response.id: The LLM response identifier
This is invaluable for debugging prompt issues, but remember to disable it before sharing traces or moving to a shared environment.
The Aspire Dashboard is great for local development. For production or shared environments, you can export the same OTEL traces to Azure Application Insights instead, with no changes to the agent or workflow code.
| Feature | Description |
|---|---|
| Application Map | Visual diagram showing the call flow between agents, automatically generated from trace data |
| Live Metrics | Real-time view of incoming requests, failures, and performance |
| Transaction Search | Search and filter across all traces by executor name, duration, status, etc. |
| KQL Querying | Write custom queries against trace data (e.g., "show me all stories where ArtDirector took > 30 seconds") |
| Smart Detection | Automatic alerts for anomalies in failure rates or response times |
| Long-Term Retention | Traces are retained for 90 days by default (configurable), whereas Aspire only holds data while the container is running |
Create an Application Insights resource in the Azure Portal (or via CLI). Copy the Connection String from the resource's Overview page.
In backend/requirements.txt, replace the OTLP gRPC exporter with the Azure Monitor package:
# OpenTelemetry (api + sdk are already included via agent-framework-core)
opentelemetry-instrumentation-fastapi>=0.49b0
azure-monitor-opentelemetry>=1.6.4
Note: The azure-monitor-opentelemetry meta-package bundles the Azure Monitor exporter with auto-instrumentation for HTTP clients and more.
Install the updated dependencies:
pip install -r requirements.txt
Add the Application Insights connection string to your backend/.env:
APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=xxx;IngestionEndpoint=https://xxx.in.applicationinsights.azure.com/;...
And add a corresponding field in backend/app/config.py inside the Settings class:
    # Azure Application Insights (alternative to local Aspire Dashboard)
    applicationinsights_connection_string: str = ""
Modify backend/app/telemetry.py to use configure_azure_monitor along with Agent Framework's enable_instrumentation(). Replace the file contents with:
"""
telemetry.py - OpenTelemetry bootstrap (Application Insights variant).

Uses azure-monitor-opentelemetry to export traces to Application Insights,
then activates Agent Framework instrumentation.

See: https://learn.microsoft.com/en-us/agent-framework/agents/observability
"""

import logging

from azure.monitor.opentelemetry import configure_azure_monitor
from agent_framework.observability import create_resource, enable_instrumentation
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

from .config import settings

logger = logging.getLogger(__name__)


def configure_telemetry(app: FastAPI) -> None:
    if not settings.otel_enabled:
        logger.info("OpenTelemetry is disabled (OTEL_ENABLED=false)")
        return

    if not settings.applicationinsights_connection_string:
        logger.warning(
            "APPLICATIONINSIGHTS_CONNECTION_STRING is not set; telemetry disabled"
        )
        return

    # 1. Configure Azure Monitor with the Agent Framework resource
    configure_azure_monitor(
        connection_string=settings.applicationinsights_connection_string,
        resource=create_resource(),  # Uses OTEL_SERVICE_NAME from env
        enable_live_metrics=True,
    )

    # 2. Activate Agent Framework instrumentation (reads ENABLE_SENSITIVE_DATA
    #    from env vars, or pass it explicitly here)
    enable_instrumentation()

    # 3. Auto-instrument FastAPI
    FastAPIInstrumentor.instrument_app(app)

    logger.info("OpenTelemetry configured, exporting to Application Insights")
The key differences from the Aspire version are:
- configure_azure_monitor() replaces configure_otel_providers(): it sets up the Azure Monitor exporter with the connection string
- create_resource(): Agent Framework helper that builds an OTEL Resource using OTEL_SERVICE_NAME and the framework version
- enable_instrumentation(): Explicitly activates Agent Framework's instrumentation code paths (alternatively, this is handled by the ENABLE_INSTRUMENTATION env var, but calling it explicitly ensures it's active regardless of env configuration)
main.py does not need any changes β the import ordering, load_dotenv(), and configure_telemetry(app) call stay the same.
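Since the Application Insights variant silently disables telemetry when the connection string is empty, it can help to sanity-check the value you pasted into .env. The sketch below is a hypothetical helper (parse_connection_string is not part of any Azure SDK) that assumes only the semicolon-separated Key=Value layout shown earlier; the two keys it checks are the ones visible in that sample.

```python
from typing import Dict


def parse_connection_string(value: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs into a dict, skipping malformed parts."""
    pairs = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = val.strip()
    return pairs


# Placeholder value copied from the .env example in this guide.
sample = (
    "InstrumentationKey=xxx;"
    "IngestionEndpoint=https://xxx.in.applicationinsights.azure.com/"
)
parsed = parse_connection_string(sample)
missing = [
    key for key in ("InstrumentationKey", "IngestionEndpoint") if key not in parsed
]
print(parsed["IngestionEndpoint"])
print(missing)  # []
```

A non-empty missing list suggests the string was truncated or mistyped when copied from the Azure Portal.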
If no traces appear in the Aspire Dashboard, first ensure Docker is running and the container is up:
docker ps --filter name=aspire-dashboard
If the container isn't listed, start it again with the docker run command from Step 1.
- Check the backend logs: You should see OpenTelemetry configured via Agent Framework. If you see OpenTelemetry is disabled, check that OTEL_ENABLED is not set to false in your .env.
- Verify load_dotenv() is called: Agent Framework's configure_otel_providers() reads environment variables directly from os.environ, not from Pydantic settings. If load_dotenv() is missing from main.py, the .env values for OTEL_EXPORTER_OTLP_ENDPOINT, ENABLE_INSTRUMENTATION, etc. won't be available.
- Verify the OTLP endpoint: The exporter must be able to reach localhost:4317, which Docker maps to the Aspire container's OTLP receiver on port 18889. If you changed the port mapping, update OTEL_EXPORTER_OTLP_ENDPOINT accordingly.
- Generate at least one story: Traces only appear after a request has been made. The health check endpoint (GET /api/health) will also generate a span if you want a quick test.
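The endpoint check above can be scripted as a plain TCP connection test. This is a minimal sketch: otlp_endpoint_reachable is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that it is actually an OTLP receiver.

```python
import socket
from urllib.parse import urlparse


def otlp_endpoint_reachable(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or "localhost"
    port = parsed.port or 4317  # the host port used in this guide
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Same value as OTEL_EXPORTER_OTLP_ENDPOINT in backend/.env
    print(otlp_endpoint_reachable("http://localhost:4317"))
```

If this prints False while the container is running, the port mapping from Step 1 is the first thing to re-check.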
If the FastAPI request span appears but the Agent Framework spans are missing, this usually means one of two things:
- ENABLE_INSTRUMENTATION is not set to true: This is the most common cause. Without this environment variable, Agent Framework will not emit invoke_agent, chat, or execute_tool spans even if a TracerProvider is active. Check your backend/.env file.
- Import ordering: The TracerProvider must be registered as the global provider before the story_workflow singleton is created. This happens at module import time in workflow.py. Double-check that your main.py follows the import ordering from Step 5:
  - load_dotenv() is called first (so env vars are available)
  - configure_telemetry(app) is called second
  - from .story_generator import StoryGenerator comes after
If the StoryGenerator import is above the configure_telemetry() call, the workflow will be built before the provider is active.
Retrieve the token from the container logs:
docker logs aspire-dashboard 2>&1 | grep "login"
Copy the full URL (including the ?t= parameter) into your browser.
When you're done, stop and remove the container:
docker stop aspire-dashboard
The --rm flag in the original docker run command ensures the container is automatically removed when stopped.
- Experiment with custom spans: add manual tracing to specific operations (e.g., JSON parsing, image prompt construction) using tracer.start_as_current_span().
- Explore the Metrics and Structured Logs tabs in the Aspire Dashboard: OTEL supports all three signals, and the SDK can be extended to export metrics and logs alongside traces.
- Try the Application Insights alternative to see how the same traces look in a production-grade monitoring tool with Application Map, KQL queries, and alerting.
- If you haven't already, try the Activity Page Agents guide or Text-to-Speech guide to extend the application with new capabilities.