Guide: OTEL Observability — AI Toolkit for VS Code

This step-by-step guide walks you through adding OpenTelemetry (OTEL) distributed tracing to the Children's Story Studio backend and viewing traces directly inside VS Code using the AI Toolkit extension. No Docker, no external dashboards — everything stays in your editor.

Prefer a browser-based dashboard? The Aspire Dashboard variant of this guide uses the .NET Aspire Dashboard (Docker) as the trace viewer instead.

Note: Unlike the Activity Page Agents and TTS guides, this guide does not use GitHub Copilot to generate the implementation. Every code change is provided directly — you'll copy the code, understand what it does, and wire it in yourself.


What You'll Build

After completing this guide, every story generation request will emit distributed traces that flow through the entire multi-agent workflow:

  POST /api/generate-story          ← FastAPI auto-instrumented span
    │
    ├── Workflow: run                ← Agent Framework workflow span
    │   ├── Executor: orchestrator   ← Agent Framework executor spans
    │   │   └── LLM: chat/completions
    │   ├── Executor: story_architect
    │   │   └── LLM: chat/completions
    │   ├── Executor: art_director
    │   │   ├── LLM: chat/completions
    │   │   ├── Image: generate (page 1)   ← Parallel image generation
    │   │   ├── Image: generate (page 2)
    │   │   └── ...
    │   ├── Executor: story_reviewer
    │   │   └── LLM: chat/completions
    │   └── Executor: decision
    │
    └── Response streamed via SSE

You'll view these traces in the AI Toolkit's built-in trace viewer — right inside VS Code.


Why AI Toolkit?

| Benefit | Description |
| --- | --- |
| No Docker required | Unlike the Aspire Dashboard, AI Toolkit runs entirely within VS Code, with no containers to manage |
| Stay in your editor | View traces, inspect LLM calls, and debug agents without context-switching to a browser |
| Agent-aware trace view | AI Toolkit understands agent framework traces and presents them with agent-specific context |
| Prompt inspection | Click into any LLM call span to see the full prompt and response, which is ideal for prompt engineering |
| Zero additional setup for tracing | AI Toolkit includes a built-in OTLP receiver; just point your app at it |

NOTE: If you'd rather not implement OTEL observability by hand, you can experiment with prompting GitHub Copilot to follow this guide and perform the implementation for you.

Before You Start

1. Complete Base Prerequisites

Ensure you've completed all steps in Prerequisites & Setup and that the base application is working (Running the Demo).

2. Verify AI Toolkit Extension

Confirm that the AI Toolkit for VS Code extension is installed and enabled:

  1. Open the Extensions panel (Cmd+Shift+X / Ctrl+Shift+X)
  2. Search for "AI Toolkit"
  3. Verify that AI Toolkit for Visual Studio Code (by Microsoft) shows as installed and enabled

If the extension is not installed, click Install.

3. Create a Working Branch

git checkout main
git pull origin main
git checkout -b my-otel-observability

Step 1: Configure AI Toolkit for Tracing

AI Toolkit includes a built-in OTLP receiver that can accept OpenTelemetry traces from your application.

  1. Open the AI Toolkit panel in the VS Code sidebar (look for the AI Toolkit icon)
  2. Navigate to the Tracing section
  3. Start the OTLP trace receiver — AI Toolkit will display the endpoint URL and port it's listening on (typically http://localhost:4317)

Note: Record the endpoint URL displayed by AI Toolkit. You'll use it in Step 3 when setting the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. If AI Toolkit listens on a port other than 4317, use that port instead.
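
If you want to confirm from a terminal that the receiver is listening, a plain TCP probe is usually enough. The sketch below is a minimal example using only the standard library; the function name `otlp_receiver_reachable` is made up for this guide, and the host and port are assumptions taken from the typical endpoint above. A successful connection only shows that something is listening on the port, not that it speaks OTLP.

```python
import socket


def otlp_receiver_reachable(
    host: str = "localhost", port: int = 4317, timeout: float = 1.0
) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if otlp_receiver_reachable() else "not reachable"
    print(f"OTLP receiver on localhost:4317 is {status}")
```

If the probe reports "not reachable", start the receiver in the AI Toolkit Tracing section and re-run it, passing the port AI Toolkit displays if it differs from 4317.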


Step 2: Add OTEL Python Packages

Good news: agent-framework-core already bundles opentelemetry-api, opentelemetry-sdk, and opentelemetry-semantic-conventions-ai as transitive dependencies. You only need to add the OTLP exporter (to send data to AI Toolkit) and the FastAPI instrumentor (to auto-trace HTTP requests).

Open backend/requirements.txt and append the following lines at the end of the file:

# OpenTelemetry (api + sdk are already included via agent-framework-core)
opentelemetry-exporter-otlp-proto-grpc>=1.28.0
opentelemetry-instrumentation-fastapi>=0.49b0

Then install the new dependencies:

cd backend
source .venv/bin/activate
pip install -r requirements.txt

Or run the Backend: Install Python deps VS Code task from the Command Palette.

What these packages do:

| Package | Purpose |
| --- | --- |
| opentelemetry-exporter-otlp-proto-grpc | Exports spans to any OTLP-compatible receiver (AI Toolkit, Aspire, Jaeger, etc.) over gRPC |
| opentelemetry-instrumentation-fastapi | Auto-instruments FastAPI by creating a span for every incoming HTTP request |
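
To double-check that the install succeeded in the active virtual environment, you can ask Python whether the relevant modules are importable. This is a small sketch, not part of the project: `missing_modules` and `REQUIRED` are names invented here, and the module paths are the import locations these two packages are expected to provide.

```python
from importlib.util import find_spec


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the subset of module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            found = find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (e.g. "opentelemetry") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import paths assumed for the two packages added in this step.
REQUIRED = [
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
    "opentelemetry.instrumentation.fastapi",
]

if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    print("All OTEL modules found" if not gaps else f"Missing: {gaps}")
```

Run it with the same interpreter that runs the backend (the one inside `.venv`); a "Missing" result usually means the packages went into a different environment.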

Step 3: Add OTEL Settings

Agent Framework uses a combination of standard OpenTelemetry environment variables and Agent Framework–specific environment variables to control observability. Add the following to your backend/.env file:

# ── Agent Framework Observability ──────────────────────────────────────────
# Activates Agent Framework's built-in instrumentation (spans for agent
# invocations, LLM chat calls, and tool executions).
ENABLE_INSTRUMENTATION=true

# Includes prompts, completions, function arguments and results in span
# attributes.  See the WARNING below before enabling.
ENABLE_SENSITIVE_DATA=true

# ── Standard OpenTelemetry ─────────────────────────────────────────────────
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=children-story-studio

Important: Verify that OTEL_EXPORTER_OTLP_ENDPOINT matches the endpoint shown by AI Toolkit's trace receiver (from Step 1). The receiver typically listens on http://localhost:4317.

⚠️ WARNING — Sensitive Data: When ENABLE_SENSITIVE_DATA=true, Agent Framework records full prompt text, LLM responses, function call arguments, and function results as span attributes. This is extremely useful for debugging during development, but may expose personally identifiable information (PII), API keys embedded in prompts, or other confidential data in your trace viewer. Only enable this in development or test environments. Set ENABLE_SENSITIVE_DATA=false (or remove the variable entirely) before deploying to any shared or production environment.

What these variables do:

| Variable | Default | Description |
| --- | --- | --- |
| ENABLE_INSTRUMENTATION | false | Activates Agent Framework's OpenTelemetry instrumentation code paths. Without this, the framework will not emit invoke_agent, chat, or execute_tool spans |
| ENABLE_SENSITIVE_DATA | false | When true, includes prompt/response content and function arguments in span attributes. Development only; see the warning above |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | The OTLP gRPC endpoint. Points to AI Toolkit's built-in receiver on port 4317 |
| OTEL_SERVICE_NAME | agent_framework | The service name that appears in the trace viewer |
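
A quick sanity check of these variables can catch typos before you start the backend. The helper below is a sketch under a few assumptions: `check_otel_env` is a hypothetical name, it expects the endpoint to carry an explicit port as in this guide, and it compares the boolean flags against the literal value `true` used above (Agent Framework itself may accept other spellings).

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse


def check_otel_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable warnings about the OTEL-related variables in env."""
    warnings = []

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if not endpoint:
        warnings.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    else:
        try:
            parsed = urlparse(endpoint)
            well_formed = parsed.scheme in ("http", "https") and parsed.port is not None
        except ValueError:
            # urlparse or .port rejects things like a non-numeric port.
            well_formed = False
        if not well_formed:
            warnings.append(f"OTEL_EXPORTER_OTLP_ENDPOINT looks malformed: {endpoint!r}")

    if env.get("ENABLE_INSTRUMENTATION", "false").strip().lower() != "true":
        warnings.append("ENABLE_INSTRUMENTATION is not 'true'; agent spans will be missing")

    if env.get("ENABLE_SENSITIVE_DATA", "false").strip().lower() == "true":
        warnings.append("ENABLE_SENSITIVE_DATA is 'true'; use in development only")

    return warnings


if __name__ == "__main__":
    for warning in check_otel_env(os.environ):
        print(warning)
```

Note that `os.environ` only contains the values from `backend/.env` once they have been loaded into the process, so when running this standalone you may need to export the variables in your shell first.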

Next, open backend/app/config.py and add a single new field to the Settings class — a master switch that lets you disable OTEL entirely without removing environment variables:

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="ignore",
    )

    foundry_project_endpoint: str = ""
    foundry_model_deployment_name: str = "gpt-4o"
    foundry_image_model_deployment_name: str = "gpt-image-1"

    # CORS origin for the React dev server
    cors_origin: str = "http://localhost:5173"

    # OpenTelemetry — master switch (set to False to disable without removing env vars)
    otel_enabled: bool = True


settings = Settings()

The only new field is otel_enabled. All other OTEL configuration is handled by the standard environment variables above, which Agent Framework's configure_otel_providers() reads automatically.


Step 4: Create the Telemetry Module

Create a new file at backend/app/telemetry.py with the following contents:

"""
telemetry.py — OpenTelemetry bootstrap for the story-generation backend.

Uses Agent Framework's built-in ``configure_otel_providers()`` to set up
the TracerProvider, exporters, and Agent Framework instrumentation from
environment variables.  Also auto-instruments FastAPI so every incoming
HTTP request gets its own trace span automatically.

See: https://learn.microsoft.com/en-us/agent-framework/agents/observability
"""

import logging

from agent_framework.observability import configure_otel_providers
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

from .config import settings

logger = logging.getLogger(__name__)


def configure_telemetry(app: FastAPI) -> None:
    """Set up OTEL providers via Agent Framework and instrument FastAPI.

    This must be called **before** any agent-framework imports that create
    workflows or executors, so the framework can pick up the active
    TracerProvider and emit its own spans.

    Does nothing if ``settings.otel_enabled`` is False.
    """
    if not settings.otel_enabled:
        logger.info("OpenTelemetry is disabled (OTEL_ENABLED=false)")
        return

    # 1. Configure OTEL providers — reads OTEL_EXPORTER_OTLP_ENDPOINT,
    #    OTEL_SERVICE_NAME, ENABLE_INSTRUMENTATION, and ENABLE_SENSITIVE_DATA
    #    from environment variables automatically.
    configure_otel_providers()

    # 2. Auto-instrument FastAPI (creates a parent span for every HTTP request).
    #    This is NOT covered by configure_otel_providers(), so we add it here.
    FastAPIInstrumentor.instrument_app(app)

    logger.info("OpenTelemetry configured via Agent Framework")

What this does:

  1. configure_otel_providers() — Agent Framework's built-in bootstrap function. It:
    • Reads OTEL_EXPORTER_OTLP_ENDPOINT and creates an OTLP gRPC exporter targeting AI Toolkit's receiver
    • Creates a TracerProvider (plus log and metric providers) with OTEL_SERVICE_NAME as the service resource
    • Reads ENABLE_INSTRUMENTATION — when true, activates the framework's instrumentation code paths so it emits invoke_agent, chat, and execute_tool spans automatically
    • Reads ENABLE_SENSITIVE_DATA — when true, includes prompt text, LLM responses, and function arguments/results as span attributes
    • Registers everything as the global OTEL providers
  2. FastAPIInstrumentor.instrument_app(app) — Wraps every FastAPI route handler to create a parent span for each HTTP request. This is separate from Agent Framework's instrumentation and must be added explicitly.

Step 5: Wire Telemetry into the App

Now modify backend/app/main.py to call configure_telemetry() at startup.

Important — Import Ordering: The story_workflow object in workflow.py is a module-level singleton — it's created the moment workflow.py is imported. That import chain starts when main.py imports StoryGenerator (which imports workflow.py). For Agent Framework to emit its own spans, the global TracerProvider must be active before the workflow is built. This means we need to configure telemetry before importing StoryGenerator.
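
The ordering problem is easier to see in a toy model. The sketch below uses no OpenTelemetry or Agent Framework code at all; `set_provider`, `Workflow`, and the provider strings are invented stand-ins. It only illustrates the general pattern of an object that captures global state when it is constructed, which is the behavior this guide assumes for the workflow singleton.

```python
# Toy illustration only: none of these names come from the project or from
# Agent Framework. It shows why configuration must precede construction.

_active_provider = "noop"  # stands in for the global TracerProvider


def set_provider(name: str) -> None:
    """Replace the module-level provider, like configuring telemetry."""
    global _active_provider
    _active_provider = name


class Workflow:
    def __init__(self) -> None:
        # The provider is looked up once, at the moment the object is built.
        self.provider = _active_provider


def build_then_configure() -> str:
    set_provider("noop")
    workflow = Workflow()   # like importing workflow.py first
    set_provider("otlp")    # telemetry configured too late
    return workflow.provider


def configure_then_build() -> str:
    set_provider("noop")
    set_provider("otlp")    # telemetry configured first
    workflow = Workflow()   # the singleton now sees the real provider
    return workflow.provider
```

In this model `build_then_configure()` leaves the workflow holding the placeholder provider, while `configure_then_build()` gives it the configured one. The ordering in main.py is designed to stay on the second path.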

Replace the contents of backend/app/main.py with:

"""
main.py — FastAPI application entry point.

Endpoints:
  GET  /api/health              — health check
  POST /api/generate-story      — runs the story workflow; streams SSE progress events
"""

import logging

from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from sse_starlette.sse import EventSourceResponse

load_dotenv()  # Agent Framework reads env vars directly — ensure .env is loaded early

from .config import settings  # noqa: E402
from .models import StoryRequest  # noqa: E402

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s  %(levelname)-8s  %(name)s — %(message)s",
)
logger = logging.getLogger(__name__)

# ─── App ──────────────────────────────────────────────────────────────────────

app = FastAPI(
    title="Children's Story Multi-Agent API",
    description=(
        "Multi-agent orchestration for generating illustrated children's stories "
        "using Microsoft Agent Framework."
    ),
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=[settings.cors_origin, "http://localhost:5173", "http://localhost:5174"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# ─── Telemetry (must be configured BEFORE importing StoryGenerator) ───────────

from .telemetry import configure_telemetry  # noqa: E402

configure_telemetry(app)

# ─── Service instances ────────────────────────────────────────────────────────

from .story_generator import StoryGenerator  # noqa: E402

_story_generator = StoryGenerator()

# ─── Health check ─────────────────────────────────────────────────────────────


@app.get("/api/health")
async def health() -> dict:
    return {"status": "ok", "service": "children-story-multi-agent"}


# ─── Story generation (SSE) ───────────────────────────────────────────────────


@app.post("/api/generate-story")
async def generate_story(request: StoryRequest) -> EventSourceResponse:
    """Accepts story parameters and streams back SSE events as the multi-agent
    workflow progresses.  The final event (type: 'complete') contains the
    full illustrated StoryResponse.
    """
    return _story_generator.event_source_response(request)

What changed (compared to the original main.py):

  1. load_dotenv() was added near the top of the file. Agent Framework's configure_otel_providers() reads environment variables directly (via os.environ), not through Pydantic settings — so we must ensure the .env file is loaded into the process environment before calling it.
  2. The from .story_generator import StoryGenerator import was moved down — it now appears after configure_telemetry(app) is called.
  3. Two new lines were added between the CORS middleware and the service instances:
    from .telemetry import configure_telemetry
    configure_telemetry(app)
  4. The # noqa: E402 comments suppress the linter warning about imports not being at the top of the file. This is intentional — the import ordering is required for correct OTEL initialization.

Everything else — the endpoints, CORS config, logging — is unchanged.


Step 6: Generate a Story and View Traces

1. Start the AI Toolkit trace receiver

Open the AI Toolkit panel in VS Code and ensure the trace receiver is running (from Step 1).

2. Start the backend

cd backend
source .venv/bin/activate
uvicorn app.main:app --reload --port 8000

You should see a log line confirming telemetry is active:

OpenTelemetry configured via Agent Framework

3. Start the frontend

cd frontend
npm run dev

4. Generate a story

Open the app at http://localhost:5173, fill in the story form, and click Generate Story. Wait for the story to complete.

5. View traces in AI Toolkit

Switch back to VS Code and open the AI Toolkit panel. Navigate to the Tracing section — you should see the trace for the POST /api/generate-story request. Click on it to expand the trace waterfall and explore each agent's spans.


What to Look For

When examining a trace in AI Toolkit, look for these patterns:

Span Hierarchy

The top-level span is the FastAPI HTTP request (POST /api/generate-story). Beneath it, you should see spans emitted by Agent Framework for the workflow and each agent. The framework uses OpenTelemetry GenAI Semantic Conventions for span naming:

| Span Name Pattern | What It Represents |
| --- | --- |
| POST /api/generate-story | The full HTTP request lifecycle (auto-instrumented by FastAPI) |
| invoke_agent <agent_name> | Each agent invocation: the top-level span for an agent's work within an executor |
| chat <model_name> | An LLM chat completion call. When ENABLE_SENSITIVE_DATA=true, the prompt and response text appear as span attributes |
| execute_tool <function_name> | A function tool execution (if your agents use tools). Includes arguments and results when sensitive data is enabled |

Per-Agent Latency

The trace view shows each executor's duration. You'll typically see:

  • Orchestrator — Fast (single LLM call to create an outline)
  • StoryArchitect — Moderate (one LLM call to write the full narrative)
  • ArtDirector — Longest (LLM call for image prompts + parallel image generation calls)
  • StoryReviewer — Moderate (one LLM call to review the draft)
  • Decision — Near-instant (routing logic only)
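
If you want to compare executors across several runs, it can help to tabulate durations outside the viewer. The following sketch assumes you have transcribed span timings by hand into dictionaries with `name`, `start`, and `end` keys (seconds); that record shape and the sample numbers are illustrative assumptions, not an AI Toolkit export format.

```python
from collections import defaultdict


def duration_by_executor(spans: list[dict]) -> list[tuple[str, float]]:
    """Sum span durations (in seconds) per executor name, slowest first."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        totals[span["name"]] += span["end"] - span["start"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = [
        {"name": "orchestrator", "start": 0.0, "end": 2.0},
        {"name": "story_architect", "start": 2.0, "end": 8.0},
        {"name": "art_director", "start": 8.0, "end": 30.0},
        {"name": "story_reviewer", "start": 30.0, "end": 35.0},
        {"name": "decision", "start": 35.0, "end": 35.0},
    ]
    for name, seconds in duration_by_executor(sample):
        print(f"{name:16s} {seconds:6.1f}s")
```

Because durations are summed per name, an executor that runs twice during a revision loop shows its combined time.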

Prompt and Response Inspection

One of AI Toolkit's strengths is the ability to click into any chat span and see the full prompt and response directly in VS Code. With ENABLE_SENSITIVE_DATA=true, you can:

  • Inspect the system prompt sent to each agent
  • Review the LLM response for each agent invocation
  • See token counts (gen_ai.usage.input_tokens / gen_ai.usage.output_tokens) for cost tracking
  • Compare prompt/response pairs across revision loops to see how the story evolves

This makes AI Toolkit particularly useful for prompt engineering — you can iterate on prompts in code, regenerate a story, and immediately inspect the results without leaving your editor.
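
For rough cost tracking, the token attributes can be summed across the chat spans of a trace. This sketch assumes you have copied each span's attributes into a plain dictionary; `total_tokens` and the sample values are hypothetical, and only the two `gen_ai.usage.*` attribute names come from the text above.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def total_tokens(spans: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Sum input and output token counts over a collection of span attributes."""
    totals = {"input": 0, "output": 0}
    for attributes in spans:
        # Spans without usage attributes (e.g. non-LLM spans) contribute zero.
        totals["input"] += int(attributes.get("gen_ai.usage.input_tokens", 0))
        totals["output"] += int(attributes.get("gen_ai.usage.output_tokens", 0))
    return totals


if __name__ == "__main__":
    # Made-up attribute values for illustration only.
    chat_spans = [
        {"gen_ai.usage.input_tokens": 420, "gen_ai.usage.output_tokens": 180},
        {"gen_ai.usage.input_tokens": 950, "gen_ai.usage.output_tokens": 1200},
        {"http.method": "POST"},
    ]
    print(total_tokens(chat_spans))
```

Multiplying the two totals by your model's per-token prices gives a first-order estimate of what one story generation costs.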

Revision Loops

If the StoryReviewer rejects a draft, you'll see the workflow loop back to Orchestrator. This appears as repeated executor spans — a second pass through Orchestrator → StoryArchitect → ArtDirector → StoryReviewer → Decision.

Parallel Image Generation

Inside the ArtDirector executor span, look for multiple image generation spans running concurrently (up to 5 in parallel, controlled by the semaphore in the existing code).
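
The bounded-concurrency pattern behind those overlapping spans can be sketched with `asyncio.Semaphore`. This is not the project's implementation: `generate_all` and `generate_image` are illustrative names, the sleep stands in for the real image call, and the limit of 5 is taken from the description above.

```python
import asyncio


async def generate_all(page_count: int, limit: int = 5) -> tuple[list[str], int]:
    """Run one fake image task per page, at most `limit` at a time.

    Returns the results in page order plus the peak number of tasks that
    were inside the semaphore simultaneously.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def generate_image(page: int) -> str:
        nonlocal running, peak
        async with semaphore:  # waits here once `limit` tasks are active
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for the real image call
            running -= 1
            return f"image-for-page-{page}"

    images = await asyncio.gather(
        *(generate_image(page) for page in range(1, page_count + 1))
    )
    return list(images), peak


if __name__ == "__main__":
    images, peak = asyncio.run(generate_all(page_count=12))
    print(f"{len(images)} images generated, peak concurrency {peak}")
```

In a trace, this pattern shows up as image spans that start in batches: once the limit is reached, a new span begins only when an earlier one ends.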


Troubleshooting

No traces appear in AI Toolkit

  1. Check that the trace receiver is running — Open the AI Toolkit panel and verify the OTLP receiver is active and listening.

  2. Check the backend logs — You should see OpenTelemetry configured via Agent Framework. If you see OpenTelemetry is disabled, check that OTEL_ENABLED is not set to false in your .env.

  3. Verify load_dotenv() is called — Agent Framework's configure_otel_providers() reads environment variables directly from os.environ, not from Pydantic settings. If load_dotenv() is missing from main.py, the .env values for OTEL_EXPORTER_OTLP_ENDPOINT, ENABLE_INSTRUMENTATION, etc. won't be available.

  4. Verify the OTLP endpoint matches — The OTEL_EXPORTER_OTLP_ENDPOINT in your .env must match the port AI Toolkit's receiver is listening on. The default is http://localhost:4317.

  5. Generate at least one story — Traces only appear after a request has been made. The health check endpoint (GET /api/health) will also generate a span if you want a quick test.
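
For the quick test mentioned in item 5, any HTTP client works. Here is a minimal standard-library sketch; `backend_health` is a name invented for this guide, and the base URL assumes the backend is running on port 8000 as started in Step 6.

```python
import json
from urllib.request import urlopen


def backend_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """GET <base_url>/api/health and return the decoded JSON body.

    Raises urllib.error.URLError if the backend is not reachable.
    """
    with urlopen(f"{base_url}/api/health", timeout=timeout) as response:
        return json.load(response)
```

Calling `backend_health()` from a Python shell while the backend is running should return the health payload, and a matching GET /api/health span should then show up in the Tracing section if the exporter is wired correctly.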

FastAPI spans appear but no Agent Framework spans

  1. ENABLE_INSTRUMENTATION is not set to true — Without this environment variable, Agent Framework will not emit invoke_agent, chat, or execute_tool spans even if a TracerProvider is active. Check your backend/.env file.

  2. Import ordering — The TracerProvider must be registered as the global provider before the story_workflow singleton is created. Double-check that your main.py follows the import ordering from Step 5.


Next Steps

  • Try the Aspire Dashboard variant to see how the same traces look in a browser-based waterfall view — useful for demos and screen sharing.
  • Experiment with custom spans — add manual tracing to specific operations using tracer.start_as_current_span().
  • If you haven't already, try the Activity Page Agents guide or Text-to-Speech guide to extend the application with new capabilities.
