File: `.agents/skills/llmobs-integration/SKILL.md` (new, 203 additions)
---
name: llmobs-integration
description: |
This skill should be used when the user asks to "add LLMObs support", "create an LLMObs plugin",
"instrument an LLM library", "add LLM Observability", "add llmobs", "add llm observability",
"instrument chat completions", "instrument streaming", "instrument embeddings",
"instrument agent runs", "instrument orchestration", "instrument LLM",
"LLMObsPlugin", "LlmObsPlugin", "getLLMObsSpanRegisterOptions", "setLLMObsTags",
"tagLLMIO", "tagEmbeddingIO", "tagRetrievalIO", "tagTextIO", "tagMetrics", "tagMetadata",
"tagSpanTags", "tagPrompt", "LlmObsCategory", "LlmObsSpanKind",
"span kind llm", "span kind workflow", "span kind agent", "span kind embedding",
"span kind tool", "span kind retrieval",
"openai llmobs", "anthropic llmobs", "genai llmobs", "google llmobs",
"langchain llmobs", "langgraph llmobs", "ai-sdk llmobs",
"llm span", "llmobs span event", "model provider", "model name",
"CompositePlugin llmobs", "llmobs tracing", "VCR cassettes",
or needs to build, modify, or debug an LLMObs plugin for any LLM library in dd-trace-js.
---

# LLM Observability Integration Skill

## Purpose

This skill helps you create LLMObs plugins that instrument LLM library operations and emit proper span events for LLM observability in dd-trace-js. Supported operation types include:

- **Chat completions** — standard request/response LLM calls
- **Streaming chat completions** — streamed token-by-token responses
- **Embeddings** — vector embedding generation
- **Agent runs** — autonomous LLM agent execution loops
- **Orchestration** — multi-step workflow and graph execution (langgraph, etc.)
- **Tool calls** — tool/function invocations
- **Retrieval** — vector DB / RAG operations

## When to Use

- Creating a new LLMObs plugin for an LLM library
- Adding LLMObs support to an existing tracing integration
- Understanding LLMObsPlugin architecture and patterns
- Determining how to instrument a new LLM package

## Core Concepts

### 1. LLMObsPlugin Base Class

All LLMObs plugins extend the `LLMObsPlugin` base class, which provides the core instrumentation framework.

**Key responsibilities:**
- **Span registration**: Define span metadata (model provider, model name, span kind)
- **Tag extraction**: Extract and tag LLM-specific data (messages, metrics, metadata)
- **Context management**: Handle span lifecycle and parent context

**Required methods to implement:**
- `getLLMObsSpanRegisterOptions(ctx)` - Returns span registration options (modelProvider, modelName, kind, name)
- `setLLMObsTags(ctx)` - Extracts and tags LLM data (input/output messages, metrics, metadata)

**Plugin lifecycle:**
1. `start(ctx)` - Registers span with LLMObs, captures context
2. Operation executes (chat completion call)
3. `asyncEnd(ctx)` - Calls `setLLMObsTags()` to extract and tag data
4. `end(ctx)` - Restores parent context

See [references/plugin-architecture.md](references/plugin-architecture.md) for complete implementation details.
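A minimal sketch of `getLLMObsSpanRegisterOptions`, written as a standalone function so the returned shape is visible. The `ctx` field names, the provider value, and the span name below are assumptions for illustration, not the real dd-trace-js API:

```javascript
// Sketch only: in a real plugin this is a method on a class extending
// LLMObsPlugin. The ctx shape (arguments array holding the request options)
// is an assumption based on the lifecycle described above.
function getLLMObsSpanRegisterOptions (ctx) {
  const request = ctx.arguments?.[0] ?? {}
  return {
    modelProvider: 'openai',             // usually fixed per integration (assumed)
    modelName: request.model ?? 'unknown',
    kind: 'llm',                         // most common kind for chat completions
    name: 'openai.createChatCompletion'  // hypothetical span name
  }
}
```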

### 2. Package Category System

**CRITICAL:** Every integration must be classified into one category using the `LlmObsCategory` enum. This determines test strategy and implementation approach.

#### LlmObsCategory Enum Values

- **`LlmObsCategory.LLM_CLIENT`** - Direct API wrappers (openai, anthropic, genai)
- Signs: Makes HTTP calls to LLM provider endpoints, requires API keys
- Test strategy: VCR with real API calls via proxy
- Instrumentation: Hook chat/completion methods

- **`LlmObsCategory.MULTI_PROVIDER`** - Multi-provider frameworks (ai-sdk, langchain)
- Signs: Supports multiple LLM providers via configuration, wraps LLM_CLIENT libraries
- Test strategy: VCR with real API calls via proxy
- Instrumentation: Hook provider abstraction layer

- **`LlmObsCategory.ORCHESTRATION`** - Workflow managers (langgraph)
- Signs: Graph/workflow execution, state management, NO direct HTTP to LLM providers
- Test strategy: Pure function tests, NO VCR, NO real API calls
- Instrumentation: Hook workflow lifecycle (invoke, stream, run)
- **Special:** Tests should use an actual LLM as an orchestration node (not mock responses)

- **`LlmObsCategory.INFRASTRUCTURE`** - Protocols/servers (MCP)
- Signs: Protocol implementation, server/client architecture, transport layers
- Test strategy: Mock server tests
- Instrumentation: Hook protocol handlers

#### Decision Tree

Answer these questions by reading the code:

1. **Does the package make direct HTTP calls to LLM provider endpoints?**
- YES → Go to question 2
- NO → Go to question 3

2. **Does it support multiple LLM providers via configuration?**
- YES → **`LlmObsCategory.MULTI_PROVIDER`**
- NO → **`LlmObsCategory.LLM_CLIENT`**

3. **Does it implement workflow/graph orchestration with state management?**
- YES → **`LlmObsCategory.ORCHESTRATION`**
- NO → **`LlmObsCategory.INFRASTRUCTURE`**

See [references/category-detection.md](references/category-detection.md) for detailed heuristics and examples.

### 3. LLM Span Kinds

Use the `LlmObsSpanKind` enum:

- **`LlmObsSpanKind.LLM`** - Chat completions, text generation
- **`LlmObsSpanKind.WORKFLOW`** - Graph/chain execution
- **`LlmObsSpanKind.AGENT`** - Agent runs
- **`LlmObsSpanKind.TOOL`** - Tool/function calls
- **`LlmObsSpanKind.EMBEDDING`** - Embedding generation
- **`LlmObsSpanKind.RETRIEVAL`** - Vector DB/RAG retrieval

**Most common:** Use `'llm'` for chat completions/text generation in LLM_CLIENT and MULTI_PROVIDER categories.
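For reference, a plausible sketch of the enum's string values — only `'llm'` and `'workflow'` appear explicitly in this skill; the rest are assumed from the member names:

```javascript
// Assumed string values for LlmObsSpanKind; verify against the actual enum
// in the models before relying on any value other than 'llm' or 'workflow'.
const LlmObsSpanKind = Object.freeze({
  LLM: 'llm',
  WORKFLOW: 'workflow',
  AGENT: 'agent',
  TOOL: 'tool',
  EMBEDDING: 'embedding',
  RETRIEVAL: 'retrieval'
})
```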

### 4. Message Extraction

All plugins must convert provider-specific message formats to the standard format:

**Standard format:** `[{content: string, role: string}]`

**Common roles:** `'user'`, `'assistant'`, `'system'`, `'tool'`

**Provider-specific handling:**
- OpenAI: Direct format match, handle `function_call` and `tool_calls`
- Anthropic: Map `role` values, flatten nested content arrays
- Google GenAI: Extract from `parts` arrays, map role names
- Multi-provider: Detect provider and apply appropriate extraction

See [references/message-extraction.md](references/message-extraction.md) for provider-specific patterns.
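As an illustration, an Anthropic-style flattening could look like the following. The input shape (`content` as a string or an array of `{ type, text }` blocks) mirrors Anthropic's message format; the helper name is hypothetical:

```javascript
// Sketch: normalize provider messages to the standard [{ content, role }]
// format, flattening nested content-block arrays and keeping only text parts.
function extractMessages (messages) {
  return (messages ?? []).map(({ role, content }) => ({
    role,
    content: Array.isArray(content)
      ? content
          .filter(block => block.type === 'text')
          .map(block => block.text)
          .join('')
      : String(content ?? '')
  }))
}
```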

## Implementation Steps

1. **Detect package category** (REQUIRED FIRST STEP)
- Follow decision tree above
- Output: category, confidence, reasoning

2. **Create plugin file**
- Location: `packages/dd-trace/src/llmobs/plugins/{integration}/index.js`
- Extend: `LLMObsPlugin` base class
- Implement: Required methods per plugin architecture

3. **Implement `getLLMObsSpanRegisterOptions(ctx)`**
- Extract model provider and name from context
- Determine span kind (usually `'llm'`)
- Return registration options object

4. **Implement `setLLMObsTags(ctx)`**
- Extract input messages from `ctx.arguments`
- Extract output messages from `ctx.result`
- Extract token metrics (input_tokens, output_tokens, total_tokens)
- Extract metadata (temperature, max_tokens, etc.)
- Tag span using `this._tagger` methods

5. **Handle edge cases**
- Streaming responses (if applicable)
- Error cases (empty output messages)
- Non-standard message formats
- Missing metadata

See [references/plugin-architecture.md](references/plugin-architecture.md) for step-by-step implementation guide.
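Step 4 can be sketched as a standalone function. The tagger method names (`tagLLMIO`, `tagMetrics`, `tagMetadata`) come from this skill's description; their exact signatures and the OpenAI-style `ctx`/`result` shapes below are assumptions:

```javascript
// Sketch only: extract input/output messages, token metrics, and metadata,
// then hand them to the tagger. In a real plugin the tagger is this._tagger.
function setLLMObsTags (ctx, span, tagger) {
  const request = ctx.arguments?.[0] ?? {}
  const result = ctx.result ?? {}

  const inputMessages = request.messages ?? []
  // on error, output degrades gracefully to an empty message list
  const outputMessages = result.choices?.map(choice => choice.message) ?? []
  tagger.tagLLMIO(span, inputMessages, outputMessages)

  const usage = result.usage ?? {}
  tagger.tagMetrics(span, {
    input_tokens: usage.prompt_tokens,
    output_tokens: usage.completion_tokens,
    total_tokens: usage.total_tokens
  })

  tagger.tagMetadata(span, {
    temperature: request.temperature,
    max_tokens: request.max_tokens
  })
}
```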

## Common Patterns

Based on category:

- **LLM_CLIENT**: Messages in array, straightforward extraction from `result.choices[0]` or equivalent
- **MULTI_PROVIDER**: Handle multiple provider formats with provider detection logic
- **ORCHESTRATION**: May use `'workflow'` span kind instead of `'llm'`, focus on lifecycle events
- **INFRASTRUCTURE**: Protocol-specific instrumentation, may not have traditional messages

## Plugin Registration

All plugins must be exported in an array of plugin classes.

**Static properties required on each class:**
- `integration` - Integration name (e.g., 'openai')
- `id` - Unique plugin ID (e.g., 'llmobs_openai')
- `prefix` - Channel prefix (e.g., 'tracing:apm:openai:chat')
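A hypothetical class shape carrying these statics (the values match the examples above; the base-class wiring is omitted and assumed):

```javascript
// Hypothetical plugin class showing the required static properties.
// In a real plugin this extends LLMObsPlugin and implements the methods
// described earlier in this skill.
class OpenAiLLMObsPlugin /* extends LLMObsPlugin */ {
  static integration = 'openai'                // integration name
  static id = 'llmobs_openai'                  // unique plugin ID
  static prefix = 'tracing:apm:openai:chat'    // channel prefix
}
```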

## References

For detailed information, see:

- [references/plugin-architecture.md](references/plugin-architecture.md) - Complete plugin structure, implementation steps, helper methods
- [references/category-detection.md](references/category-detection.md) - Package classification heuristics and detection process
- [references/message-extraction.md](references/message-extraction.md) - Provider-specific message format patterns
- [references/reference-implementations.md](references/reference-implementations.md) - Working plugin examples (Anthropic, Google GenAI)

## Key Principles

1. **Category determines approach** - Always detect category first using decision tree
2. **Use enum values** - Reference `LlmObsCategory` and `LlmObsSpanKind` enums from models
3. **Standard message format** - Always convert to `[{content, role}]` format
4. **Complete metadata** - Extract all available model parameters and token metrics
5. **Error handling** - Handle failures gracefully (empty messages on error)
6. **Test strategy follows category** - VCR for clients, pure functions for orchestration
---

File: `.agents/skills/llmobs-integration/references/category-detection.md` (new, 180 additions)
# Package Category Detection Reference

Detailed guide for classifying LLM packages into `LlmObsCategory` enum values.

## Categories Explained

### LlmObsCategory.LLM_CLIENT

**Definition:** Direct wrappers around LLM provider APIs.

**Examples:**
- `@google/generative-ai` - Google GenAI client (recommended reference implementation)
- `@anthropic-ai/sdk` - Anthropic Claude client (recommended reference implementation)
- `openai` - OpenAI API client

**Observable signs:**
- Package name contains provider name (openai, anthropic, genai, etc.)
- Has chat/completion/embedding methods (`chat.completions.create`, `messages.create`)
- Makes HTTP calls directly to LLM provider endpoints
- Requires API keys for authentication
- Has HTTP client dependencies (axios, fetch, request)
- Code contains HTTP request patterns

**Test strategy:** VCR with real API calls via proxy

**Enum value:** `LlmObsCategory.LLM_CLIENT`

### LlmObsCategory.MULTI_PROVIDER

**Definition:** Unified interfaces that abstract multiple LLM providers.

**Examples:**
- `@ai-sdk/vercel` - Vercel AI SDK
- `langchain` - LangChain framework

**Observable signs:**
- Package name suggests multi-provider (ai-sdk, langchain)
- Provider configuration and switching support
- Wraps multiple `LlmObsCategory.LLM_CLIENT` libraries
- Dependencies include 2+ LLM provider SDKs
- Has abstraction layers over providers

**Test strategy:** VCR with real API calls via proxy

**Enum value:** `LlmObsCategory.MULTI_PROVIDER`

### LlmObsCategory.ORCHESTRATION

**Definition:** Workflow/graph managers that coordinate LLM calls but don't make them directly.

**Examples:**
- `@langchain/langgraph` - LangGraph workflow engine
- Workflow engines, agent coordinators

**Observable signs:**
- Package name suggests orchestration (langgraph, crew, workflow, graph)
- Has graph/workflow/chain execution methods (`invoke`, `stream`, `run`)
- Manages state and control flow between nodes/agents
- Dependencies include orchestration libraries (e.g., @langchain/core)
- Methods focus on state management, not API calls

**Test strategy:** Pure function tests, NO VCR, NO real API calls

**Enum value:** `LlmObsCategory.ORCHESTRATION`

### LlmObsCategory.INFRASTRUCTURE

**Definition:** Communication protocols, server frameworks, infrastructure layers.

**Examples:**
- MCP (Model Context Protocol) clients/servers
- Protocol implementations
- Transport layers

**Observable signs:**
- Package name suggests infrastructure (mcp, protocol, server, transport)
- Implements protocols or server/client architecture
- Transport layer code

**Test strategy:** Mock server tests

**Enum value:** `LlmObsCategory.INFRASTRUCTURE`

## Decision Tree

Follow this tree to determine category:

```
1. Does the package make direct HTTP calls to LLM provider endpoints?
├─ YES → Go to question 2
└─ NO → Go to question 3

2. Does it support multiple LLM providers via configuration?
├─ YES → LlmObsCategory.MULTI_PROVIDER
└─ NO → LlmObsCategory.LLM_CLIENT

3. Does it implement workflow/graph orchestration with state management?
├─ YES → LlmObsCategory.ORCHESTRATION
└─ NO → LlmObsCategory.INFRASTRUCTURE
```
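The same tree expressed as a pure function; the boolean inputs come from manually reading the code per the detection process below:

```javascript
// The decision tree as control flow. Each flag is the answer to one of the
// questions above; this function just encodes the branching, not detection.
function classifyPackage ({
  makesDirectLLMHttpCalls,
  supportsMultipleProviders,
  orchestratesWorkflows
}) {
  if (makesDirectLLMHttpCalls) {
    return supportsMultipleProviders ? 'MULTI_PROVIDER' : 'LLM_CLIENT'
  }
  return orchestratesWorkflows ? 'ORCHESTRATION' : 'INFRASTRUCTURE'
}
```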

## Detection Process

### Step 1: Read Package Name

Analyze package name for patterns:
- Contains "openai", "anthropic", "genai" → Likely `LlmObsCategory.LLM_CLIENT`
- Contains "langchain", "llamaindex", "ai-sdk" → Likely `LlmObsCategory.MULTI_PROVIDER`
- Contains "langgraph", "crew", "workflow" → Likely `LlmObsCategory.ORCHESTRATION`
- Contains "mcp", "protocol", "server" → Likely `LlmObsCategory.INFRASTRUCTURE`

### Step 2: Check package.json Dependencies

```bash
cat node_modules/{{package}}/package.json
```

Look for:
- HTTP clients (axios, fetch, got) → `LlmObsCategory.LLM_CLIENT`
- Multiple LLM SDKs (openai + anthropic + cohere) → `LlmObsCategory.MULTI_PROVIDER`
- LangChain/orchestration libs → `LlmObsCategory.ORCHESTRATION`
- Protocol/transport libs → `LlmObsCategory.INFRASTRUCTURE`
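A hedged sketch of this dependency heuristic — the SDK and HTTP-client package names are assumptions, and real detection should confirm by reading the code, not just `package.json`:

```javascript
// Hypothetical heuristic: guess a category from a package.json dependency map.
// The package-name lists below are illustrative, not exhaustive.
const PROVIDER_SDKS = ['openai', '@anthropic-ai/sdk', '@google/generative-ai', 'cohere-ai']
const HTTP_CLIENTS = ['axios', 'got', 'node-fetch']

function categoryFromDependencies (deps = {}) {
  const names = Object.keys(deps)
  const providers = names.filter(name => PROVIDER_SDKS.includes(name))
  if (providers.length >= 2) return 'MULTI_PROVIDER'
  if (names.some(name => name.includes('langchain') || name.includes('langgraph'))) {
    return 'ORCHESTRATION'
  }
  if (providers.length === 1 || names.some(name => HTTP_CLIENTS.includes(name))) {
    return 'LLM_CLIENT'
  }
  return 'INFRASTRUCTURE'
}
```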

### Step 3: Check Exported Methods

```bash
node -e "console.log(Object.keys(require('{{package}}')))"
```

Method patterns:
- `chat()`, `complete()`, `embed()` → `LlmObsCategory.LLM_CLIENT` or `MULTI_PROVIDER`
- `invoke()`, `stream()`, `graph()`, `workflow()` → `LlmObsCategory.ORCHESTRATION`
- `connect()`, `listen()`, `handle()` → `LlmObsCategory.INFRASTRUCTURE`

### Step 4: Analyze Source Code

Check for:
- HTTP request patterns (`http.request`, `.post(`, `fetch(`) → `LlmObsCategory.LLM_CLIENT`
- Provider switching logic → `LlmObsCategory.MULTI_PROVIDER`
- State management, graph execution → `LlmObsCategory.ORCHESTRATION`
- Protocol implementation → `LlmObsCategory.INFRASTRUCTURE`

## Real-World Examples

### Example 1: Anthropic (LLM_CLIENT)

**Package:** `@anthropic-ai/sdk` — see `packages/datadog-plugin-anthropic/`

**Category:** `LlmObsCategory.LLM_CLIENT` — name contains "anthropic", direct HTTP calls to Claude API, requires API key, methods are `messages.create`

### Example 2: Google GenAI (LLM_CLIENT)

**Package:** `@google/generative-ai` — see `packages/datadog-plugin-google-genai/`

**Category:** `LlmObsCategory.LLM_CLIENT` — name contains "genai", direct HTTP calls to Gemini API, complex nested message format (contents/parts)

### Example 3: Vercel AI SDK (MULTI_PROVIDER)

**Package:** `ai` (Vercel AI SDK)

**Category:** `LlmObsCategory.MULTI_PROVIDER` — exposes a provider-agnostic chat interface, wraps multiple LLM provider SDKs (openai, anthropic) selected via configuration

### Example 4: LangGraph (ORCHESTRATION)

**Package:** `@langchain/langgraph` — see `packages/dd-trace/src/llmobs/plugins/langgraph/`

**Category:** `LlmObsCategory.ORCHESTRATION` — name indicates graph orchestration, depends on `@langchain/core`, methods manage workflow state (`StateGraph.invoke`, `Pregel.stream`), no direct LLM HTTP calls

## Edge Cases

When signals conflict or are weak, choose the category with the most supporting evidence, and let the required test strategy guide you: a package that makes HTTP calls to providers needs VCR (LLM_CLIENT or MULTI_PROVIDER); one that does not should use pure function tests (ORCHESTRATION) or mock servers (INFRASTRUCTURE).

Some packages don't fit cleanly:
- Utilities/helpers → Check what they instrument
- Plugins/extensions → Follow parent library category
- Hybrid packages → Categorize by primary function