4 changes: 2 additions & 2 deletions README.md
@@ -49,15 +49,15 @@ This installs the `opik` skill into your local skills environment.
| --- | --- |
| Tracing | Python decorators, TypeScript client tracing, REST API tracing, span types |
| Integrations | OpenAI, Anthropic, LangChain, CrewAI, DSPy, Google ADK, Vercel AI SDK, and more |
| Agent Config | `opik.Config`, `get_or_create_config()`, environments, prompt fields |
| Prompt Library | `client.create_prompt` / `client.get_prompt` (and chat variants), versioning, `metadata` for model/temperature |
| Local Runner | `opik connect`, pairing flow, entrypoint requirements, troubleshooting |
| Evaluation | Test Suites, `run_tests()`, assertions, execution policies, CI gating |
| Conversations | `thread_id`, conversation metrics, common pitfalls |
| Observability | trace boundaries, metadata, feedback scores, distributed tracing |

## One-time Opik setup

These skills are designed to help your coding agent fully integrate Opik into your project, including tracing, evaluations, `AgentConfig`, threads, and `opik connect`.
These skills are designed to help your coding agent fully integrate Opik into your project, including tracing, evaluations, the Prompt Library, threads, and `opik connect`.

Before using them, authenticate Opik once in the environment where your agent will work:

93 changes: 85 additions & 8 deletions skills/instrument/SKILL.md
@@ -57,7 +57,7 @@ Find:
- **Entrypoint**: the top-level function that kicks off the agent (e.g., `main`, `run`, `agent`, `handle_message`, a route handler, or whatever the user's main orchestration function is)
- **LLM call sites**: functions that call an LLM provider directly
- **Tool functions**: retrieval, search, API calls, or other tool-like operations
- **Existing config classes**: dataclasses, Pydantic models, or plain classes holding model names, temperatures, prompts, or other tunable parameters
- **Prompts and prompt-related config**: hardcoded prompt strings, system messages, message templates, and any associated model/temperature values — note these as candidates for the Prompt Library (`client.get_prompt` / `client.get_chat_prompt` with `metadata` for model config)

### Entrypoint Parameter Rules

@@ -134,6 +134,8 @@ import { OpikExporter } from "opik-vercel";

## Step 5 — Add `@opik.track` Decorators (Python) or Client Tracing (TypeScript)

This step adds the tracing scaffolding that the prompt migration in Step 6 relies on. Add decorators first so that the `get_prompt` / `get_chat_prompt` calls introduced next will land inside `@opik.track`-decorated functions.

### Python

Add `import opik` at the top of each file you instrument.
@@ -147,10 +149,10 @@ Add `import opik` at the top of each file you instrument.
| Other helper in the call chain | `@opik.track` |

- **Entrypoint parameters must be primitives only** (`str`, `int`, `float`, `bool`, `list`, `dict`). If the natural entrypoint takes a complex type, create a wrapper — see Step 3 "Entrypoint Parameter Rules".
- **Config access must happen inside `@opik.track`**: Any call to `client.get_or_create_config()` and subsequent access of config fields must occur inside a `@opik.track`-decorated function, or in a function called downstream from one. This is how Opik injects config metadata into the current trace. Calling it at module level or outside the traced call stack will raise an error.
- Place the decorator **above** any existing decorators (e.g., above `@app.route`)
- For async functions, `@opik.track` works the same way — no changes needed
- If the function is a **script entrypoint** (not a long-running server), add `opik.flush_tracker()` after the top-level call
- **`client.get_prompt()` / `client.get_chat_prompt()` must be called inside a `@opik.track`-decorated function** — this links the fetched prompt version to the trace so it appears in the Traces view. Fetching at module level works but the prompt won't be visible in traces.
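
The wrapper rule from the first bullet can be sketched as follows. This is a minimal illustration, not code from the Opik SDK: `AgentRequest`, `run_agent_core`, and `run_agent` are hypothetical names, and the decorator appears only as a comment so the snippet runs without `opik` installed.

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    """Hypothetical complex input type used by the existing agent code."""

    question: str
    max_steps: int = 3


def run_agent_core(request: AgentRequest) -> str:
    """Existing orchestration function that takes a complex type."""
    return f"answered {request.question!r} in at most {request.max_steps} steps"


# In a real project, the entrypoint decorator goes on this wrapper:
# @opik.track(entrypoint=True, project_name="<project-name>")
def run_agent(question: str, max_steps: int = 3) -> str:
    """Thin wrapper: accepts primitives only and builds the complex type."""
    return run_agent_core(AgentRequest(question=question, max_steps=max_steps))


if __name__ == "__main__":
    print(run_agent("What is Opik?"))
```

Keeping the conversion in the wrapper leaves the existing function and its callers unchanged, while the traced entrypoint exposes only values that can be typed into a UI input field.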

### TypeScript

@@ -180,7 +182,80 @@ const myAgent = track(
);
```

## Step 6 — Conversational Agents: Add `thread_id`
## Step 6 — Migrate Prompts to the Prompt Library

For every prompt found in Step 3, replace the hardcoded value with a `get_prompt` / `get_chat_prompt` call inside the enclosing `@opik.track`-decorated function added in Step 5.

**Classify each prompt:**
- Single string (system prompt, instruction, template) → `create_prompt` / `get_prompt`
- List of `{"role", "content"}` messages → `create_chat_prompt` / `get_chat_prompt`
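
The classification above can be expressed as a small check. This is a sketch under the assumption that prompts found in the codebase are either plain strings or lists of role/content dicts; `classify_prompt` is a hypothetical helper, not part of the Opik SDK.

```python
from typing import Any


def classify_prompt(value: Any) -> str:
    """Return "prompt" for a single string, "chat_prompt" for a message list."""
    if isinstance(value, str):
        return "prompt"  # use create_prompt / get_prompt
    if (
        isinstance(value, list)
        and value
        and all(
            isinstance(m, dict) and "role" in m and "content" in m for m in value
        )
    ):
        return "chat_prompt"  # use create_chat_prompt / get_chat_prompt
    raise ValueError("expected a str or a non-empty list of role/content dicts")


if __name__ == "__main__":
    print(classify_prompt("You are a helpful assistant."))
    print(classify_prompt([{"role": "user", "content": "Hi"}]))
```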

**Include model name, temperature, and any other call-level parameters in `metadata`** so they version together with the prompt template and can be updated from the Opik UI without a code change.

`get_prompt` / `get_chat_prompt` returns `None` if the prompt doesn't exist yet — check for `None` and create on first run so the same code handles both initial setup and subsequent runs.

**Python:**

```python
import opik

opik_client = opik.Opik()

@opik.track(entrypoint=True, project_name="<project-name>")
def run_agent(question: str) -> str:
    # Fetch inside the tracked function so the prompt version is linked to the trace
    prompt = opik_client.get_prompt(name="<prompt-name>")
    if prompt is None:
        prompt = opik_client.create_prompt(
            name="<prompt-name>",
            prompt="<original hardcoded prompt text>",
            metadata={"model": "<model>", "temperature": <value>},
        )
    system_message = prompt.format()  # pass template vars if any: prompt.format(var=value)
    return llm_call(
        model=prompt.metadata["model"],
        temperature=prompt.metadata["temperature"],
        system_prompt=system_message,
        question=question,
    )
```

For multi-turn message lists:

```python
# Body of the `@opik.track`-decorated function:
chat_prompt = opik_client.get_chat_prompt(name="<prompt-name>")
if chat_prompt is None:
    chat_prompt = opik_client.create_chat_prompt(
        name="<prompt-name>",
        messages=[...],  # original hardcoded messages list
        metadata={"model": "<model>", "temperature": <value>},
    )
messages = chat_prompt.format()  # pass template vars if any
return llm_call(
    model=chat_prompt.metadata["model"],
    temperature=chat_prompt.metadata["temperature"],
    messages=messages,
)
```

**TypeScript:**

```typescript
const opikClient = new Opik({ projectName: "<project-name>" });

const runAgent = track({ entrypoint: true, projectName: "<project-name>" }, async (question: string) => {
  let prompt = await opikClient.getPrompt({ name: "<prompt-name>" });
  if (prompt === null) {
    prompt = await opikClient.createPrompt({
      name: "<prompt-name>",
      prompt: "<original hardcoded prompt text>",
      metadata: { model: "<model>", temperature: <value> },
    });
  }
  const systemMessage = prompt.format(); // pass template vars if any
  const { model, temperature } = prompt.metadata as { model: string; temperature: number };
  return llmCall({ model, temperature, systemMessage, question });
});
```
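
When several prompts are migrated, the fetch-then-create pattern from this step can be factored into one helper. The sketch below is an illustration only: `get_or_create_prompt`, `StoredPrompt`, and `InMemoryPromptClient` are hypothetical names, and the in-memory client merely mimics the `get_prompt` / `create_prompt` calls described above so the snippet runs without a backend. With a real `opik.Opik()` client, the helper must still be called from inside an `@opik.track`-decorated function.

```python
from typing import Any, Dict, Optional


class StoredPrompt:
    """Hypothetical stand-in for a prompt object with text and metadata."""

    def __init__(self, prompt: str, metadata: Dict[str, Any]) -> None:
        self.prompt = prompt
        self.metadata = metadata


class InMemoryPromptClient:
    """Hypothetical stand-in for the Opik client; stores prompts in a dict."""

    def __init__(self) -> None:
        self._prompts: Dict[str, StoredPrompt] = {}

    def get_prompt(self, name: str) -> Optional[StoredPrompt]:
        return self._prompts.get(name)  # None when the prompt does not exist yet

    def create_prompt(
        self, name: str, prompt: str, metadata: Dict[str, Any]
    ) -> StoredPrompt:
        stored = StoredPrompt(prompt, metadata)
        self._prompts[name] = stored
        return stored


def get_or_create_prompt(
    client: Any, name: str, default_text: str, default_metadata: Dict[str, Any]
) -> Any:
    """Fetch a prompt by name, creating it from the defaults on first run."""
    prompt = client.get_prompt(name=name)
    if prompt is None:
        prompt = client.create_prompt(
            name=name, prompt=default_text, metadata=default_metadata
        )
    return prompt


if __name__ == "__main__":
    client = InMemoryPromptClient()
    first = get_or_create_prompt(client, "demo", "Hello {{name}}", {"model": "m"})
    second = get_or_create_prompt(client, "demo", "ignored", {"model": "other"})
    print(first is second)
```

The defaults passed on later runs are ignored once the prompt exists, which matches the intent that edits made in the Opik UI take precedence over the values in code.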

## Step 7 — Conversational Agents: Add `thread_id`

If the agent handles multi-turn conversations (chat bots, support agents, multi-step assistants), wire `thread_id`:

@@ -193,7 +268,7 @@ def handle_message(session_id: str, message: str) -> str:

Skip this for single-shot agents or batch processing.

## Step 7 — Environment Config
## Step 8 — Environment Config

Follow the setup decision tree from the main opik skill:

@@ -217,7 +292,7 @@ The URL suffix depends on where Opik is hosted:
- **Self-hosted** (typically `localhost` or an internal hostname): append only `/api` — no `/opik` prefix
- When writing or suggesting an `OPIK_URL_OVERRIDE` value, apply this rule so users don't have to remember it
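
The suffix rule can be applied mechanically. The sketch below assumes that hosted (non-self-hosted) deployments use the `/opik/api` suffix, which is inferred from the "no `/opik` prefix" wording above and not shown in this excerpt; `build_opik_url_override` is a hypothetical helper and the host names are example values.

```python
def build_opik_url_override(base_url: str, self_hosted: bool) -> str:
    """Append the API suffix to a base URL, avoiding duplicate slashes."""
    base = base_url.rstrip("/")
    # Self-hosted: only "/api". Hosted: "/opik/api" (assumption, see lead-in).
    suffix = "/api" if self_hosted else "/opik/api"
    return base + suffix


if __name__ == "__main__":
    print(build_opik_url_override("http://localhost:5173/", self_hosted=True))
    print(build_opik_url_override("https://www.comet.com", self_hosted=False))
```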

## Step 8 — Install Dependencies
## Step 9 — Install Dependencies

Print the install command but do NOT run it automatically. Let the user decide.

@@ -233,25 +308,27 @@ npm install opik
```
Plus framework-specific packages: `opik-openai`, `opik-vercel`, `opik-langchain`, `opik-gemini` as needed.

## Step 9 — Verify
## Step 10 — Verify

After instrumentation, do a quick audit:

- [ ] Every LLM call site is traced (via integration wrapper or `@opik.track`)
- [ ] Exactly one function has `entrypoint=True`
- [ ] The entrypoint function accepts only primitive parameters (`str`, `int`, `float`, `bool`, `list`, `dict`) — no Pydantic models, dataclasses, or custom classes
- [ ] All `get_or_create_config()` calls and config field access happen inside `@opik.track`-decorated functions (or downstream from one)
- [ ] Script entrypoints call `opik.flush_tracker()` (Python) or `await client.flush()` (TypeScript)
- [ ] LiteLLM calls inside `@opik.track` pass `current_span_data` via metadata
- [ ] No hardcoded API keys were introduced
- [ ] Existing tests still import correctly (no circular imports introduced)
- [ ] No deprecated `opik.Prompt` / `opik.ChatPrompt` / `opik.Config` usage introduced; use the Prompt Library instead
- [ ] All `client.get_prompt()` / `client.get_chat_prompt()` calls are inside `@opik.track`-decorated functions — prompt version will not appear in traces otherwise
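
Part of this audit can be automated. As one possible approach, the sketch below uses the standard-library `ast` module to count functions decorated with a `track(..., entrypoint=True)` call in a source string, covering the "exactly one entrypoint" item. `count_entrypoints` and `SAMPLE` are hypothetical names, and the check is purely syntactic: it does not import or run the inspected code.

```python
import ast


def count_entrypoints(source: str) -> int:
    """Count functions decorated with a `track(..., entrypoint=True)` call."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            if not isinstance(decorator, ast.Call):
                continue  # bare `@opik.track` has no keyword arguments
            func = decorator.func
            # Handles both `opik.track(...)` and `track(...)`
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name != "track":
                continue
            if any(
                kw.arg == "entrypoint"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
                for kw in decorator.keywords
            ):
                count += 1
    return count


SAMPLE = '''
import opik

@opik.track(entrypoint=True, project_name="demo")
def run_agent(question: str) -> str:
    return helper(question)

@opik.track
def helper(question: str) -> str:
    return question
'''

if __name__ == "__main__":
    print(count_entrypoints(SAMPLE))
```

A result other than 1 for the combined project sources indicates either a missing entrypoint or more than one function marked as the entrypoint.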

## Anti-Patterns to Avoid

- **Double-wrapping**: Don't add `@opik.track(type="llm")` to a function that already uses a framework integration (e.g., `track_openai`). The integration handles tracing.
- **Orphaned LiteLLM traces**: Always pass `current_span_data` when `OpikLogger` is used inside `@opik.track` code.
- **Complex entrypoint parameters**: The entrypoint function must only accept primitives (`str`, `int`, `float`, `bool`, `list`, `dict`). Pydantic models, dataclasses, or custom classes can't be typed into a UI input field. If the natural entrypoint takes a complex type, create a thin wrapper that accepts primitives.
- **Config access outside `@opik.track`**: `get_or_create_config()` and config field reads must happen inside a `@opik.track`-decorated function or downstream from one. Module-level or untraced calls will fail and won't attach config metadata to the trace.
- **Using deprecated `opik.Prompt` / `opik.ChatPrompt` / `opik.Config`**: These have been retired. Use `client.get_prompt()` / `client.get_chat_prompt()` from the Prompt Library instead.
- **Fetching prompts outside `@opik.track`**: `client.get_prompt()` / `client.get_chat_prompt()` must be called inside a `@opik.track`-decorated function. Fetching at module level works functionally but the prompt version won't be linked to the trace and won't appear in the Traces view.
- **Missing entrypoint**: Without `entrypoint=True`, Local Runner (`opik connect`) won't discover the agent.
- **Missing flush**: Scripts that exit without flushing lose trace data.
- **Overwriting config**: Check before writing to `.env` or `~/.opik.config`.
115 changes: 71 additions & 44 deletions skills/opik/SKILL.md
@@ -1,15 +1,14 @@
---
name: opik
description: Opik observability for LLM agents — Agent Configuration, Local Runner (opik connect), Test Suites, threads, integrations. Use for "configure my agent", "connect my agent", "evaluate my agent" or "integrate with Opik".
description: Opik observability for LLM agents — Prompt Library, Local Runner (opik connect), Test Suites, threads, integrations. Use for "manage my prompts", "connect my agent", "evaluate my agent" or "integrate with Opik".
---

# Opik — Observability for LLM Agents

Integrating with Opik always means adding all three components unless the user explicitly asks for only one:
Integrating with Opik always means adding both components unless the user explicitly asks for only one:

1. **Tracing** — instrument LLM calls with the appropriate integration or `@opik.track`
2. **Entrypoint** — mark the top-level function with `entrypoint=True` for Local Runner and UI integration
3. **Agent Configuration** — externalize all tunable parameters into `opik.Config`: model names, temperatures, top_p, max_tokens, all prompts and prompt templates, and any other runtime parameters the user may want to compare or optimize

## Setup

@@ -218,59 +217,88 @@ Use for chat agents, support bots, multi-step assistants. Skip for single-shot a

**Pitfalls:** Missing `thread_id` → turns appear as unrelated traces. Shared `thread_id` across users → conversations get mixed.
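
One way to avoid the shared-ID pitfall is to derive `thread_id` from both the user and the session. This is a sketch under assumptions: `make_thread_id` is a hypothetical helper, and the `user:session` format is an arbitrary choice; any stable string that is unique per conversation would serve.

```python
def make_thread_id(user_id: str, session_id: str) -> str:
    """Build a thread ID that is stable per conversation and unique per user."""
    if not user_id or not session_id:
        # An empty component would let unrelated conversations collide
        raise ValueError("user_id and session_id must be non-empty")
    return f"{user_id}:{session_id}"


if __name__ == "__main__":
    # Two users who happen to share a session label get distinct threads
    print(make_thread_id("alice", "session-1"))
    print(make_thread_id("bob", "session-1"))
```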

## Agent Configuration
## Prompt Library

Externalize the parts of your agent you expect to tune over time into versioned, immutable config snapshots. This includes prompts, models, temperatures, token limits, and other runtime parameters you may want to compare, optimize, or roll out gradually.
Manage versioned prompts through the `opik.Opik` client. Use `create_prompt` / `get_prompt` for string-based prompts and `create_chat_prompt` / `get_chat_prompt` for multi-turn chat templates. Use `{{variable}}` syntax in prompt text for template variables rendered at call time via `.format()`.
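
To illustrate what `{{variable}}` substitution does at call time, here is a standard-library sketch. It is not Opik's implementation: `render_template` is a hypothetical function, and the real `.format()` may treat whitespace or missing variables differently.

```python
import re

# Matches {{name}} with optional spaces inside the braces
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, **variables: object) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    text = "You are a helpful assistant for {{product}}."
    print(render_template(text, product="Opik"))
```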

**CRITICAL — Search for existing config classes first.** Before creating a new config, search the codebase for existing classes that hold tunable parameters (model names, temperatures, prompts, token limits, etc.). Look for names like `AgentConfig`, `Config`, `Settings`, `AgentSettings`, `ModelConfig`, or any `@dataclass`/Pydantic model with fields like `model`, `temperature`, `system_prompt`, `max_tokens`. **An existing config class is a migration target, not a reason to skip this step.** If found, convert it to inherit from `opik.Config`:
**Storing model config alongside the prompt.** Model names, temperatures, and other parameters that you want to version together with the prompt text go in the `metadata` dict on the prompt. They are stored at the prompt version level, so when you fetch a prompt you get both the template and its associated config from `prompt.metadata`.

1. Replace the existing base (`@dataclass`, `BaseModel`, plain class) with `opik.Config`
2. Convert plain `str` prompt fields to `opik.Prompt`
3. Wire up `get_or_create_config()` inside the entrypoint
4. Update all call sites that reference the old config to use the new Opik-managed config
**CRITICAL — call `get_prompt` / `get_chat_prompt` inside a `@opik.track`-decorated function.** This is what links the fetched prompt version to the trace, making it visible in the Traces view in the Opik UI. Fetching at module level works but the prompt will not appear in traces.

**Python:**

```python
import opik

class AgentConfig(opik.Config):
    model: str
    temperature: float
    system_prompt: opik.Prompt

DEFAULT_CONFIG = AgentConfig(
    model="gpt-4o",
    temperature=0.7,
    system_prompt=opik.Prompt(
        name="agent-system-prompt",
        project_name="my-agent",
        prompt="You are a helpful assistant for {{product}}.",
    ),
)

client = opik.Opik()

@opik.track(entrypoint=True, project_name="my-agent")
def run_agent(question: str) -> str:
    cfg = client.get_or_create_config(
        fallback=DEFAULT_CONFIG,
        project_name="my-agent",
        # optional: env="staging" | version="v1" | version="latest" (default: prod)
    )
    # Fetch inside @track so the prompt version is recorded in the trace
    prompt = client.get_prompt(name="agent-system-prompt")
    if prompt is None:
        prompt = client.create_prompt(
            name="agent-system-prompt",
            prompt="You are a helpful assistant for {{product}}.",
            metadata={"model": "gpt-4o", "temperature": 0.7, "max_tokens": 1024},
        )
    system_message = prompt.format(product="Opik")
    return llm_call(
        model=cfg.model,
        temperature=cfg.temperature,
        system_prompt=cfg.system_prompt.format(product="Opik"),
        model=prompt.metadata["model"],
        temperature=prompt.metadata["temperature"],
        max_tokens=prompt.metadata["max_tokens"],
        system_prompt=system_message,
        question=question,
    )
```

- `get_or_create_config()` **must** be inside `@opik.track` — raises error otherwise
- On first call with no existing config, auto-creates from `fallback` and returns it
- On backend failure, returns `fallback` with `is_fallback=True` (never breaks the agent)
- Deploy to environment: `client.set_config_env(version="v1", env="prod")` — admin/ops only
- Prompt fields: use `opik.Prompt` for string-based templates, `opik.ChatPrompt` for multi-turn message templates; `project_name` is required on both and must match the `project_name` in `@opik.track` and `get_or_create_config`
- **Extract:** model, temperature, top_p, max_tokens, system prompt, tunable params
- **Don't extract:** API keys, structural logic, true constants
For a multi-turn chat template:

```python
@opik.track(entrypoint=True, project_name="my-agent")
def run_agent(task: str) -> str:
    chat_prompt = client.get_chat_prompt(name="agent-chat-template")
    if chat_prompt is None:
        chat_prompt = client.create_chat_prompt(
            name="agent-chat-template",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Help me with {{task}}"},
            ],
            metadata={"model": "gpt-4o", "temperature": 0.7},
        )
    messages = chat_prompt.format(task=task)
    return llm_call(
        model=chat_prompt.metadata["model"],
        temperature=chat_prompt.metadata["temperature"],
        messages=messages,
    )
```

**TypeScript:**

```typescript
import { Opik, track } from "opik";

const client = new Opik({ projectName: "my-agent" });

const runAgent = track({ entrypoint: true, projectName: "my-agent" }, async (question: string) => {
  // Fetch inside track() so the prompt version is recorded in the trace
  let prompt = await client.getPrompt({ name: "agent-system-prompt" });
  if (prompt === null) {
    prompt = await client.createPrompt({
      name: "agent-system-prompt",
      prompt: "You are a helpful assistant for {{product}}.",
      metadata: { model: "gpt-4o", temperature: 0.7, maxTokens: 1024 },
    });
  }
  const systemMessage = prompt.format({ product: "Opik" });
  const { model, temperature, maxTokens } = prompt.metadata as { model: string; temperature: number; maxTokens: number };
  return llmCall({ model, temperature, maxTokens, systemMessage, question });
});
```

After the initial run the prompt is registered in the library and can be edited, versioned, and have its metadata updated from the Opik UI. `get_prompt` / `get_chat_prompt` always returns the latest published version, including its metadata.
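
Because metadata can be edited from the UI, code that indexes `prompt.metadata["model"]` directly depends on every key still being present. The sketch below shows one defensive way to read it, assuming `prompt.metadata` is a dict or `None`; `resolve_call_params` and `DEFAULTS` are hypothetical names, not part of the Opik SDK.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical in-code defaults, used only when a key is absent from metadata
DEFAULTS: Dict[str, Any] = {"model": "gpt-4o", "temperature": 0.7}


def resolve_call_params(
    metadata: Optional[Mapping[str, Any]], defaults: Mapping[str, Any]
) -> Dict[str, Any]:
    """Merge prompt metadata over defaults, ignoring keys set to None."""
    params = dict(defaults)
    if metadata:
        params.update({k: v for k, v in metadata.items() if v is not None})
    return params


if __name__ == "__main__":
    # Metadata edited in the UI to lower the temperature and drop the model key
    print(resolve_call_params({"temperature": 0.2}, DEFAULTS))
```

Values stored on the prompt version still win whenever they are present, so the defaults act only as a safety net.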

## Local Runner (opik connect)

@@ -293,18 +321,17 @@ After pairing: entrypoint registered as agent, UI shows input form, jobs from UI
| No entrypoint found | Add `entrypoint=True` (Python) or `entrypoint: true` (TS) |
| Invalid pair code | Codes expire — get a new one |
| Connection refused | Check Opik server (OSS) or API key (Cloud) |
| `get_or_create_config` fails saying some fields reference the wrong project | The `project_name` on one or more `opik.Prompt` / `opik.ChatPrompt` fields doesn't match the `project_name` passed to `get_or_create_config` — make them consistent |


## Anti-Patterns

| Anti-Pattern | Fix |
|-------------|-----|
| Existing config class left unconverted (e.g., `@dataclass` with model/temperature/prompt fields) | Convert to `opik.Config` subclass — an existing config is a migration target, not a skip signal |
| Hardcoded config | Use `opik.Config` + `get_or_create_config()` |
| Using deprecated `opik.Prompt` / `opik.ChatPrompt` / `opik.Config` | Migrate to `client.get_prompt()` / `client.get_chat_prompt()` from the Prompt Library |
| Storing model/temperature in a separate config object | Put them in `metadata` on the prompt — they version together with the template and are read via `prompt.metadata["model"]` etc. |
| Fetching prompt outside `@opik.track` | Prompt won't appear in traces — fetch inside the decorated function |
| Missing entrypoint | Add `entrypoint=True` for Local Runner |
| No thread_id on conversational agent | Wire `thread_id` from session ID |
| `get_or_create_config()` outside `@track` | Must be inside decorated function |
| TS missing `params` | Add explicit `params` array |
| Missing `flush_tracker()` in scripts | Call before exit |
