feat: add LangfuseProvider for remote trace evaluation by afarntrog · Pull Request #144 · strands-agents/evals

afarntrog · 2026-02-23T19:08:30Z

Description

Summary

Adds LangfuseProvider, the first concrete implementation of the TraceProvider interface, enabling evaluation of Strands agent traces stored in Langfuse. This lets users run the evals pipeline against production agent sessions without needing local in-memory trace data.

Motivation

The existing eval framework requires traces to be available in-process. Teams using Langfuse as their observability backend had no way to feed those traces into the evaluation pipeline. This PR closes that gap by fetching, paginating, and converting Langfuse observations into the typed span model (InferenceSpan, ToolExecutionSpan, AgentInvocationSpan) that evaluators already consume.

What changed

New: LangfuseProvider (src/strands_evals/providers/langfuse_provider.py — 620 lines)

Implements TraceProvider.get_evaluation_data(session_id) → TaskOutput
Fetches traces by session ID with automatic pagination (_fetch_all_pages)
Retries on ReadTimeout with exponential backoff via tenacity (_call_with_retry)
Converts Langfuse observations to evals span types:
- generation → InferenceSpan (with full message parsing: user/assistant, tool calls, tool results)
- span with name Tool: * → ToolExecutionSpan
- span with name Agent * → AgentInvocationSpan
Extracts final agent output from the last trace's last inference span
Configurable timeout, credentials via constructor args or env vars (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST)
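
The observation routing listed above can be sketched as a small classifier. This is an illustration, not the provider's actual code: the `Observation` dataclass and `classify` function are hypothetical stand-ins, and treating `Tool: *` and `Agent *` as glob patterns on the observation name is an assumption based on how the rules are written here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Observation:
    """Hypothetical stand-in for a Langfuse observation (only the fields used here)."""

    type: str  # e.g. "GENERATION" or "SPAN"
    name: str


def classify(obs: Observation) -> Optional[str]:
    """Return the name of the evals span type an observation maps to, or None."""
    kind = obs.type.lower()
    if kind == "generation":
        return "InferenceSpan"
    if kind == "span" and fnmatchcase(obs.name, "Tool: *"):
        return "ToolExecutionSpan"
    if kind == "span" and fnmatchcase(obs.name, "Agent *"):
        return "AgentInvocationSpan"
    return None  # in this sketch, observations that match no rule are skipped
```

For example, `classify(Observation("SPAN", "Tool: calculator"))` yields `"ToolExecutionSpan"`, while a span with an unrelated name yields `None`.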

Packaging

Added langfuse optional dependency group (langfuse>=2.0.0) in pyproject.toml
Added langfuse to the hatch test environment features

Cleanup

Removed redundant pass statements from exception classes
Fixed missing newline at end of __init__.py

How to try it

from strands_evals.providers import LangfuseProvider

provider = LangfuseProvider(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
)
task_output = provider.get_evaluation_data(session_id="your-session-id")
# task_output.output  → final agent response string
# task_output.trajectory  → Session with typed traces/spans

Related Issues

#98

Documentation PR

Type of Change

New feature

Testing

29 unit tests covering all conversion paths, pagination, retry logic, error handling, and edge cases
11 integration tests (tests_integ/test_langfuse_provider.py) that hit a real Langfuse instance
I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Introduce an abstract TraceProvider base class for retrieving agent trace data from observability backends for evaluation. This includes:
- TraceProvider ABC with get_session, list_sessions, and get_session_by_trace_id methods
- SessionFilter dataclass for filtering session discovery
- Custom error hierarchy (TraceProviderError, SessionNotFoundError, TraceNotFoundError, ProviderError)
- Session and Trace data types with span tree construction and convenience accessors (input/output messages, token usage, duration)
- New providers module exposed at package level
- Comprehensive unit tests for providers and trace types

Add abstract TraceProvider that retrieves agent trace data from observability backends and returns Session/Trace types the evals system already consumes.
- TraceProvider ABC with get_session() (required), list_sessions() and get_session_by_trace_id() (optional, raise NotImplementedError)
- SessionFilter dataclass for time-range and limit-based discovery
- Exception hierarchy: TraceProviderError base with SessionNotFoundError, TraceNotFoundError, ProviderError
- Export providers module from strands_evals package

Implement LangfuseProvider that fetches agent traces from Langfuse and converts them to Session objects for the evals pipeline. Supports session-level and trace-level retrieval with paginated API calls.
- get_evaluation_data(): fetch traces by session ID, convert Langfuse observations to typed spans (InferenceSpan, ToolExecutionSpan, AgentInvocationSpan), extract output from last agent invocation
- list_sessions(): paginated session discovery with time-range filtering
- get_evaluation_data_by_trace_id(): single trace retrieval
- Host resolution: explicit param > LANGFUSE_HOST env var > cloud default
- 30 unit tests (mocked SDK), 15 integration tests (real Langfuse + evaluators)
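
The host resolution order in this commit (explicit param, then LANGFUSE_HOST, then the cloud default) could look roughly like the sketch below. The function name `resolve_host` and the constant are hypothetical, and the default URL is an assumption rather than something stated in this PR.

```python
import os
from typing import Optional

# Assumed cloud default; not stated in this PR, so verify against the Langfuse docs.
DEFAULT_LANGFUSE_HOST = "https://cloud.langfuse.com"


def resolve_host(host: Optional[str] = None) -> str:
    """Resolve the host: explicit argument first, then LANGFUSE_HOST, then the default."""
    return host or os.environ.get("LANGFUSE_HOST") or DEFAULT_LANGFUSE_HOST
```

Chaining with `or` means an empty string in either the argument or the environment variable falls through to the next source, which is usually the desired behavior for unset configuration.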

- Add `langfuse` optional dependency group in pyproject.toml
- Implement session limit filtering in `_fetch_sessions` to stop yielding after reaching the configured limit
- Improve type annotations from bare `list` to `list[Any]`
- Standardize log messages to structured key=value format
- Fix missing newline at end of file

- Extract `_fetch_all_pages` helper to deduplicate pagination logic
- Replace if/elif observation dispatch with a dispatch table pattern
- Extract helper methods for message building and output parsing
- Simplify exception handling using base `TraceProviderError` class
- Remove redundant `pass` statements from exception classes
- Clean up imports to use shorter relative paths
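
A pagination helper of the kind this commit extracts might look like the sketch below. It is not the PR's `_fetch_all_pages`: the function signature and the response shape (`{"data": [...], "meta": {"total_pages": int}}`) are assumptions made so the example runs on its own, and `fake_fetch` is a hypothetical stand-in for an API call.

```python
from typing import Any, Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[..., Dict[str, Any]], **kwargs: Any) -> List[Any]:
    """Call fetch_page(page=N, **kwargs) until every page has been collected.

    Assumes each response looks like {"data": [...], "meta": {"total_pages": int}}.
    """
    items: List[Any] = []
    page = 1
    while True:
        response = fetch_page(page=page, **kwargs)
        items.extend(response["data"])
        if page >= response["meta"]["total_pages"]:
            return items
        page += 1


def fake_fetch(page: int, session_id: str) -> Dict[str, Any]:
    """Hypothetical two-page API used only to exercise the helper."""
    pages = {1: ["trace-1", "trace-2"], 2: ["trace-3"]}
    return {"data": pages[page], "meta": {"total_pages": 2}}


all_traces = fetch_all_pages(fake_fetch, session_id="your-session-id")
```

Keeping the loop in one helper means each caller only supplies the page-fetching function and its filters, which is the deduplication the commit message describes.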

…to trace_provider_langfuse

- Lazy-load LangfuseProvider via module-level __getattr__ to defer langfuse import until actually needed
- Add tenacity-based retry with exponential backoff on ReadTimeout
- Introduce configurable timeout parameter (default 120s) passed via request_options to all Langfuse API calls
- Remove try/except ImportError guard in favor of lazy loading
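
The retry behavior can be illustrated without extra dependencies. The PR uses tenacity; the sketch below swaps in a plain stdlib loop and uses the built-in `TimeoutError` in place of the HTTP client's ReadTimeout. The name `call_with_retry`, the attempt count, and the base delay are illustrative assumptions, not the provider's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TimeoutError with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2**attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting.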

Remove the `list_sessions` method and its associated tests, along with the `SessionFilter` import and related test helpers. This simplifies the LangfuseProvider by dropping unused session listing functionality.

…e provider
- Remove unused get_evaluation_data_by_trace_id method and TraceNotFoundError import
- Rename shadowed local variables in _parse_message for clarity
- Reformat multi-line constructor calls for ToolCallContent and ToolResultContent
- Update tests to reflect removal of trace-by-id functionality
- Add missing newline at end of __init__.py

src/strands_evals/providers/__init__.py

pyproject.toml

src/strands_evals/providers/langfuse_provider.py

mkmeral · 2026-02-23T22:29:02Z

/strands review

Expand docstrings across the Langfuse provider to document observation field mappings, routing logic, data formats, and conversion behavior. This improves developer understanding of how Langfuse traces and observations are transformed into typed eval spans and messages.

- Add lazy __getattr__ import for LangfuseProvider to avoid import errors when the optional langfuse package is not installed
- Add comprehensive docstring to LangfuseProvider.__init__ describing credential resolution, usage examples, and parameters
- Cap langfuse dependency to <3 to prevent breaking changes

Include "langfuse" in the hatch static analysis environment features alongside "otel" to enable type checking and linting of langfuse-related code.

pyproject.toml

src/strands_evals/providers/langfuse_provider.py

afarntrog added 12 commits February 9, 2026 14:18

Merge branch 'main' into trace_provider

47261a6

Merge remote-tracking branch forked_origin/trace_provider_langfuse in…

35019c4

…to trace_provider_langfuse

Merge branch main into trace_provider_langfuse

ebb49a6

refactor: remove list_sessions method from LangfuseProvider

7bffb88

afarntrog changed the title ~~Trace provider langfuse~~ feat: langfuse Trace provider Feb 23, 2026

afarntrog temporarily deployed to auto-approve February 23, 2026 19:09 — with GitHub Actions Inactive

afarntrog changed the title ~~feat: langfuse Trace provider~~ feat: add LangfuseProvider for remote trace evaluation Feb 23, 2026

mkmeral reviewed Feb 23, 2026

src/strands_evals/providers/__init__.py (resolved)

pyproject.toml (outdated, resolved)

src/strands_evals/providers/langfuse_provider.py (resolved)

github-actions bot added strands-running and removed strands-running labels Feb 23, 2026

afarntrog temporarily deployed to auto-approve February 25, 2026 13:45 — with GitHub Actions Inactive

afarntrog temporarily deployed to auto-approve February 25, 2026 14:00 — with GitHub Actions Inactive

afarntrog changed the title ~~feat: add LangfuseProvider for remote trace evaluation~~ feat: add LangfuseProvider for remote trace evaluation Feb 25, 2026

feat: add langfuse feature to static analysis environment

395abf6

afarntrog temporarily deployed to auto-approve February 25, 2026 14:17 — with GitHub Actions Inactive

afarntrog requested a review from mkmeral February 25, 2026 15:16

mkmeral approved these changes Feb 25, 2026

pyproject.toml (resolved)

JackYPCOnline approved these changes Feb 25, 2026

poshinchen reviewed Feb 25, 2026

src/strands_evals/providers/langfuse_provider.py (resolved)

poshinchen reviewed Feb 25, 2026

src/strands_evals/providers/langfuse_provider.py (resolved)

poshinchen reviewed Feb 25, 2026

src/strands_evals/providers/langfuse_provider.py (resolved)

poshinchen approved these changes Feb 25, 2026

afarntrog merged commit be204b4 into strands-agents:main Feb 25, 2026
19 checks passed

afarntrog merged 15 commits into strands-agents:main from afarntrog:trace_provider_langfuse

4 participants
