feat: add LangfuseProvider for remote trace evaluation #144

Merged
afarntrog merged 15 commits into strands-agents:main from afarntrog:trace_provider_langfuse
Feb 25, 2026

afarntrog commented on Feb 23, 2026

Description

Summary

Adds LangfuseProvider, the first concrete implementation of the TraceProvider interface, enabling evaluation of Strands agent traces stored in Langfuse. This lets users run the evals pipeline against production agent sessions without needing local in-memory trace data.

Motivation

The existing eval framework requires traces to be available in-process. Teams using Langfuse as their observability backend had no way to feed those traces into the evaluation pipeline. This PR closes that gap by fetching, paginating, and converting Langfuse observations into the typed span model (InferenceSpan, ToolExecutionSpan, AgentInvocationSpan) that evaluators already consume.

What changed

New: LangfuseProvider (src/strands_evals/providers/langfuse_provider.py — 620 lines)

  • Implements TraceProvider.get_evaluation_data(session_id) → TaskOutput
  • Fetches traces by session ID with automatic pagination (_fetch_all_pages)
  • Retries on ReadTimeout with exponential backoff via tenacity (_call_with_retry)
  • Converts Langfuse observations to evals span types:
    • generation → InferenceSpan (with full message parsing: user/assistant, tool calls, tool results)
    • span with name Tool: * → ToolExecutionSpan
    • span with name Agent * → AgentInvocationSpan
  • Extracts final agent output from the last trace's last inference span
  • Configurable timeout, credentials via constructor args or env vars (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST)

Packaging

  • Added langfuse optional dependency group (langfuse>=2.0.0) in pyproject.toml
  • Added langfuse to the hatch test environment features

Cleanup

  • Removed redundant pass statements from exception classes
  • Fixed missing newline at end of __init__.py

How to try it

from strands_evals.providers import LangfuseProvider

provider = LangfuseProvider(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
)
task_output = provider.get_evaluation_data(session_id="your-session-id")
# task_output.output  → final agent response string
# task_output.trajectory  → Session with typed traces/spans
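
For readers wondering how constructor arguments and environment variables might interact, here is a rough sketch of the resolution order. It is an illustration only: `resolve_langfuse_config` is a hypothetical helper, the default host URL is an assumption (the PR text only says "cloud default"), and raising `ValueError` on missing keys is a guess at reasonable behavior rather than what the provider does.

```python
import os
from typing import Mapping, Optional

# Assumed value for the "cloud default" host.
DEFAULT_HOST = "https://cloud.langfuse.com"


def resolve_langfuse_config(
    public_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    host: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Resolve each setting as: explicit argument, then env var, then default."""
    resolved = {
        "public_key": public_key or env.get("LANGFUSE_PUBLIC_KEY"),
        "secret_key": secret_key or env.get("LANGFUSE_SECRET_KEY"),
        "host": host or env.get("LANGFUSE_HOST") or DEFAULT_HOST,
    }
    # Credentials have no sensible default, so fail loudly if either is absent.
    missing = [key for key in ("public_key", "secret_key") if not resolved[key]]
    if missing:
        raise ValueError(f"missing Langfuse credentials: {', '.join(missing)}")
    return resolved


# Explicit arguments win over the environment; the host falls back to the default.
config = resolve_langfuse_config(
    public_key="pk-lf-explicit",
    env={"LANGFUSE_PUBLIC_KEY": "pk-lf-env", "LANGFUSE_SECRET_KEY": "sk-lf-env"},
)
```

Passing `env` as a parameter keeps the helper easy to test without mutating the real process environment.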

Related Issues

#98

Documentation PR

Type of Change

New feature

Testing

  • 29 unit tests covering all conversion paths, pagination, retry logic, error handling, and edge cases

  • 11 integration tests (tests_integ/test_langfuse_provider.py) that hit a real Langfuse instance

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Introduce an abstract TraceProvider base class for retrieving agent
trace data from observability backends for evaluation. This includes:

- TraceProvider ABC with get_session, list_sessions, and
  get_session_by_trace_id methods
- SessionFilter dataclass for filtering session discovery
- Custom error hierarchy (TraceProviderError, SessionNotFoundError,
  TraceNotFoundError, ProviderError)
- Session and Trace data types with span tree construction and
  convenience accessors (input/output messages, token usage, duration)
- New providers module exposed at package level
- Comprehensive unit tests for providers and trace types

Add abstract TraceProvider that retrieves agent trace data from
observability backends and returns Session/Trace types the evals
system already consumes.

- TraceProvider ABC with get_session() (required), list_sessions()
  and get_session_by_trace_id() (optional, raise NotImplementedError)
- SessionFilter dataclass for time-range and limit-based discovery
- Exception hierarchy: TraceProviderError base with
  SessionNotFoundError, TraceNotFoundError, ProviderError
- Export providers module from strands_evals package
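
The interface described in this commit message could look roughly like the sketch below. It follows the method and exception names listed above, but the `SessionFilter` field names, the `Any` return types (standing in for the Session type), and the `InMemoryProvider` toy subclass are assumptions added for illustration; the merged code may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


class TraceProviderError(Exception):
    """Base class for all trace provider failures."""


class SessionNotFoundError(TraceProviderError):
    """No session exists for the requested session ID."""


class TraceNotFoundError(TraceProviderError):
    """No trace exists for the requested trace ID."""


class ProviderError(TraceProviderError):
    """The backend failed or returned something unusable."""


@dataclass
class SessionFilter:
    """Time-range and limit-based discovery filter (field names are assumed)."""

    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    limit: Optional[int] = None


class TraceProvider(ABC):
    """Retrieves agent trace data from an observability backend."""

    @abstractmethod
    def get_session(self, session_id: str) -> Any:
        """Required: return the session for session_id."""

    def list_sessions(self, session_filter: Optional[SessionFilter] = None) -> Any:
        """Optional: providers that support discovery override this."""
        raise NotImplementedError("list_sessions is not supported by this provider")

    def get_session_by_trace_id(self, trace_id: str) -> Any:
        """Optional: providers that support trace lookup override this."""
        raise NotImplementedError("get_session_by_trace_id is not supported by this provider")


class InMemoryProvider(TraceProvider):
    """Toy provider used only to exercise the interface."""

    def __init__(self, sessions: dict[str, Any]) -> None:
        self._sessions = sessions

    def get_session(self, session_id: str) -> Any:
        try:
            return self._sessions[session_id]
        except KeyError:
            raise SessionNotFoundError(f"session_id={session_id} not found") from None


provider = InMemoryProvider({"s1": ["trace-a", "trace-b"]})
session = provider.get_session("s1")
```

Because every error derives from `TraceProviderError`, callers can catch the base class once and still distinguish specific failures when they need to.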
Implement LangfuseProvider that fetches agent traces from Langfuse and
converts them to Session objects for the evals pipeline. Supports
session-level and trace-level retrieval with paginated API calls.

- get_evaluation_data(): fetch traces by session ID, convert Langfuse
  observations to typed spans (InferenceSpan, ToolExecutionSpan,
  AgentInvocationSpan), extract output from last agent invocation
- list_sessions(): paginated session discovery with time-range filtering
- get_evaluation_data_by_trace_id(): single trace retrieval
- Host resolution: explicit param > LANGFUSE_HOST env var > cloud default
- 30 unit tests (mocked SDK), 15 integration tests (real Langfuse + evaluators)
- Add `langfuse` optional dependency group in pyproject.toml
- Implement session limit filtering in `_fetch_sessions` to stop
  yielding after reaching the configured limit
- Improve type annotations from bare `list` to `list[Any]`
- Standardize log messages to structured key=value format
- Fix missing newline at end of file
- Extract `_fetch_all_pages` helper to deduplicate pagination logic
- Replace if/elif observation dispatch with a dispatch table pattern
- Extract helper methods for message building and output parsing
- Simplify exception handling using base `TraceProviderError` class
- Remove redundant `pass` statements from exception classes
- Clean up imports to use shorter relative paths
- Lazy-load LangfuseProvider via module-level __getattr__ to defer
  langfuse import until actually needed
- Add tenacity-based retry with exponential backoff on ReadTimeout
- Introduce configurable timeout parameter (default 120s) passed
  via request_options to all Langfuse API calls
- Remove try/except ImportError guard in favor of lazy loading
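
As an illustration of the dispatch table pattern mentioned in the list above, the sketch below routes observations to converters through an ordered table of (predicate, converter) pairs instead of an if/elif chain. The dict-shaped observation, the `Span` placeholder, and all function names are assumptions; only the routing rules (generation, span named `Tool: *`, span named `Agent *`) come from the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Span:
    """Placeholder for the typed span classes; kind holds the target type name."""

    kind: str
    name: str


def _is_generation(obs: dict) -> bool:
    return obs["type"].lower() == "generation"


def _span_named(prefix: str) -> Callable[[dict], bool]:
    """Build a predicate matching span observations whose name starts with prefix."""

    def matches(obs: dict) -> bool:
        return obs["type"].lower() == "span" and (obs.get("name") or "").startswith(prefix)

    return matches


def _converter(kind: str) -> Callable[[dict], Span]:
    """Build a converter that tags the observation with the target span kind."""
    return lambda obs: Span(kind=kind, name=obs.get("name") or "")


# Ordered dispatch table: the first matching predicate decides the converter.
ROUTES: list[tuple[Callable[[dict], bool], Callable[[dict], Span]]] = [
    (_is_generation, _converter("InferenceSpan")),
    (_span_named("Tool: "), _converter("ToolExecutionSpan")),
    (_span_named("Agent "), _converter("AgentInvocationSpan")),
]


def convert(observation: dict) -> Optional[Span]:
    """Return the converted span, or None for observations no route recognizes."""
    for matches, converter in ROUTES:
        if matches(observation):
            return converter(observation)
    return None


tool_span = convert({"type": "SPAN", "name": "Tool: calculator"})
```

Adding a new observation kind then means appending one row to `ROUTES` rather than editing a branch chain.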
Remove the `list_sessions` method and its associated tests, along with
the `SessionFilter` import and related test helpers. This simplifies
the LangfuseProvider by dropping unused session listing functionality.

…e provider

- Remove unused get_evaluation_data_by_trace_id method and TraceNotFoundError import
- Rename shadowed local variables in _parse_message for clarity
- Reformat multi-line constructor calls for ToolCallContent and ToolResultContent
- Update tests to reflect removal of trace-by-id functionality
- Add missing newline at end of __init__.py

afarntrog changed the title from "Trace provider langfuse" to "feat: langfuse Trace provider" on Feb 23, 2026
afarntrog changed the title from "feat: langfuse Trace provider" to "feat: add LangfuseProvider for remote trace evaluation" on Feb 23, 2026

mkmeral commented Feb 23, 2026

/strands review

Expand docstrings across the Langfuse provider to document observation
field mappings, routing logic, data formats, and conversion behavior.
This improves developer understanding of how Langfuse traces and
observations are transformed into typed eval spans and messages.
- Add lazy __getattr__ import for LangfuseProvider to avoid import
  errors when the optional langfuse package is not installed
- Add comprehensive docstring to LangfuseProvider.__init__ describing
  credential resolution, usage examples, and parameters
- Cap langfuse dependency to <3 to prevent breaking changes
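
The lazy import described above relies on module-level `__getattr__` (PEP 562). The sketch below shows the mechanism on a synthetic module so it runs anywhere. Everything in it is illustrative: `demo_providers` stands in for the real package, and a standard-library class stands in for the langfuse-backed provider so the example does not need langfuse installed.

```python
import importlib
import types

# Maps a public attribute name to (module to import, attribute inside it).
# In the real package this would point at the langfuse-backed provider module;
# a stdlib target is used here purely as a stand-in.
_LAZY_ATTRS = {"LangfuseProvider": ("decimal", "Decimal")}

# Synthetic module standing in for the providers package.
demo_providers = types.ModuleType("demo_providers")


def _lazy_getattr(name: str):
    """Module-level __getattr__: import the target only on first access."""
    try:
        module_name, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module 'demo_providers' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(demo_providers, name, value)  # cache so later lookups skip this hook
    return value


# Python consults a module's __getattr__ only when normal lookup fails.
demo_providers.__getattr__ = _lazy_getattr

loaded_before = "LangfuseProvider" in vars(demo_providers)
provider_cls = demo_providers.LangfuseProvider  # triggers the deferred import
loaded_after = "LangfuseProvider" in vars(demo_providers)
```

With this pattern, importing the package succeeds even when the optional dependency is absent; the import error surfaces only when the provider name is actually accessed.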
afarntrog changed the title from "feat: add LangfuseProvider for remote trace evaluation" to "feat: add LangfuseProvider for remote trace evaluation" on Feb 25, 2026
Include "langfuse" in the hatch static analysis environment features
alongside "otel" to enable type checking and linting of langfuse-related
code.
afarntrog merged commit be204b4 into strands-agents:main on Feb 25, 2026
19 checks passed