feat: add LangfuseProvider for remote trace evaluation #144
Merged
afarntrog merged 15 commits into strands-agents:main on Feb 25, 2026
Introduce an abstract TraceProvider base class for retrieving agent trace data from observability backends for evaluation. This includes:
- TraceProvider ABC with get_session, list_sessions, and get_session_by_trace_id methods
- SessionFilter dataclass for filtering session discovery
- Custom error hierarchy (TraceProviderError, SessionNotFoundError, TraceNotFoundError, ProviderError)
- Session and Trace data types with span tree construction and convenience accessors (input/output messages, token usage, duration)
- New providers module exposed at package level
- Comprehensive unit tests for providers and trace types
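As an illustration of the span tree construction and duration accessor mentioned in this commit, here is a minimal sketch. The `Span` fields and the `build_span_tree` and `duration_seconds` helpers are hypothetical names chosen for this example; they are not the Session/Trace types shipped in the PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record; field names are assumptions for illustration."""

    span_id: str
    parent_id: Optional[str]
    start: float  # start time in seconds
    end: float  # end time in seconds
    children: list["Span"] = field(default_factory=list)


def build_span_tree(spans: list[Span]) -> list[Span]:
    """Attach each span to its parent and return the root spans in input order."""
    by_id = {s.span_id: s for s in spans}
    roots: list[Span] = []
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id is not None else None
        if parent is None:
            # Either a true root or an orphan whose parent is not in this trace.
            roots.append(s)
        else:
            parent.children.append(s)
    return roots


def duration_seconds(spans: list[Span]) -> float:
    """Wall-clock duration from the earliest start to the latest end."""
    if not spans:
        return 0.0
    return max(s.end for s in spans) - min(s.start for s in spans)
```

Treating orphaned spans as roots is one possible design choice; a stricter implementation might raise instead.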
Add abstract TraceProvider that retrieves agent trace data from observability backends and returns Session/Trace types the evals system already consumes.
- TraceProvider ABC with get_session() (required), list_sessions() and get_session_by_trace_id() (optional, raise NotImplementedError)
- SessionFilter dataclass for time-range and limit-based discovery
- Exception hierarchy: TraceProviderError base with SessionNotFoundError, TraceNotFoundError, ProviderError
- Export providers module from strands_evals package
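The shape described in this commit (one required method, two optional methods that raise NotImplementedError, and a shared exception base) could look roughly like the sketch below. The parameter names, return types, and the `InMemoryProvider` class are assumptions made for illustration; the actual signatures in strands_evals may differ.

```python
from abc import ABC, abstractmethod
from typing import Any


class TraceProviderError(Exception):
    """Base class for all provider failures."""


class SessionNotFoundError(TraceProviderError):
    """No traces exist for the requested session ID."""


class TraceNotFoundError(TraceProviderError):
    """No trace exists for the requested trace ID."""


class ProviderError(TraceProviderError):
    """The backend failed or returned an unusable response."""


class TraceProvider(ABC):
    @abstractmethod
    def get_session(self, session_id: str) -> Any:
        """Required: return the session for the given ID."""

    def list_sessions(self, session_filter: Any = None) -> Any:
        # Optional capability: subclasses override this if the backend supports it.
        raise NotImplementedError("list_sessions is not supported by this provider")

    def get_session_by_trace_id(self, trace_id: str) -> Any:
        # Optional capability, same pattern as list_sessions.
        raise NotImplementedError("get_session_by_trace_id is not supported by this provider")


class InMemoryProvider(TraceProvider):
    """Hypothetical toy provider backed by a dict, for demonstration only."""

    def __init__(self, sessions: dict[str, Any]) -> None:
        self._sessions = sessions

    def get_session(self, session_id: str) -> Any:
        try:
            return self._sessions[session_id]
        except KeyError:
            raise SessionNotFoundError(session_id) from None
```

Callers can catch `TraceProviderError` to handle any provider failure, or a subclass when they need to distinguish a missing session from a backend fault.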
Implement LangfuseProvider that fetches agent traces from Langfuse and converts them to Session objects for the evals pipeline. Supports session-level and trace-level retrieval with paginated API calls.
- get_evaluation_data(): fetch traces by session ID, convert Langfuse observations to typed spans (InferenceSpan, ToolExecutionSpan, AgentInvocationSpan), extract output from last agent invocation
- list_sessions(): paginated session discovery with time-range filtering
- get_evaluation_data_by_trace_id(): single trace retrieval
- Host resolution: explicit param > LANGFUSE_HOST env var > cloud default
- 30 unit tests (mocked SDK), 15 integration tests (real Langfuse + evaluators)
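The host resolution order listed in this commit (explicit parameter, then the LANGFUSE_HOST environment variable, then the cloud default) can be sketched as follows. The `resolve_host` helper and the `env` parameter are hypothetical, and the default URL is an assumption about the Langfuse cloud endpoint rather than something stated in the PR.

```python
import os
from typing import Mapping, Optional

# Assumed cloud default; the PR only says "cloud default" without naming the URL.
DEFAULT_HOST = "https://cloud.langfuse.com"


def resolve_host(host: Optional[str] = None, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first available host: explicit argument, env var, then default."""
    env = os.environ if env is None else env
    # Empty strings are treated as "not set" so they fall through to the next source.
    return host or env.get("LANGFUSE_HOST") or DEFAULT_HOST
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.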
- Add `langfuse` optional dependency group in pyproject.toml
- Implement session limit filtering in `_fetch_sessions` to stop yielding after reaching the configured limit
- Improve type annotations from bare `list` to `list[Any]`
- Standardize log messages to structured key=value format
- Fix missing newline at end of file
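The limit behaviour described for `_fetch_sessions` (stop yielding once the configured limit is reached) might be implemented along these lines. The `limited_sessions` generator below is a hypothetical stand-in, not the code from the PR; it assumes pages arrive lazily as lists of session IDs.

```python
from typing import Iterable, Iterator, Optional


def limited_sessions(pages: Iterable[list[str]], limit: Optional[int]) -> Iterator[str]:
    """Yield session IDs page by page, stopping as soon as `limit` is reached."""
    if limit is not None and limit <= 0:
        return
    yielded = 0
    for page in pages:
        for session_id in page:
            yield session_id
            yielded += 1
            if limit is not None and yielded >= limit:
                # Return before the outer loop asks for another page.
                return
```

Checking the limit immediately after each yield means no further page is requested once enough sessions have been produced.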
- Extract `_fetch_all_pages` helper to deduplicate pagination logic
- Replace if/elif observation dispatch with a dispatch table pattern
- Extract helper methods for message building and output parsing
- Simplify exception handling using base `TraceProviderError` class
- Remove redundant `pass` statements from exception classes
- Clean up imports to use shorter relative paths
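A pagination helper in the spirit of `_fetch_all_pages` could be sketched as below. The response shape (`.data` plus `.meta.total_pages`) and the `fake_list_traces` function are assumptions for illustration; the real helper's signature is not shown in this PR page.

```python
from types import SimpleNamespace
from typing import Any, Callable


def fetch_all_pages(fetch_page: Callable[..., Any], **kwargs: Any) -> list[Any]:
    """Call `fetch_page(page=N, **kwargs)` from page 1 until the last page is read."""
    items: list[Any] = []
    page = 1
    while True:
        response = fetch_page(page=page, **kwargs)
        items.extend(response.data)
        if page >= response.meta.total_pages:
            break
        page += 1
    return items


def fake_list_traces(page: int, session_id: str) -> SimpleNamespace:
    """Hypothetical stand-in for a paginated API call: 5 items, 2 per page."""
    all_items = [f"{session_id}-trace-{i}" for i in range(5)]
    start = (page - 1) * 2
    return SimpleNamespace(
        data=all_items[start : start + 2],
        meta=SimpleNamespace(total_pages=3),
    )
```

Any endpoint with the same response shape can then share one loop, for example `fetch_all_pages(fake_list_traces, session_id="s1")`.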
…to trace_provider_langfuse
- Lazy-load LangfuseProvider via module-level __getattr__ to defer langfuse import until actually needed
- Add tenacity-based retry with exponential backoff on ReadTimeout
- Introduce configurable timeout parameter (default 120s) passed via request_options to all Langfuse API calls
- Remove try/except ImportError guard in favor of lazy loading
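The commit uses tenacity for retries. To show the underlying idea without that dependency, the sketch below hand-rolls exponential backoff with the standard library. The `ReadTimeout` class here is a local stand-in for the HTTP client's exception, and `call_with_retry` and its parameters are hypothetical, not the PR's `_call_with_retry`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ReadTimeout(Exception):
    """Local stand-in for the HTTP client's read-timeout exception."""


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on ReadTimeout with delays of base_delay * 2**n, capped."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ReadTimeout:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("unreachable")
```

Only `ReadTimeout` triggers a retry; any other exception propagates immediately, which keeps genuine errors from being masked.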
Remove the `list_sessions` method and its associated tests, along with the `SessionFilter` import and related test helpers. This simplifies the LangfuseProvider by dropping unused session listing functionality.
…e provider
- Remove unused get_evaluation_data_by_trace_id method and TraceNotFoundError import
- Rename shadowed local variables in _parse_message for clarity
- Reformat multi-line constructor calls for ToolCallContent and ToolResultContent
- Update tests to reflect removal of trace-by-id functionality
- Add missing newline at end of __init__.py
mkmeral (Contributor) reviewed on Feb 23, 2026
/strands review
Expand docstrings across the Langfuse provider to document observation field mappings, routing logic, data formats, and conversion behavior. This improves developer understanding of how Langfuse traces and observations are transformed into typed eval spans and messages.
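The routing logic these docstrings document (generation observations become inference spans, spans named `Tool: *` become tool execution spans, spans named `Agent *` become agent invocation spans) lends itself to a dispatch table. The sketch below assumes observations are plain dicts with `type` and `name` keys and returns simple tuples instead of the real span classes; all helper names are hypothetical.

```python
from typing import Any, Callable, Optional

Observation = dict[str, Any]
Converted = tuple[str, str]


def _obs_type(obs: Observation) -> str:
    return str(obs.get("type") or "").lower()


def _obs_name(obs: Observation) -> str:
    return str(obs.get("name") or "")


def _is_generation(obs: Observation) -> bool:
    return _obs_type(obs) == "generation"


def _is_tool_span(obs: Observation) -> bool:
    return _obs_type(obs) == "span" and _obs_name(obs).startswith("Tool: ")


def _is_agent_span(obs: Observation) -> bool:
    return _obs_type(obs) == "span" and _obs_name(obs).startswith("Agent ")


# Each entry pairs a predicate with the label of the span type it routes to.
_DISPATCH: list[tuple[Callable[[Observation], bool], str]] = [
    (_is_generation, "InferenceSpan"),
    (_is_tool_span, "ToolExecutionSpan"),
    (_is_agent_span, "AgentInvocationSpan"),
]


def convert(obs: Observation) -> Optional[Converted]:
    """Return (span kind, observation name) for the first matching rule, else None."""
    for matches, kind in _DISPATCH:
        if matches(obs):
            return (kind, _obs_name(obs))
    return None  # unrecognized observations are skipped
```

Adding a new observation kind then means appending one table entry rather than extending an if/elif chain.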
- Add lazy __getattr__ import for LangfuseProvider to avoid import errors when the optional langfuse package is not installed
- Add comprehensive docstring to LangfuseProvider.__init__ describing credential resolution, usage examples, and parameters
- Cap langfuse dependency to <3 to prevent breaking changes
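Module-level `__getattr__` (PEP 562) is what makes the lazy import possible: the optional dependency is imported only when the attribute is first accessed. In a real package this would be a `def __getattr__(name)` in `__init__.py`; to stay runnable in a single file, the sketch below builds a module object by hand. The `make_lazy_module` helper is hypothetical, and `fractions.Fraction` stands in for the optional `LangfuseProvider` class.

```python
import importlib
import types
from typing import Any


def make_lazy_module(name: str, lazy_attrs: dict[str, tuple[str, str]]) -> types.ModuleType:
    """Create a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def _lazy_getattr(attr: str) -> Any:
        if attr in lazy_attrs:
            target_module, target_attr = lazy_attrs[attr]
            # The import happens here, not when the module object is created.
            value = getattr(importlib.import_module(target_module), target_attr)
            setattr(module, attr, value)  # cache so later lookups skip this hook
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # Python consults a module's __getattr__ only after normal lookup fails.
    module.__getattr__ = _lazy_getattr
    return module


# Stand-in mapping: in the PR the target would be the langfuse-backed provider.
providers = make_lazy_module("providers_demo", {"LangfuseProvider": ("fractions", "Fraction")})
```

Users who never touch `providers.LangfuseProvider` never trigger the import, so a missing optional package only fails at the point of use.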
Include "langfuse" in the hatch static analysis environment features alongside "otel" to enable type checking and linting of langfuse-related code.
mkmeral approved these changes on Feb 25, 2026
JackYPCOnline approved these changes on Feb 25, 2026
poshinchen reviewed on Feb 25, 2026
poshinchen reviewed on Feb 25, 2026
poshinchen reviewed on Feb 25, 2026
poshinchen approved these changes on Feb 25, 2026
Description
Summary
Adds `LangfuseProvider`, the first concrete implementation of the `TraceProvider` interface, enabling evaluation of Strands agent traces stored in Langfuse. This lets users run the evals pipeline against production agent sessions without needing local in-memory trace data.
Motivation
The existing eval framework requires traces to be available in-process. Teams using Langfuse as their observability backend had no way to feed those traces into the evaluation pipeline. This PR closes that gap by fetching, paginating, and converting Langfuse observations into the typed span model (`InferenceSpan`, `ToolExecutionSpan`, `AgentInvocationSpan`) that evaluators already consume.
What changed
New: `LangfuseProvider` (`src/strands_evals/providers/langfuse_provider.py`, 620 lines)
- Implements `TraceProvider.get_evaluation_data(session_id)` → `TaskOutput`
- Paginated fetching (`_fetch_all_pages`)
- Retry on `ReadTimeout` with exponential backoff via `tenacity` (`_call_with_retry`)
- `generation` → `InferenceSpan` (with full message parsing: user/assistant, tool calls, tool results)
- `span` with name `Tool: *` → `ToolExecutionSpan`
- `span` with name `Agent *` → `AgentInvocationSpan`
- Credentials read from environment variables (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST`)
Packaging
- `langfuse` optional dependency group (`langfuse>=2.0.0`) in `pyproject.toml`
- `langfuse` added to the hatch test environment features
Cleanup
- Removed redundant `pass` statements from exception classes
- Added missing newline at end of `__init__.py`
How to try it
Related Issues
#98
Documentation PR
Type of Change
New feature
Testing
- 29 unit tests covering all conversion paths, pagination, retry logic, error handling, and edge cases
- 11 integration tests (`tests_integ/test_langfuse_provider.py`) that hit a real Langfuse instance
- I ran `hatch run prepare`
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.