Problem Statement
There is currently no way to run strands-evals against agent traces stored in OpenSearch. This affects users of observability-stack, agent-health, Amazon OpenSearch Service, or anyone else using OpenSearch as their storage backend for OTel GenAI agent traces.
Proposed Solution
Add OpenSearchProvider(TraceProvider) and OpenSearchSessionMapper(SessionMapper), following the same pattern as CloudWatchProvider and LangfuseProvider.
OpenSearchProvider
- Wraps OpenSearchTraceRetriever from opensearch-genai-observability-sdk-py for querying and auth (basic, SigV4, none)
- Implements get_evaluation_data(session_id) returning TaskOutput
- Queries by conversation ID or trace ID (retriever handles both)
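The composition described above can be sketched roughly as follows. This is only an illustration of the retrieve-then-map flow, not the code in the PR: `Retriever`, `fetch_spans`, `SketchProvider`, `InMemoryRetriever`, and `simple_mapper` are hypothetical names, and plain dicts stand in for the real SpanRecord and TaskOutput types.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


class Retriever(Protocol):
    """Hypothetical stand-in for the wrapped retriever; the real method name may differ."""

    def fetch_spans(self, identifier: str) -> list[dict[str, Any]]: ...


@dataclass
class SketchProvider:
    """Delegates querying to a retriever and type conversion to a mapper."""

    retriever: Retriever
    mapper: Callable[[list[dict[str, Any]], str], dict[str, Any]]

    def get_evaluation_data(self, session_id: str) -> dict[str, Any]:
        # The identifier may be a conversation ID or a trace ID; the
        # retriever is assumed to resolve either one.
        spans = self.retriever.fetch_spans(session_id)
        if not spans:
            raise LookupError(f"no spans found for {session_id!r}")
        return self.mapper(spans, session_id)


class InMemoryRetriever:
    """Test double that serves spans from a dict instead of OpenSearch."""

    def __init__(self, spans_by_id: dict[str, list[dict[str, Any]]]):
        self._spans_by_id = spans_by_id

    def fetch_spans(self, identifier: str) -> list[dict[str, Any]]:
        return self._spans_by_id.get(identifier, [])


def simple_mapper(spans: list[dict[str, Any]], session_id: str) -> dict[str, Any]:
    """Placeholder mapper that only summarizes the spans it was given."""
    return {"session_id": session_id, "span_count": len(spans)}
```

Keeping the retriever behind a narrow interface like this would also let the provider be unit-tested without a running cluster.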
OpenSearchSessionMapper
- Converts genai-sdk SpanRecord objects to strands-evals Session/Trace/Span types
- Maps invoke_agent to AgentInvocationSpan, execute_tool to ToolExecutionSpan, chat to InferenceSpan
- Scopes tool attribution to parent agent spans for correct multi-agent behavior
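A minimal sketch of the mapping and scoping rules above. It assumes that "scoped to parent agent spans" means each tool call is attributed to its nearest enclosing invoke_agent span. `RawSpan`, its fields, `nearest_agent`, and `tools_by_agent` are hypothetical and do not reflect the actual SpanRecord schema. Span types are shown as strings rather than the real strands-evals classes.

```python
from dataclasses import dataclass
from typing import Optional

# Operation name -> strands-evals span type, as listed in the proposal.
OPERATION_TO_SPAN_TYPE = {
    "invoke_agent": "AgentInvocationSpan",
    "execute_tool": "ToolExecutionSpan",
    "chat": "InferenceSpan",
}


@dataclass
class RawSpan:
    """Hypothetical, simplified span record used only for this sketch."""

    span_id: str
    parent_id: Optional[str]
    operation: str
    name: str


def nearest_agent(span: RawSpan, by_id: dict[str, RawSpan]) -> Optional[str]:
    """Walk up parent links until an invoke_agent span is found."""
    seen: set[str] = set()  # guards against cycles in malformed traces
    current = by_id.get(span.parent_id) if span.parent_id else None
    while current is not None and current.span_id not in seen:
        if current.operation == "invoke_agent":
            return current.span_id
        seen.add(current.span_id)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return None


def tools_by_agent(spans: list[RawSpan]) -> dict[str, list[str]]:
    """Group tool names under the agent span that encloses them."""
    by_id = {s.span_id: s for s in spans}
    result: dict[str, list[str]] = {
        s.span_id: [] for s in spans if s.operation == "invoke_agent"
    }
    for s in spans:
        if s.operation == "execute_tool":
            owner = nearest_agent(s, by_id)
            if owner is not None:
                result[owner].append(s.name)
    return result
```

Under this assumption, a tool called by a sub-agent is not also credited to the orchestrating agent, which matters when evaluators score tool selection per agent.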
Dependency: opensearch-genai-observability-sdk-py[opensearch]>=0.2.7 as optional extra (pip install strands-agents-evals[opensearch])
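Because the SDK is an optional extra, the provider module would presumably need to fail with a helpful message when the extra is not installed. One common way to do that is a guarded import, sketched below. `require_optional` is a hypothetical helper, and the SDK's import name is not stated in this issue, so the module name is left as a parameter.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this provider. "
            f"Install it with: pip install strands-agents-evals[{extra}]"
        ) from exc
```

Calling a helper like this inside the provider's constructor, rather than at module import time, would keep `import strands_evals.providers` working for users who never touch OpenSearch.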
Use Case
Local / development (basic auth)
from strands_evals.providers import OpenSearchProvider
from strands_evals.evaluators import HelpfulnessEvaluator

provider = OpenSearchProvider(
    host="https://localhost:9200",
    auth=("admin", "password"),
    verify_certs=False,  # local development only
)

task_output = provider.get_evaluation_data(session_id="my-session")

evaluator = HelpfulnessEvaluator(model="us.anthropic.claude-sonnet-4-20250514-v1:0")
results = evaluator.evaluate(task_output)
Amazon OpenSearch Service (SigV4)
import boto3
from opensearchpy import RequestsAWSV4SignerAuth
from strands_evals.providers import OpenSearchProvider

# Sign requests with the default AWS credential chain.
credentials = boto3.Session().get_credentials()
auth = RequestsAWSV4SignerAuth(credentials, "us-east-1", "es")

provider = OpenSearchProvider(
    host="https://my-domain.us-east-1.es.amazonaws.com",
    auth=auth,
)
Alternatives Considered
The provider could use opensearch-py directly instead of wrapping the genai-sdk (similar to how CloudWatchProvider uses boto3). The genai-sdk was chosen because it already handles Data Prepper's de-dotted field mappings and OTel message parsing, both non-trivial to reimplement, as well as auth (basic, and SigV4 signing for AWS-managed OpenSearch). Open to the direct approach if maintainers prefer fewer transitive dependencies.
Prior Art
Implementation
PR: #192