[FEATURE] Add OpenSearchProvider for evaluating traces stored in OpenSearch #191

@kylehounslow

Problem Statement

There is currently no way to run strands-evals against agent traces stored in OpenSearch. This affects users of observability-stack, agent-health, Amazon OpenSearch Service, or anyone else using OpenSearch as their storage backend for OTel GenAI agent traces.

Proposed Solution

Add OpenSearchProvider(TraceProvider) and OpenSearchSessionMapper(SessionMapper), following the same pattern as CloudWatchProvider and LangfuseProvider.

OpenSearchProvider

  • Wraps OpenSearchTraceRetriever from opensearch-genai-observability-sdk-py for querying and auth (basic, SigV4, none)
  • Implements get_evaluation_data(session_id) returning TaskOutput
  • Queries by conversation ID or trace ID (retriever handles both)
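A minimal sketch of the delegation described above, to make the shape concrete. The Retriever and Mapper protocols, their method names (get_traces, map_to_session), and the dict return value are hypothetical stand-ins; the real OpenSearchTraceRetriever, SessionMapper, and TaskOutput interfaces may look different.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class Retriever(Protocol):
    """Hypothetical stand-in for OpenSearchTraceRetriever."""

    def get_traces(self, identifier: str) -> list[Any]: ...


class Mapper(Protocol):
    """Hypothetical stand-in for OpenSearchSessionMapper."""

    def map_to_session(self, spans: list[Any], session_id: str) -> Any: ...


@dataclass
class SketchProvider:
    """Illustrative provider: fetch spans, map them, wrap the result."""

    retriever: Retriever
    mapper: Mapper

    def get_evaluation_data(self, session_id: str) -> dict:
        # The retriever is assumed to accept either a conversation ID or a trace ID.
        spans = self.retriever.get_traces(session_id)
        if not spans:
            raise LookupError(f"no spans found for {session_id!r}")
        session = self.mapper.map_to_session(spans, session_id)
        # A plain dict stands in for TaskOutput in this sketch.
        return {"session_id": session_id, "trajectory": session}
```

Keeping querying and auth inside the retriever means the provider itself only orchestrates: fetch, map, return.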

OpenSearchSessionMapper

  • Converts genai-sdk SpanRecord objects to strands-evals Session/Trace/Span types
  • Maps invoke_agent to AgentInvocationSpan, execute_tool to ToolExecutionSpan, chat to InferenceSpan
  • Scopes tool attribution to parent agent spans for correct multi-agent behavior
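The mapping and scoping rules above can be sketched as follows. SpanRecord and MappedSpan here are simplified, hypothetical stand-ins (the real genai-sdk and strands-evals types carry more fields), and the span type is represented by its name as a string. The sketch attributes each tool span to its nearest invoke_agent ancestor, which is one way to keep tools from different agents apart in a multi-agent trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified stand-in for the genai-sdk SpanRecord."""

    span_id: str
    parent_span_id: Optional[str]
    operation: str  # assumed to hold the gen_ai.operation.name value


@dataclass
class MappedSpan:
    """Hypothetical stand-in for the strands-evals span types."""

    kind: str
    span_id: str
    agent_span_id: Optional[str] = None


# Operation name -> strands-evals span type, as listed in the proposal.
OPERATION_TO_KIND = {
    "invoke_agent": "AgentInvocationSpan",
    "execute_tool": "ToolExecutionSpan",
    "chat": "InferenceSpan",
}


def nearest_agent_ancestor(record, by_id):
    """Walk parent links until an invoke_agent span is found."""
    seen = set()
    current = by_id.get(record.parent_span_id)
    while current is not None and current.span_id not in seen:
        if current.operation == "invoke_agent":
            return current.span_id
        seen.add(current.span_id)  # guard against malformed parent cycles
        current = by_id.get(current.parent_span_id)
    return None


def map_spans(records):
    """Map records to typed spans, skipping unknown operations."""
    by_id = {r.span_id: r for r in records}
    mapped = []
    for r in records:
        kind = OPERATION_TO_KIND.get(r.operation)
        if kind is None:
            continue
        agent_id = None
        if r.operation == "execute_tool":
            agent_id = nearest_agent_ancestor(r, by_id)
        mapped.append(MappedSpan(kind, r.span_id, agent_id))
    return mapped
```

With an orchestrator agent that invokes a sub-agent, a tool called under the sub-agent is attributed to the sub-agent and not to the orchestrator.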

Dependency: opensearch-genai-observability-sdk-py[opensearch]>=0.2.7 as optional extra (pip install strands-agents-evals[opensearch])

Use Case

Local / development (basic auth)

from strands_evals.providers import OpenSearchProvider
from strands_evals.evaluators import HelpfulnessEvaluator

# Connect to a local OpenSearch cluster with basic auth.
provider = OpenSearchProvider(
    host="https://localhost:9200",
    auth=("admin", "password"),
    verify_certs=False,  # local development only
)

# Fetch the session's traces and convert them into evaluation data.
task_output = provider.get_evaluation_data(session_id="my-session")

# Score the session with an LLM-based evaluator.
evaluator = HelpfulnessEvaluator(model="us.anthropic.claude-sonnet-4-20250514-v1:0")
results = evaluator.evaluate(task_output)

Amazon OpenSearch Service (SigV4)

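import boto3
from opensearchpy import RequestsAWSV4SignerAuth
from strands_evals.providers import OpenSearchProvider

# Sign requests with the default AWS credential chain.
credentials = boto3.Session().get_credentials()
# "es" is the signing service name for managed Amazon OpenSearch Service domains.
auth = RequestsAWSV4SignerAuth(credentials, "us-east-1", "es")
provider = OpenSearchProvider(
    host="https://my-domain.us-east-1.es.amazonaws.com",
    auth=auth,
)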

Alternatives Considered

The provider could use opensearch-py directly instead of wrapping the genai-sdk (similar to how CloudWatchProvider uses boto3). The genai-sdk was chosen because it already handles Data Prepper's de-dotted field mappings and OTel message parsing, both of which are non-trivial to reimplement, as well as auth (basic, and SigV4 signing for AWS managed OpenSearch). Open to the direct approach if maintainers prefer fewer transitive dependencies.
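For a sense of what the direct approach would have to reimplement, here is a small sketch of undoing de-dotted attribute keys. It assumes the de-dotting replaces "." with "@" in attribute keys (for example gen_ai@operation@name); that convention and the helper name restore_dotted_keys are assumptions for illustration, and the mapping the genai-sdk actually handles may be broader.

```python
def restore_dotted_keys(attributes: dict) -> dict:
    """Return a copy of attributes with '@' in keys turned back into '.'.

    Assumes '@' appears in keys only as a replacement for '.'.
    Values are left untouched.
    """
    return {key.replace("@", "."): value for key, value in attributes.items()}


# Example: a stored span attribute map as it might appear in an index document.
stored = {
    "gen_ai@operation@name": "execute_tool",
    "gen_ai@tool@name": "search",
}
restored = restore_dotted_keys(stored)
```

Key restoration is only one part of the work; parsing OTel message payloads into structured inputs and outputs would still need separate handling.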

Prior Art

Implementation

PR: #192
