
Conversation

@Dwij1704
Collaborator

@Dwij1704 Dwij1704 commented Dec 11, 2025

Note

Adds a new Strands instrumentation package that converts Strands OTEL spans to OpenInference, plus GenAI semconv constants and CI/test wiring.

  • Instrumentation (new package instrumentation/openinference-instrumentation-strands):
    • Processor: StrandsToOpenInferenceProcessor maps Strands spans and GenAI attrs to OpenInference (AGENT/CHAIN/TOOL/LLM, messages, tools, token usage, graph node attrs, invocation params, metadata).
    • Packaging/Docs: pyproject.toml, README.md, CHANGELOG.md, LICENSE.
    • Examples/Tests: examples/* (basic, streaming) and tests/* with mock spans.
  • Core (openinference-instrumentation):
    • Add gen_ai_attributes.py re-exporting OTEL GenAI semconv (incubating) and utility constants; export via __init__.
    • Add dependency opentelemetry-semantic-conventions>=0.60b0; bump version to 0.1.43.
  • Tooling/CI:
    • Extend tox.ini with strands envs and steps; ensure local packages are (re)installed during CI.
    • Update dev requirements to include opentelemetry-semantic-conventions for local/CI.

Written by Cursor Bugbot for commit 8d142c2. This will update automatically on new commits.
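For orientation, a minimal registration sketch (the import path and the zero-argument constructor are assumptions inferred from the summary above; the package's examples/ directory is authoritative). The mapping processor is added before the exporting processor so attributes are rewritten before export:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Assumed import path, mirroring other openinference-instrumentation-* packages.
from openinference.instrumentation.strands import StrandsToOpenInferenceProcessor

provider = TracerProvider()
provider.add_span_processor(StrandsToOpenInferenceProcessor())  # rewrites Strands/GenAI attributes to OpenInference
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))  # exports the transformed spans
trace.set_tracer_provider(provider)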

@Dwij1704 Dwij1704 requested a review from a team as a code owner December 11, 2025 19:38
@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Dec 11, 2025
@Dwij1704 Dwij1704 changed the title from "Aws strands instrumentation" to "feat: aws strands instrumentation" Dec 11, 2025
Comment on lines 200 to 213
                if hasattr(event, "__dict__") and "stop" in getattr(event, "__dict__", {}):
                    stop_event = event
                yield event
        except Exception as exception:
            span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
            span.record_exception(exception)
            raise

        span.set_status(trace_api.StatusCode.OK)
        if stop_event is not None:
            stop_reason, message, metrics, *_ = stop_event["stop"]
            span.set_attribute("event_loop.stop_reason", stop_reason)
            if message:
                _add_message_attributes(span, message, prefix="output")
Contributor

Critical type mismatch bug. Line 200 checks if event has a __dict__ attribute and captures it as stop_event = event, but line 210 treats stop_event as a dictionary with stop_event["stop"]. If stop_event is an object (not a dict), this will raise TypeError: 'ClassName' object is not subscriptable.

Fix:

if stop_event is not None:
    stop_dict = stop_event.__dict__ if hasattr(stop_event, "__dict__") else stop_event
    if "stop" in stop_dict:
        stop_reason, message, metrics, *_ = stop_dict["stop"]
        span.set_attribute("event_loop.stop_reason", stop_reason)
        if message:
            _add_message_attributes(span, message, prefix="output")
Suggested change
                if hasattr(event, "__dict__") and "stop" in getattr(event, "__dict__", {}):
                    stop_event = event
                yield event
        except Exception as exception:
            span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
            span.record_exception(exception)
            raise
        span.set_status(trace_api.StatusCode.OK)
        if stop_event is not None:
            stop_reason, message, metrics, *_ = stop_event["stop"]
            span.set_attribute("event_loop.stop_reason", stop_reason)
            if message:
                _add_message_attributes(span, message, prefix="output")
                if hasattr(event, "__dict__") and "stop" in getattr(event, "__dict__", {}):
                    stop_event = event
                yield event
        except Exception as exception:
            span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
            span.record_exception(exception)
            raise
        span.set_status(trace_api.StatusCode.OK)
        if stop_event is not None:
            stop_dict = stop_event.__dict__ if hasattr(stop_event, "__dict__") else stop_event
            if "stop" in stop_dict:
                stop_reason, message, metrics, *_ = stop_dict["stop"]
                span.set_attribute("event_loop.stop_reason", stop_reason)
                if message:
                    _add_message_attributes(span, message, prefix="output")

Spotted by Graphite Agent


@nate-mar
Contributor

Hi @Dwij1704 did you look at the span processor approach mentioned in the comment in the related issue? What was the rationale for going with the full instrumentation approach, given that it seems like they have a native OTel instrumentation?

@Dwij1704
Collaborator Author

Hi @Dwij1704 did you look at the span processor approach mentioned in the comment in the related issue? What was the rationale for going with the full instrumentation approach, given that it seems like they have a native OTel instrumentation?

Yet to check. Will let you know when it's ready for review.

@Dwij1704 Dwij1704 marked this pull request as draft December 12, 2025 00:40
@Dwij1704
Collaborator Author

Hi @Dwij1704 did you look at the span processor approach mentioned in the comment in the related issue? What was the rationale for going with the full instrumentation approach, given that it seems like they have a native OTel instrumentation?

You're absolutely right - I was unaware of the span processor approach; I should have investigated it first, as mentioned in the issue. I've now tested Strands' native OTEL instrumentation, confirmed it's comprehensive, and updated the PR to use a span processor instead of the full instrumentation wrapper. Much cleaner approach!

@Dwij1704 Dwij1704 marked this pull request as ready for review December 12, 2025 02:43
"""
return {
"processor_name": "StrandsToOpenInferenceProcessor",
"version": "2.1.0",
Contributor

Version inconsistency. The processor reports version 2.1.0 in get_processor_info() but the actual package version in __init__.py line 43 is 0.1.0. This will cause confusion and incorrect version reporting in telemetry and debugging.

Fix:

"version": "0.1.0",
Suggested change
"version": "2.1.0",
"version": "0.1.0",

Spotted by Graphite Agent


Contributor

This seems legitimate


    def shutdown(self) -> SpanExportResult:  # type: ignore[override]
        """Called when the processor is shutdown."""
        return SpanExportResult.SUCCESS

Bug: Incorrect return type for shutdown method

The shutdown method returns SpanExportResult.SUCCESS but the SpanProcessor interface expects None as the return type. The # type: ignore[override] comment acknowledges this type mismatch. Other span processors in the codebase correctly return None from shutdown(). This could cause issues with code that expects the standard interface behavior.
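A minimal sketch of the corrected method, assuming the processor holds no resources that need flushing or releasing:

def shutdown(self) -> None:
    """Called when the processor is shut down; nothing to flush or release."""
    return None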


"""
return {
"processor_name": "StrandsToOpenInferenceProcessor",
"version": "2.1.0",
Contributor

This seems legitimate

        elif span.parent and hasattr(span.parent, "span_id"):
            parent_id = span.parent.span_id

        self.span_hierarchy[span_id] = {
Contributor

I know some of this was just copied from a previous proposed implementation, but can't we just get access to the parent id from the ReadableSpan in on_end? That way we don't need to worry about tracking any kind of state. Furthermore, start_time should already be included when the span is created.
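A sketch of the stateless lookup (ReadableSpan fields per the OTel Python SDK; the helper name is only for illustration):

from typing import Optional

from opentelemetry.sdk.trace import ReadableSpan

def parent_span_id(span: ReadableSpan) -> Optional[int]:
    # span.parent is the parent SpanContext (or None for root spans), and
    # span.start_time is already recorded by the SDK in epoch nanoseconds,
    # so no cross-span bookkeeping is needed inside on_end.
    return span.parent.span_id if span.parent is not None else None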

message["message.content"] = " ".join(text_parts) if text_parts else ""

# Clean up empty arrays
if not message["message.tool_calls"]:
Contributor

Similar comment to the one above

if span_kind == "LLM":
# LLM format
output_structure = {
"choices": [
Contributor

Just clarifying for my benefit: Is there always only one choice?

Collaborator Author

Strands emits a single choice per LLM call; index 0 per the OpenTelemetry GenAI semantic conventions.

if span_kind == "AGENT":
result["graph.node.id"] = "strands_agent"
elif span_kind == "CHAIN":
if span_name == "execute_event_loop_cycle":
Contributor

Can we try to understand and document what execute event loop cycle is and how it's needed in the context of graph nodes? I presume this is why span hierarchy needs to be tracked, but it's a bit of an unfortunate proposition to have to try to track historical spans, because we don't know how long they last before we might have to clean them out. Ideally, there'd be some way that we can extract the necessary information in a stateless way.

Contributor

It looks like we could potentially omit graph.node.parent_id if it would mean needing to track state about other spans in the processor, which I think we should definitely try to avoid for a number of reasons. It seems like that attribute may be optional anyway.

Collaborator Author

The processor is now fully stateless:

  • Removed all state tracking: span_hierarchy, current_cycle_id, and processed_spans have all been removed. The processor no longer maintains any cross-span state, eliminating memory-leak and concurrency concerns.
  • Omitted graph.node.parent_id for LLM/TOOL spans: since reliably determining parent info requires state tracking, these are now omitted per your suggestion that the attribute is optional.
  • Documented execute_event_loop_cycle: added a comment in _set_graph_node_attributes() explaining that it's Strands' agentic loop iteration, where the LLM reasons, plans, and optionally selects tools.

… enhance StrandsToOpenInferenceProcessor with improved attribute handling and JSON serialization

        for strands_key, openinf_key in token_mappings:
            if value := attrs.get(strands_key):
                result[openinf_key] = value

Bug: Token counts of zero are silently dropped

The walrus operator pattern if value := attrs.get(strands_key) evaluates to False when value is 0, causing valid token counts of zero to be dropped. While 0 is uncommon for token usage, it's a valid value that represents legitimate data. The condition treats 0 the same as missing data. A proper check would be value = attrs.get(strands_key); if value is not None: to preserve zero values while still skipping truly missing attributes.
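A sketch of that check applied to the loop in the excerpt above (same token_mappings pairs):

for strands_key, openinf_key in token_mappings:
    value = attrs.get(strands_key)
    if value is not None:  # keep legitimate zero token counts instead of dropping them
        result[openinf_key] = value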


@@ -0,0 +1 @@
_instruments = ("strands-agents >= 0.1.0",)
Contributor

Version mismatch: strands-agents >= 0.1.0 is specified here, but strands-agents >= 1.19.0 is required in pyproject.toml (lines 36, 41) and examples/requirements.txt (line 2). This inconsistency could cause installation of an incompatible version (0.1.x) that lacks features used by the processor, leading to runtime failures.

_instruments = ("strands-agents >= 1.19.0",)
Suggested change
_instruments = ("strands-agents >= 0.1.0",)
_instruments = ("strands-agents >= 1.19.0",)

Spotted by Graphite Agent



        for strands_key, openinf_key in token_mappings:
            if value := attrs.get(strands_key):
                result[openinf_key] = value

Bug: Zero token counts are silently dropped during mapping

The _map_token_usage method uses if value := attrs.get(strands_key): which evaluates to False when the token count is 0 (since 0 is falsy in Python). This causes legitimate zero token counts to be silently dropped instead of being recorded. Token counts of zero are valid values (e.g., for empty prompts or no completion tokens) and need to be checked with is not None instead of a truthy check.


                    if normalized.get("message.role") == "user":
                        input_messages.append(normalized)
        except json.JSONDecodeError:
            input_messages.append({"message.role": "user", "message.content": str(prompt)})

Bug: Non-list JSON prompts are silently dropped during extraction

In _extract_messages_from_attributes, when processing prompt, only JSON that parses to a list is handled. If prompt is valid JSON that parses to a dict (e.g., '{"role": "user", "content": "Hello"}') or any other non-list type, it falls through without being added to input_messages and no exception is raised, so the except block doesn't run either. The data is silently lost. This is inconsistent with the completion handling in the same function, which has elif isinstance(completion_data, dict) and else branches to handle all JSON types.


Comment on lines 64 to 65
            span._attributes.clear()  # type: ignore[attr-defined]
            span._attributes.update(transformed_attrs)  # type: ignore[attr-defined]
Contributor

I think this brings up an interesting discussion as to whether it's more appropriate to use an exporter vs a processor for this kind of transformation/mapping.

Arguably, an exporter is more appropriate because you aren't changing the fundamental data in a span for things like redaction, but rather trying to transform your span data for a specific backend. However, if you look at the Exporter interface in Python, it takes in a sequence of readable spans, so they are still immutable spans, though you could deep copy, then transform and re-set them in the Sequence.

I think given that there are ongoing conversations (see: open-telemetry/opentelemetry-python#4424 and open-telemetry/opentelemetry-specification#1089) about the appropriate way to handle this in the SDK vs a collector, I'm ok with still doing this in a processor for now, though I think long term an exporter probably makes more sense.

However, for this Python implementation, we should at the very least avoid accessing private attributes of the ReadableSpan. I'm pretty sure they expose an attributes attribute on the ReadableSpan, and then you can modify the key/values as done in an example like this,
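For reference, a rough sketch of the exporter-shaped alternative (only the SpanExporter interface is from the OTel SDK; the mapping helper is a hypothetical placeholder):

from typing import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult

def map_to_openinference(span: ReadableSpan) -> ReadableSpan:
    # Hypothetical placeholder: a real version would rebuild the span with
    # transformed attributes, since ReadableSpans are nominally immutable.
    return span

class OpenInferenceMappingExporter(SpanExporter):
    """Wraps a delegate exporter and transforms spans just before export."""

    def __init__(self, delegate: SpanExporter) -> None:
        self._delegate = delegate

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        return self._delegate.export([map_to_openinference(s) for s in spans])

    def shutdown(self) -> None:
        self._delegate.shutdown()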

Contributor

Ok, my mistake; I was not looking at a Python example.

You're right here, but if we're modifying private attributes anyway, can we just set it directly? span._attributes = transformed_attributes?

The spec does have info about an on-span-end hook, but it's unfortunately not implemented yet, as mentioned in one of the above links.

Collaborator Author

Acknowledged as a valid architectural concern, but the processor approach is appropriate for now given the ongoing OTEL discussions.

Comment on lines 77 to 79
            logger.error(f"Failed to transform span '{span.name}': {e}", exc_info=True)
            span._attributes.clear()  # type: ignore[attr-defined]
            span._attributes.update(original_attrs)  # type: ignore[attr-defined]
Contributor

Same thing here, I'd prefer if we not operate on private attributes if we can avoid it

Contributor

See above revised comment.

            span._attributes.clear()  # type: ignore[attr-defined]
            span._attributes.update(original_attrs)  # type: ignore[attr-defined]

    def _transform_attributes(
Contributor

Do we need both attrs and span here, since attrs comes from the span?

Collaborator Author

Yes, we need both. While attrs comes from the span, we extract it once at the start and use it throughout the transformation to avoid repeated attribute access.


        if model_id:
            result["llm.model_name"] = model_id
            result["gen_ai.request.model"] = model_id
Contributor

@nate-mar nate-mar Dec 18, 2025

Can we look into using the genai package here to get the constants, so that we can immediately follow up if there is breakage with their library changes, especially since it's still kinda experimental? See the pydantic instrumentation for an example.
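A sketch of what that could look like here (the incubating module path is from opentelemetry-semantic-conventions and may move between releases, so treat it as an assumption to verify):

from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as genai

if model_id:
    result["llm.model_name"] = model_id
    result[genai.GEN_AI_REQUEST_MODEL] = model_id  # resolves to "gen_ai.request.model", kept in sync with the semconv package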

                if message:
                    input_messages.append(message)

            elif event_name == "gen_ai.assistant.message":
Contributor

As mentioned in a comment above, it seems like we reference some of these convention constants in the Pydantic instrumentation. I think there is an opportunity here for us to maybe pull out a few of these constants and put them into a helper package like the base openinference-instrumentation package so that the mapping can be shared/consistent across all implementations.

@nate-mar
Contributor

Please remember to take a look at the auto-generated review comments as well. Thanks.

…nd event names, update dependencies, and increment version to 0.1.43
@Dwij1704
Collaborator Author

Please remember to take a look at the auto-generated review comments as well. Thanks.

Thanks for the review - pushed a bunch of fixes addressing yours and the other reviews! Adding a helper for shared semconv is definitely helpful in the longer run.

… openinference-semantic-conventions in tox.ini
        if tool_calls:
            message["message.tool_calls"] = tool_calls

        return message

Bug: Empty message returned without content or tool_calls

The _parse_message_content function returns a message dict with only message.role when parsing list content that yields no text or tool calls (e.g., empty JSON array "[]"). This differs from _parse_strands_completion, which explicitly returns None when message.content and message.tool_calls are both absent. The inconsistency causes empty messages to be added to traces since if message: passes for a dict with just a role key.
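One possible shape of the fix, mirroring what _parse_strands_completion already does (sketch only):

# At the end of _parse_message_content: treat a message with neither content
# nor tool calls as "no message" so callers' `if message:` checks filter it out.
if "message.content" not in message and "message.tool_calls" not in message:
    return None
return message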


        else:
            result["message.content"] = str(content)

        return result

Bug: Missing default role for dict messages causes inconsistent handling

The _normalize_message function defaults non-dict inputs to "message.role": "user" but does not assign any role when the input is a dict without a "role" key. When a single dict prompt like {"content": "Hello"} is processed at line 215-216, it's appended to input_messages without a role. This causes _create_input_output_values to fail the message.role == "user" check, incorrectly treating a simple message as a "complex conversation" and outputting JSON instead of plain text.
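A sketch of one way to close the gap in _normalize_message (only the role handling is shown, and the parameter name is assumed):

if isinstance(message, dict):
    # Default the role instead of omitting it, matching the non-dict fallback,
    # so downstream `message.role == "user"` checks behave consistently.
    result["message.role"] = message.get("role", "user")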


        text_parts = []
        for item in message_data:
            if isinstance(item, dict) and "text" in item:
                text_parts.append(item["text"])

Bug: Missing string conversion causes inconsistent text handling in tool spans

The text_parts.append(item["text"]) call in _handle_tool_span is missing the str() conversion that all other similar calls in this module use (lines 277, 294, 327, 682). If item["text"] contains a non-string value like a number, " ".join(text_parts) will raise a TypeError. While caught by the exception handler, this causes the parsed structured data to be discarded and replaced with a raw string representation, resulting in inconsistent behavior compared to other parsing methods.


                params[param_key] = attrs[key]

        if params:
            result["llm.invocation_parameters"] = safe_json_dumps(params)

Bug: Invocation parameters mapping uses incorrect attribute keys

The _map_invocation_parameters function looks for attribute keys "max_tokens", "temperature", and "top_p", but Strands spans use GenAI semantic convention names like "gen_ai.request.max_tokens", "gen_ai.request.temperature", and "gen_ai.request.top_p". Comparing with _map_token_usage which correctly uses constants like GEN_AI_USAGE_INPUT_TOKENS, this function should use GEN_AI_REQUEST_MAX_TOKENS, GEN_AI_REQUEST_TEMPERATURE, and GEN_AI_REQUEST_TOP_P instead. As written, invocation parameters will never be captured from Strands spans.
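A sketch of the corrected mapping, using the GenAI request constants named above (identifiers assumed to match those already imported for _map_token_usage):

param_mappings = [
    (GEN_AI_REQUEST_MAX_TOKENS, "max_tokens"),
    (GEN_AI_REQUEST_TEMPERATURE, "temperature"),
    (GEN_AI_REQUEST_TOP_P, "top_p"),
]
params = {}
for attr_key, param_key in param_mappings:
    if (value := attrs.get(attr_key)) is not None:  # gen_ai.request.* keys are what Strands actually emits
        params[param_key] = value
if params:
    result["llm.invocation_parameters"] = safe_json_dumps(params)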


            elif isinstance(prompt_data, dict):
                # Handle single dict prompt (e.g., {"role": "user", "content": "Hello"})
                normalized = self._normalize_message(prompt_data)
                input_messages.append(normalized)

Bug: Dict prompt handling inconsistent with list prompt filtering

In _extract_messages_from_attributes, there's inconsistent handling between list and dict prompts. The list case (lines 208-212) filters messages to only include those where message.role == "user", but the dict case (lines 213-216) adds the normalized message unconditionally. Additionally, _normalize_message can return a dict without message.role if the input dict lacks a "role" key. This means a dict prompt with an "assistant" role or no role would be incorrectly added to input_messages, while the same message in a list would be filtered out.


        else:
            result["message.content"] = str(content)

        return result

Bug: Normalized messages may lack required role field

The _normalize_message function can return a dict without message.role when given a dict that lacks both "role" and "content" keys. Unlike non-dict inputs which get a default role of "user", dict inputs without a "role" key produce a result with no role at all. When called from the single-dict prompt path (line 215-216), this incomplete message is unconditionally appended to input_messages without validation, unlike the list path which filters by role. This can produce malformed output structures where messages lack the expected message.role field required by the OpenInference schema.

