feat: aws strands instrumentation #2513
base: main
Conversation
…package name for Strands
```python
if hasattr(event, "__dict__") and "stop" in getattr(event, "__dict__", {}):
    stop_event = event
yield event
except Exception as exception:
    span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
    span.record_exception(exception)
    raise

span.set_status(trace_api.StatusCode.OK)
if stop_event is not None:
    stop_reason, message, metrics, *_ = stop_event["stop"]
    span.set_attribute("event_loop.stop_reason", stop_reason)
    if message:
        _add_message_attributes(span, message, prefix="output")
```
Critical type mismatch bug. Line 200 checks if event has a __dict__ attribute and captures it as stop_event = event, but line 210 treats stop_event as a dictionary with stop_event["stop"]. If stop_event is an object (not a dict), this will raise TypeError: 'ClassName' object is not subscriptable.
Fix:
```python
if stop_event is not None:
    stop_dict = stop_event.__dict__ if hasattr(stop_event, "__dict__") else stop_event
    if "stop" in stop_dict:
        stop_reason, message, metrics, *_ = stop_dict["stop"]
        span.set_attribute("event_loop.stop_reason", stop_reason)
        if message:
            _add_message_attributes(span, message, prefix="output")
```
Spotted by Graphite Agent
…eProcessor for span transformation
You're absolutely right - I was unaware of the span processor approach; I should have investigated the processor approach first, as mentioned in the issue. I've now tested Strands' native OTEL instrumentation, confirmed it's comprehensive, and updated the PR to use a span processor instead of the full instrumentation wrapper. Much cleaner approach!
…umentation features and attribute transformations
| """ | ||
| return { | ||
| "processor_name": "StrandsToOpenInferenceProcessor", | ||
| "version": "2.1.0", |
Version inconsistency. The processor reports version 2.1.0 in get_processor_info() but the actual package version in __init__.py line 43 is 0.1.0. This will cause confusion and incorrect version reporting in telemetry and debugging.
Fix:
```python
"version": "0.1.0",
```
Spotted by Graphite Agent
This seems legitimate
```python
def shutdown(self) -> SpanExportResult:  # type: ignore[override]
    """Called when the processor is shutdown."""
    return SpanExportResult.SUCCESS
```
Bug: Incorrect return type for shutdown method
The shutdown method returns SpanExportResult.SUCCESS but the SpanProcessor interface expects None as the return type. The # type: ignore[override] comment acknowledges this type mismatch. Other span processors in the codebase correctly return None from shutdown(). This could cause issues with code that expects the standard interface behavior.
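A minimal sketch of the corrected signature, assuming the processor has no resources to release on shutdown:

```python
def shutdown(self) -> None:
    """Called when the processor is shut down; the SpanProcessor interface expects no return value."""
    # Nothing to flush or release for a stateless processor.
    return None
```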
| """ | ||
| return { | ||
| "processor_name": "StrandsToOpenInferenceProcessor", | ||
| "version": "2.1.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems legitimate
...openinference-instrumentation-strands/src/openinference/instrumentation/strands/processor.py
Outdated
Show resolved
Hide resolved
```python
elif span.parent and hasattr(span.parent, "span_id"):
    parent_id = span.parent.span_id

self.span_hierarchy[span_id] = {
```
I know some of this was just copied from a previous proposed implementation, but can't we just get access to the parent id from the ReadableSpan in on_end? That way we don't need to worry about tracking any kind of state. Furthermore, start_time should already be included when the span is created.
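For illustration, a rough sketch of reading the parent id statelessly in on_end; the class and variable names here are hypothetical, only the ReadableSpan fields are real:

```python
from typing import Optional

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor


class StatelessExampleProcessor(SpanProcessor):
    """Hypothetical sketch: read parent/start info straight off the finished span."""

    def on_end(self, span: ReadableSpan) -> None:
        # The parent SpanContext is available on the ReadableSpan itself,
        # so no span_hierarchy dict needs to be maintained across callbacks.
        parent_span_id: Optional[int] = span.parent.span_id if span.parent else None
        start_time_ns = span.start_time  # set by the SDK when the span was created
        # ... use parent_span_id / start_time_ns to build OpenInference attributes ...
```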
| message["message.content"] = " ".join(text_parts) if text_parts else "" | ||
|
|
||
| # Clean up empty arrays | ||
| if not message["message.tool_calls"]: |
Similar comment to the one above
| if span_kind == "LLM": | ||
| # LLM format | ||
| output_structure = { | ||
| "choices": [ |
Just clarifying for my benefit: Is there always only one choice?
Strands emits a single choice per LLM call; index 0 follows the OpenTelemetry GenAI semantic conventions.
| if span_kind == "AGENT": | ||
| result["graph.node.id"] = "strands_agent" | ||
| elif span_kind == "CHAIN": | ||
| if span_name == "execute_event_loop_cycle": |
Can we try to understand and document what execute event loop cycle is and how it's needed in the context of graph nodes? I presume this is why span hierarchy needs to be tracked, but it's a bit of an unfortunate proposition to have to track historical spans, because we don't know how long they last before we might have to clean them out. Ideally, there'd be some way to extract the necessary information in a stateless way.
It looks like we could potentially omit graph.node.parent_id if it would mean needing to track state about other spans in the processor, which I think we should definitely try to avoid for a number of reasons. It seems like that attribute may be optional anyway.
The processor is now fully stateless:
- Removed all state tracking: span_hierarchy, current_cycle_id, and processed_spans have all been removed.
- Omitted graph.node.parent_id for LLM/TOOL spans: since reliably determining parent info requires state tracking, these are now omitted per your suggestion that the attribute is optional.
- No cross-span state is maintained anymore, eliminating memory-leak concerns and concurrency issues.
- Documented execute_event_loop_cycle: added a comment in _set_graph_node_attributes() explaining that it is Strands' agentic loop iteration, in which the LLM reasons, plans, and optionally selects tools.
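For context, a rough, hypothetical sketch of what a stateless _set_graph_node_attributes can look like under these constraints (names mirror the snippets in this review, not the PR's exact code):

```python
def _set_graph_node_attributes(self, span_kind: str, span_name: str, result: dict) -> None:
    if span_kind == "AGENT":
        result["graph.node.id"] = "strands_agent"
    elif span_kind == "CHAIN" and span_name == "execute_event_loop_cycle":
        # execute_event_loop_cycle is Strands' agentic loop iteration: the LLM reasons,
        # plans, and optionally selects tools. Everything here is derived from the
        # current span alone, so no cross-span state is required.
        result["graph.node.id"] = span_name
    # graph.node.parent_id is intentionally omitted for LLM/TOOL spans: computing it
    # reliably would require tracking other spans, and the attribute is optional.
```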
… enhance StrandsToOpenInferenceProcessor with improved attribute handling and JSON serialization
```python
for strands_key, openinf_key in token_mappings:
    if value := attrs.get(strands_key):
        result[openinf_key] = value
```
Bug: Token counts of zero are silently dropped
The walrus operator pattern if value := attrs.get(strands_key) evaluates to False when value is 0, causing valid token counts of zero to be dropped. While 0 is uncommon for token usage, it's a valid value that represents legitimate data. The condition treats 0 the same as missing data. A proper check would be value = attrs.get(strands_key); if value is not None: to preserve zero values while still skipping truly missing attributes.
```python
_instruments = ("strands-agents >= 0.1.0",)
```
Version mismatch: strands-agents >= 0.1.0 is specified here, but strands-agents >= 1.19.0 is required in pyproject.toml (lines 36, 41) and examples/requirements.txt (line 2). This inconsistency could cause installation of an incompatible version (0.1.x) that lacks features used by the processor, leading to runtime failures.
_instruments = ("strands-agents >= 1.19.0",)| _instruments = ("strands-agents >= 0.1.0",) | |
| _instruments = ("strands-agents >= 1.19.0",) |
Spotted by Graphite Agent
…son_dumps import in processor module
```python
for strands_key, openinf_key in token_mappings:
    if value := attrs.get(strands_key):
        result[openinf_key] = value
```
Bug: Zero token counts are silently dropped during mapping
The _map_token_usage method uses if value := attrs.get(strands_key): which evaluates to False when the token count is 0 (since 0 is falsy in Python). This causes legitimate zero token counts to be silently dropped instead of being recorded. Token counts of zero are valid values (e.g., for empty prompts or no completion tokens) and need to be checked with is not None instead of a truthy check.
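A minimal sketch of the is-not-None check, using the names from the snippet above:

```python
for strands_key, openinf_key in token_mappings:
    value = attrs.get(strands_key)
    if value is not None:  # preserve legitimate zero token counts; skip only missing attributes
        result[openinf_key] = value
```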
| if normalized.get("message.role") == "user": | ||
| input_messages.append(normalized) | ||
| except json.JSONDecodeError: | ||
| input_messages.append({"message.role": "user", "message.content": str(prompt)}) |
Bug: Non-list JSON prompts are silently dropped during extraction
In _extract_messages_from_attributes, when processing prompt, only JSON that parses to a list is handled. If prompt is valid JSON that parses to a dict (e.g., '{"role": "user", "content": "Hello"}') or any other non-list type, it falls through without being added to input_messages and no exception is raised, so the except block doesn't run either. The data is silently lost. This is inconsistent with the completion handling in the same function, which has elif isinstance(completion_data, dict) and else branches to handle all JSON types.
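One possible shape of the fix, sketched against the names used in the snippet above; the dict branch also applies the same role filter as the list branch, per the related report further below:

```python
try:
    prompt_data = json.loads(prompt)
    if isinstance(prompt_data, list):
        for item in prompt_data:
            normalized = self._normalize_message(item)
            if normalized.get("message.role") == "user":
                input_messages.append(normalized)
    elif isinstance(prompt_data, dict):
        normalized = self._normalize_message(prompt_data)
        # Apply the same role filter as the list branch for consistency.
        if normalized.get("message.role") == "user":
            input_messages.append(normalized)
    else:
        # Any other JSON type (string, number, bool, ...) falls back to a plain user message.
        input_messages.append({"message.role": "user", "message.content": str(prompt_data)})
except json.JSONDecodeError:
    input_messages.append({"message.role": "user", "message.content": str(prompt)})
```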
```python
span._attributes.clear()  # type: ignore[attr-defined]
span._attributes.update(transformed_attrs)  # type: ignore[attr-defined]
```
I think this brings up an interesting discussion as to whether it's more appropriate to use an exporter vs a processor for this kind of transformation/mapping.
Arguably, an exporter is more appropriate because you aren't changing the fundamental data in a span for things like redaction, but rather trying to transform your span data for a specific backend. However, if you look at the Exporter interface in Python, it takes in a sequence of readable spans, so they are still immutable spans, though you could deep copy, then transform and re-set them in the Sequence.
I think given that there is ongoing conversations (see: open-telemetry/opentelemetry-python#4424 and open-telemetry/opentelemetry-specification#1089) about the appropriate way to handle this in the SDK vs a collector, I'm ok with still doing this in a processor for now, though I think long term an exporter probably makes more sense.
However, for this Python implementation, we should at the very least avoid accessing private attributes of the ReadableSpan. I'm pretty sure they expose an attributes attribute on the ReadableSpan, and then you can modify the key/values as done in an example like this.
Ok, my mistake; I was not looking at a Python example.
You're right here, but if we're modifying private attributes anyway, can we just set it directly? span._attributes = transformed_attributes?
The spec does have info about an on-span-end hook, but it's unfortunately not implemented yet, as mentioned in one of the above links.
Acknowledged as a valid architectural concern, but this is appropriate for now given the ongoing OTEL discussions.
```python
logger.error(f"Failed to transform span '{span.name}': {e}", exc_info=True)
span._attributes.clear()  # type: ignore[attr-defined]
span._attributes.update(original_attrs)  # type: ignore[attr-defined]
```
Same thing here; I'd prefer that we not operate on private attributes if we can avoid it.
See the revised comment above.
```python
span._attributes.clear()  # type: ignore[attr-defined]
span._attributes.update(original_attrs)  # type: ignore[attr-defined]

def _transform_attributes(
```
Do we need both attrs and span here, since attrs comes from the span?
Yes, we need both. While attrs comes from the span, we extract it once at the start and use it throughout the transformation to avoid repeated attribute access.
```python
if model_id:
    result["llm.model_name"] = model_id
    result["gen_ai.request.model"] = model_id
```
Can we look into using the genai package here to get the constants, so that we can immediately follow up if there is breakage with their library changes, especially since it's still kinda experimental? See the pydantic instrumentation for an example.
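For illustration, this is roughly what using the incubating GenAI constants could look like; the module path and constant name come from opentelemetry-semantic-conventions and should be verified against the pinned version:

```python
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes

if model_id:
    result["llm.model_name"] = model_id
    # Library constant instead of the hard-coded "gen_ai.request.model" string.
    result[gen_ai_attributes.GEN_AI_REQUEST_MODEL] = model_id
```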
```python
if message:
    input_messages.append(message)

elif event_name == "gen_ai.assistant.message":
```
As mentioned in a comment above, it seems like we reference some of these convention constants in the Pydantic instrumentation. I think there is an opportunity here to pull out a few of these constants and put them into a helper package like the base openinference-instrumentation package, so that the mapping can be shared and consistent across all implementations.
Please remember to take a look at the auto-generated review comments as well. Thanks.
…nd event names, update dependencies, and increment version to 0.1.43
Thanks for the review, I pushed a bunch of fixes addressing your and the other reviews! Adding a helper for shared semconv is definitely helpful in the longer run.
… openinference-semantic-conventions in tox.ini
```python
if tool_calls:
    message["message.tool_calls"] = tool_calls

return message
```
Bug: Empty message returned without content or tool_calls
The _parse_message_content function returns a message dict with only message.role when parsing list content that yields no text or tool calls (e.g., empty JSON array "[]"). This differs from _parse_strands_completion, which explicitly returns None when message.content and message.tool_calls are both absent. The inconsistency causes empty messages to be added to traces since if message: passes for a dict with just a role key.
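A sketch of one way to make the two helpers consistent, assuming the message dict shape shown in the snippet above:

```python
if tool_calls:
    message["message.tool_calls"] = tool_calls

# Mirror _parse_strands_completion: skip entries that carry neither content nor tool calls.
if not message.get("message.content") and not message.get("message.tool_calls"):
    return None
return message
```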
```python
else:
    result["message.content"] = str(content)

return result
```
Bug: Missing default role for dict messages causes inconsistent handling
The _normalize_message function defaults non-dict inputs to "message.role": "user" but does not assign any role when the input is a dict without a "role" key. When a single dict prompt like {"content": "Hello"} is processed at line 215-216, it's appended to input_messages without a role. This causes _create_input_output_values to fail the message.role == "user" check, incorrectly treating a simple message as a "complex conversation" and outputting JSON instead of plain text.
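A rough sketch of the defaulting behavior the report suggests; the function body here is a hypothetical reconstruction, not the PR's actual implementation:

```python
def _normalize_message(self, raw: object) -> dict:
    # Dict inputs: default the role to "user" when no "role" key is present,
    # mirroring the default that non-dict inputs already get.
    if isinstance(raw, dict):
        result = {"message.role": raw.get("role", "user")}
        content = raw.get("content")
        if content is not None:
            result["message.content"] = content if isinstance(content, str) else str(content)
        return result
    # Non-dict inputs keep the existing behavior.
    return {"message.role": "user", "message.content": str(raw)}
```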
```python
text_parts = []
for item in message_data:
    if isinstance(item, dict) and "text" in item:
        text_parts.append(item["text"])
```
Bug: Missing string conversion causes inconsistent text handling in tool spans
The text_parts.append(item["text"]) call in _handle_tool_span is missing the str() conversion that all other similar calls in this module use (lines 277, 294, 327, 682). If item["text"] contains a non-string value like a number, " ".join(text_parts) will raise a TypeError. While caught by the exception handler, this causes the parsed structured data to be discarded and replaced with a raw string representation, resulting in inconsistent behavior compared to other parsing methods.
```python
params[param_key] = attrs[key]

if params:
    result["llm.invocation_parameters"] = safe_json_dumps(params)
```
Bug: Invocation parameters mapping uses incorrect attribute keys
The _map_invocation_parameters function looks for attribute keys "max_tokens", "temperature", and "top_p", but Strands spans use GenAI semantic convention names like "gen_ai.request.max_tokens", "gen_ai.request.temperature", and "gen_ai.request.top_p". Comparing with _map_token_usage which correctly uses constants like GEN_AI_USAGE_INPUT_TOKENS, this function should use GEN_AI_REQUEST_MAX_TOKENS, GEN_AI_REQUEST_TEMPERATURE, and GEN_AI_REQUEST_TOP_P instead. As written, invocation parameters will never be captured from Strands spans.
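A sketch of the corrected mapping using the GenAI request constants; attrs, result, and safe_json_dumps are the names from the snippet above, and the constant imports are an assumption to verify against the pinned semconv version:

```python
from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import (
    GEN_AI_REQUEST_MAX_TOKENS,
    GEN_AI_REQUEST_TEMPERATURE,
    GEN_AI_REQUEST_TOP_P,
)

param_mappings = {
    GEN_AI_REQUEST_MAX_TOKENS: "max_tokens",
    GEN_AI_REQUEST_TEMPERATURE: "temperature",
    GEN_AI_REQUEST_TOP_P: "top_p",
}
params = {param_key: attrs[attr_key] for attr_key, param_key in param_mappings.items() if attr_key in attrs}
if params:
    result["llm.invocation_parameters"] = safe_json_dumps(params)
```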
```python
elif isinstance(prompt_data, dict):
    # Handle single dict prompt (e.g., {"role": "user", "content": "Hello"})
    normalized = self._normalize_message(prompt_data)
    input_messages.append(normalized)
```
Bug: Dict prompt handling inconsistent with list prompt filtering
In _extract_messages_from_attributes, there's inconsistent handling between list and dict prompts. The list case (lines 208-212) filters messages to only include those where message.role == "user", but the dict case (lines 213-216) adds the normalized message unconditionally. Additionally, _normalize_message can return a dict without message.role if the input dict lacks a "role" key. This means a dict prompt with an "assistant" role or no role would be incorrectly added to input_messages, while the same message in a list would be filtered out.
```python
else:
    result["message.content"] = str(content)

return result
```
Bug: Normalized messages may lack required role field
The _normalize_message function can return a dict without message.role when given a dict that lacks both "role" and "content" keys. Unlike non-dict inputs which get a default role of "user", dict inputs without a "role" key produce a result with no role at all. When called from the single-dict prompt path (line 215-216), this incomplete message is unconditionally appended to input_messages without validation, unlike the list path which filters by role. This can produce malformed output structures where messages lack the expected message.role field required by the OpenInference schema.
Note
Adds a new Strands instrumentation package that converts Strands OTEL spans to OpenInference, plus GenAI semconv constants and CI/test wiring.
- New package (instrumentation/openinference-instrumentation-strands): StrandsToOpenInferenceProcessor maps Strands spans and GenAI attrs to OpenInference (AGENT/CHAIN/TOOL/LLM, messages, tools, token usage, graph node attrs, invocation params, metadata).
- Packaging files: pyproject.toml, README.md, CHANGELOG.md, LICENSE.
- examples/* (basic, streaming) and tests/* with mock spans.
- Base package (openinference-instrumentation): gen_ai_attributes.py re-exporting OTEL GenAI semconv (incubating) and utility constants; export via __init__.
- opentelemetry-semantic-conventions >= 0.60b0; bump version to 0.1.43.
- tox.ini with strands envs and steps; ensure local packages are (re)installed during CI.
- opentelemetry-semantic-conventions for local/CI.

Written by Cursor Bugbot for commit 8d142c2. This will update automatically on new commits.