Add semantic conventions for GenAI agent planning operation #3594
Open
Krishnachaitanyakc wants to merge 1 commit into open-telemetry:main
Partially addresses #2664
Summary
Adds `plan` as a new `gen_ai.operation.name` value, following the same pattern as `execute_tool` -- a dedicated operation name because planning has independent duration, error status, and parent-child structure that justify a distinct span. Zero new attributes; only a new enum member reusing existing attributes.
The problem
An agent produces a wrong answer. Was it bad planning (wrong task decomposition) or bad execution (tool returned stale data)? The remediation is different: bad planning needs prompt changes; bad execution needs tool or retrieval fixes.
Without a `plan` span, the operator sees an undifferentiated trace. The first `chat` span might be planning, reasoning, or a direct answer attempt. The operator cannot distinguish these failure modes.
With a `plan` span, planning latency, errors, and the LLM call that produced the plan are isolated under a parent boundary. An operator can filter on `gen_ai.operation.name = plan`, set sampling rules on planning spans independently, and immediately see whether planning or execution consumed the time budget.
Why a span, not an attribute or grouping primitive
`execute_tool` got its own span -- not an attribute on `chat` -- because tool execution has independent duration, error status, and parent-child structure. Planning has the same properties. A grouping attribute (#3575) correlates sibling spans; a `plan` span creates a parent boundary with its own duration. You cannot model a parent as an attribute on its own child.
Relationship to `gen_ai.task` (#2912)
A plan formulates strategy before execution; a task executes assigned work. The plan span is the parent of the planning LLM call and a sibling of the task/tool spans that follow.
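The parent-child layout described here can be sketched with a small, stdlib-only model. This is an illustration under stated assumptions, not OpenTelemetry SDK code: the `Span` dataclass, the span IDs, the timings, and the `invoke_agent` root span are all hypothetical and exist only to show how a `plan` span acts as the parent of the planning LLM call and as a sibling of the tool spans that follow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an exported span (not an OpenTelemetry type)."""
    span_id: str
    operation: str                 # value of gen_ai.operation.name
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Illustrative trace: the plan span (b) wraps the planning chat call (c)
# and sits next to the tool span (d) and the final chat call (e).
TRACE: List[Span] = [
    Span("a", "invoke_agent", 0, 1000),
    Span("b", "plan", 0, 400, parent_id="a"),
    Span("c", "chat", 10, 390, parent_id="b"),
    Span("d", "execute_tool", 400, 900, parent_id="a"),
    Span("e", "chat", 900, 1000, parent_id="a"),
]


def by_operation(trace: List[Span], operation: str) -> List[Span]:
    """Filter spans on gen_ai.operation.name, e.g. operation == 'plan'."""
    return [s for s in trace if s.operation == operation]


def children(trace: List[Span], parent_id: str) -> List[Span]:
    """Return the direct children of the span with the given ID."""
    return [s for s in trace if s.parent_id == parent_id]


def duration_by_operation(trace: List[Span], parent_id: str) -> Dict[str, int]:
    """Sum the durations of a span's direct children, grouped by operation."""
    totals: Dict[str, int] = {}
    for span in children(trace, parent_id):
        totals[span.operation] = totals.get(span.operation, 0) + span.duration_ms
    return totals


print(duration_by_operation(TRACE, "a"))
```

Because the planning `chat` call is a child of `plan`, it no longer shows up among the agent span's direct children, so the per-operation totals separate planning time from execution time without inspecting prompt content.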
Cross-provider evidence
- `CrewPlanner` -- explicit planning phase before task execution
- `SubQuestionQueryEngine` -- question decomposition before sub-queries
- `AgentExecutor._take_next_step()` -- deprecated, private API
- `planner` agent with `plan()` method
Instrumentation SHOULD only emit `plan` when the framework exposes an explicit planning boundary (see emission rules in `spans.yaml`).
Out of scope
Plan-specific attributes (`strategy`, `step.count`), reflection, and delegation are deferred to follow-up PRs.
Reference implementation
AgentTelemetry (PyPI: `agenttelemetry`)