Task telemetry and usage rollups for parent-child agent delegation #147
astrobot-houston
started this conversation in
Feature Request
Replies: 1 comment
looking at this after current main, i think the concept is right but the noun should maybe be Flue’s the repo terminology now reads: and the runtime already emits so maybe task telemetry should enrich that existing operation lineage instead of introducing
then artifact channels can later point at if current operation events are missing fields, i’d strengthen them rather than add a parallel work-record layer.
Task Telemetry and Usage Rollups
Status: Draft proposal
What Hurts
When a parent agent asks a child agent to do work, Flue should not lose the receipt.
Today we can see that a task ran. We cannot reliably answer the simple follow-up questions: which model did the child use, how long did it take, how many tokens did it spend, what did it cost, and does the parent response include that child work?
That is the problem. Delegation looks cheap when the bill is hidden.
What Other Harnesses Usually Do
What Flue Can Do Better
Flue is at the right layer to fix this. It is the runtime that creates the child session. It knows the parent, the task id, the role, the cwd, the model, the duration, the usage, and the outputs produced by that work.
So Flue should record that once, in the same shape no matter which model provider is used. The primitive is not a log line. It is a small work record: identity, parentage, timing, usage, and optional outputs. The runtime should own causality and accounting. Dashboards and billing tools can plug in later.
What Stays Pluggable
The core contract stays small: normalized task events, PromptUsage, PromptModel, timing metadata, and parent-child session ids. Everything else can plug in around that contract:
- flue run;
Observability vendors, dashboards, and budget policies remain replaceable.
Shared Primitive: Work Records
Task telemetry and artifact channels should meet at one tiny runtime primitive: a work record.
A task is a work unit. A model call is a work unit. A tool call can be a work unit. An artifact is an output of a work unit. Usage and artifacts are therefore facets of the same causality chain, not two separate reporting systems.
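As a rough illustration of that causality chain, here is a minimal TypeScript sketch. The FlueWorkRef field layout, the WorkKind union, and the causalityChain helper are assumptions made for this example; the proposal itself only names workId, parentWorkId, and the 'task' work kind.

```typescript
// Hypothetical shape: only workId, parentWorkId, and the 'task' kind are named in the proposal.
type WorkKind = 'task' | 'prompt' | 'skill' | 'tool';

interface FlueWorkRef {
  workId: string;
  parentWorkId?: string;
  workKind: WorkKind;
}

// Follow parentWorkId links from one work unit up to the root that caused it.
function causalityChain(refs: FlueWorkRef[], workId: string): string[] {
  const byId = new Map<string, FlueWorkRef>();
  for (const ref of refs) byId.set(ref.workId, ref);

  const chain: string[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current: string | undefined = workId;
  while (current !== undefined && byId.has(current) && !seen.has(current)) {
    seen.add(current);
    chain.push(current);
    current = byId.get(current)!.parentWorkId;
  }
  return chain;
}

const refs: FlueWorkRef[] = [
  { workId: 'w1', workKind: 'prompt' },
  { workId: 'w2', parentWorkId: 'w1', workKind: 'task' },
  { workId: 'w3', parentWorkId: 'w2', workKind: 'tool' },
];

const chain = causalityChain(refs, 'w3'); // ['w3', 'w2', 'w1']
```

Usage and artifacts can then hang off the same ids instead of each keeping its own lineage.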
The primitive stack stays boring on purpose:
FlueWorkRef gives identity and causality, task telemetry adds the usage facet, artifact channels add the output facet, and CLI/SSE/tracing/FinOps adapters consume the same normalized facts.
(Primitives referenced: FlueWorkRef, PromptUsage, ArtifactRef, FlueEventMetadata, TaskToolResultDetails.)
For this proposal, task_start, task_end, and TaskToolResultDetails should carry workId and parentWorkId. That is enough for TokenOps and FinOps to group direct usage, child usage, failed work, and later artifact outputs without inventing a second identity system.
Primitive Invariants
- workId is stable for one runtime operation across events, task details, and any artifacts produced by that operation.
- parentWorkId points to the operation that caused this one. It is a correlation key, not a mandate to build a full tracing backend in v1.
- taskId remains the task-facing id. workId is the cross-feature id that can also describe prompt, skill, and tool work.
Goals
- task tool.
- flue run.
Non-Goals
- flue inspect command.
Those are natural follow-ups, but this proposal is the first telemetry layer they would build on.
What Changes From Today
- workId, parentWorkId, sessionId, parentSessionId, and taskId.
- task tool results do not expose a stable accounting payload. TaskToolResultDetails becomes the parent-facing receipt for task usage, model, duration, and optional output refs.
Proposed Event Types
The current task events are:
The proposed enriched events are:
These events still receive the existing shared event fields:
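A minimal sketch of how the shared envelope and the task variants could be typed, offered as an illustration rather than the SDK's actual definitions. The PromptUsage and PromptModel fields, durationMs, isError, and the 'other' placeholder variant are all assumptions; only the type names, the task_start and task_end event names, and the id fields come from this proposal.

```typescript
// Assumed field names; the real PromptUsage and PromptModel shapes live in the SDK.
interface PromptUsage { inputTokens: number; outputTokens: number }
interface PromptModel { provider: string; id: string }

// Generic envelope: work identity stays optional because not every event has it yet.
interface FlueEventMetadata {
  sessionId: string;
  parentSessionId?: string;
  workId?: string;
  parentWorkId?: string;
  workKind?: 'task' | 'prompt' | 'skill' | 'tool';
}

// Task variants tighten the optional identity fields to required ones.
type TaskWorkIdentity = FlueEventMetadata & {
  workId: string;
  workKind: 'task';
  taskId: string;
};

type TaskStartEvent = TaskWorkIdentity & { type: 'task_start'; model?: PromptModel };

type TaskEndEvent = TaskWorkIdentity & {
  type: 'task_end';
  durationMs: number; // hypothetical timing field
  isError: boolean;   // hypothetical failure flag
  model?: PromptModel;
  usage?: PromptUsage;
};

// Placeholder standing in for every non-task event the runtime already emits.
type OtherEvent = FlueEventMetadata & { type: 'other' };

type FlueEvent = TaskStartEvent | TaskEndEvent | OtherEvent;

// Type guard: consumers that only care about task telemetry can narrow on it.
function isTaskEvent(event: FlueEvent): event is TaskStartEvent | TaskEndEvent {
  return event.type === 'task_start' || event.type === 'task_end';
}

const sample: FlueEvent = {
  type: 'task_end',
  sessionId: 's2',
  parentSessionId: 's1',
  workId: 'w2',
  parentWorkId: 'w1',
  workKind: 'task',
  taskId: 't1',
  durationMs: 1200,
  isError: false,
};
```

Keeping the identity fields optional on the envelope and required on the task variants lets older consumers ignore them while new consumers get a compile-time guarantee.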
For task_start and task_end, workId and workKind: 'task' should be present. The fields stay optional on the generic event envelope only because not every existing event has work identity yet.
Task Tool Details
When the built-in task tool completes, the tool result should expose the same telemetry in its details payload:
This gives raw event consumers the information even before the parent prompt finishes. If artifact channels are enabled, the same details payload can include bounded ArtifactRef summaries for files the child task published. The usage rollup says what the delegated work cost; artifact refs say what durable outputs came from it.
Event Sequence
The useful join is intentionally small:
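A sketch of that join, assuming task_end records and artifact records both carry a workId. The record shapes and the joinByWorkId helper are hypothetical; only workId and producer.workId are taken from this proposal.

```typescript
// Hypothetical record shapes for illustration only.
interface TaskEndRecord { workId: string; taskId: string; durationMs: number }
interface ArtifactRef { path: string; producer: { workId: string } }
interface WorkOutputs { task: TaskEndRecord; artifacts: ArtifactRef[] }

// Join task telemetry to artifact outputs on workId.
// With no artifacts, every task still appears with an empty output list.
function joinByWorkId(
  tasks: TaskEndRecord[],
  artifacts: ArtifactRef[] = [],
): Map<string, WorkOutputs> {
  const joined = new Map<string, WorkOutputs>();
  for (const task of tasks) {
    joined.set(task.workId, { task, artifacts: [] });
  }
  for (const artifact of artifacts) {
    // Artifacts whose producer is unknown are skipped rather than invented.
    joined.get(artifact.producer.workId)?.artifacts.push(artifact);
  }
  return joined;
}

const joined = joinByWorkId(
  [{ workId: 'w2', taskId: 't1', durationMs: 900 }],
  [{ path: 'report.md', producer: { workId: 'w2' } }],
);
```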
If artifact channels are not installed yet, the same telemetry sequence still works. The workId field is the compatibility anchor for adding artifact outputs later.
Usage Rollup Semantics
There are two task entry points with different accounting expectations.
Direct session.task()
When user code calls session.task() directly, the returned child response already has its own usage.
No parent prompt usage should be modified, because there is no enclosing model turn that caused the delegation.
Built-In task Tool
When the parent model invokes the built-in task tool during prompt() or skill(), the parent response should include:
This makes the returned usage describe the actual cost of that single parent call.
Nested tasks should roll up one level at a time. If child A invokes child B, child A's response usage includes child B once. The parent then adds child A's usage once. The parent should not separately walk child B again.
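The one-level-at-a-time rule can be sketched as follows. The PromptUsage fields and the rollUp helper are assumptions for illustration; the point is that each level only adds the usage its direct children already reported.

```typescript
// Assumed usage shape; the real PromptUsage may carry more fields.
interface PromptUsage { inputTokens: number; outputTokens: number }
interface TaskResponse { usage: PromptUsage }

function addUsage(a: PromptUsage, b: PromptUsage): PromptUsage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
  };
}

// Usage for one call: the caller's own model usage plus each direct child's
// reported response usage. A child response already includes its own children,
// so the caller never looks at grandchildren.
function rollUp(direct: PromptUsage, childResponses: TaskResponse[]): TaskResponse {
  return {
    usage: childResponses.reduce((total, child) => addUsage(total, child.usage), direct),
  };
}

const childB = rollUp({ inputTokens: 100, outputTokens: 20 }, []);
const childA = rollUp({ inputTokens: 200, outputTokens: 50 }, [childB]);
const parent = rollUp({ inputTokens: 500, outputTokens: 80 }, [childA]);
// parent.usage is { inputTokens: 800, outputTokens: 150 }: child B is counted once.
```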
TokenOps and FinOps
Task telemetry is also the lowest useful accounting unit for TokenOps and FinOps.
At the TokenOps layer, enriched task events let users understand token burn by:
At the FinOps layer, the same data supports cost attribution and governance:
This proposal does not add a FinOps dashboard or budget controls. It makes sure the first telemetry contract preserves enough structure for those features to be built later without changing the meaning of task usage.
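As one example of what that structure allows, the sketch below attributes token burn per model without double counting nested work. It assumes each task_end record reports a rolled-up total and a plain model id string; both are simplifications, and the record shape and helper names are hypothetical.

```typescript
// Hypothetical flattened view of a task_end event.
interface TaskEndRecord {
  workId: string;
  parentWorkId?: string;
  model: string;       // assumed to be a plain model id string here
  totalTokens: number; // assumed: rolled-up total, including child tasks
}

// Recover the tokens each unit spent itself by subtracting the rolled-up
// totals of its direct children.
function directTokens(records: TaskEndRecord[]): Map<string, number> {
  const direct = new Map<string, number>();
  for (const record of records) direct.set(record.workId, record.totalTokens);
  for (const record of records) {
    const parentId = record.parentWorkId;
    if (parentId !== undefined && direct.has(parentId)) {
      direct.set(parentId, direct.get(parentId)! - record.totalTokens);
    }
  }
  return direct;
}

// Group direct token burn by model so nested totals are not counted twice.
function tokensByModel(records: TaskEndRecord[]): Map<string, number> {
  const direct = directTokens(records);
  const byModel = new Map<string, number>();
  for (const record of records) {
    const spent = direct.get(record.workId) ?? 0;
    byModel.set(record.model, (byModel.get(record.model) ?? 0) + spent);
  }
  return byModel;
}

const burn = tokensByModel([
  { workId: 'w2', parentWorkId: 'w1', model: 'large', totalTokens: 370 },
  { workId: 'w3', parentWorkId: 'w2', model: 'small', totalTokens: 120 },
]);
// burn: 'large' -> 250, 'small' -> 120
```

Cost attribution could follow the same grouping once per-model prices are supplied by whatever FinOps adapter is plugged in.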
CLI Rendering
flue run should render task events in the same compact spirit as tool events.
Task start:
Task success:
Task error:
The CLI should omit unavailable fields rather than printing zero placeholders. Descriptions or prompts should be truncated for log readability.
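A sketch of those two rendering rules, assuming a flattened task_end shape. The field names, the line layout, and the 60-character limit are illustrative choices, not the actual flue run output format.

```typescript
// Hypothetical flattened view of a task_end event for rendering.
interface TaskEndLine {
  taskId: string;
  description?: string;
  model?: string;
  durationMs?: number;
  totalTokens?: number;
  isError?: boolean;
}

// Shorten long descriptions so one task stays on one log line.
function truncate(text: string, max = 60): string {
  return text.length <= max ? text : text.slice(0, max - 3) + '...';
}

// Build a compact line, skipping any field the event did not provide
// instead of printing a zero placeholder.
function renderTaskEnd(event: TaskEndLine): string {
  const parts: string[] = [event.isError ? 'task error' : 'task done', event.taskId];
  if (event.description !== undefined) parts.push(JSON.stringify(truncate(event.description)));
  if (event.model !== undefined) parts.push(`model=${event.model}`);
  if (event.durationMs !== undefined) parts.push(`${(event.durationMs / 1000).toFixed(1)}s`);
  if (event.totalTokens !== undefined) parts.push(`tokens=${event.totalTokens}`);
  return parts.join(' ');
}

const line = renderTaskEnd({ taskId: 't1', durationMs: 1234 });
// line === 'task done t1 1.2s'
```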
Implementation Shape
Likely change points:
packages/sdk/src/types.ts
- FlueWorkRef identity shape.
- FlueEvent task event variants.
packages/sdk/src/agent.ts
- TaskToolResultDetails.
packages/sdk/src/session.ts
- workId and parentWorkId for task calls.
- runTask().
- usage and model from PromptResponse and PromptResultResponse.
- aggregateUsageSince(...).
packages/cli/bin/flue.ts
- task_start and task_end.
Landing Order
The two PRs should not race to create different contracts.
- FlueWorkRef, event metadata, and task details with workId. The artifacts field can wait until artifact channels exist.
- FlueWorkRef and producer.workId on artifact records. Usage joins can wait until task telemetry exists.
- FlueWorkRef should be defined once in the SDK types module and imported by both feature implementations.
Forward Compatibility and Cost
This should be additive to the current task surface. Existing taskId, sessionId, and event names stay intact. New metadata fields can be optional on the generic event envelope while becoming required for the new task telemetry variants. That gives older consumers a soft landing and gives new consumers a reliable join key.
The cognitive cost should stay low for ordinary users: they see clearer CLI lines and more accurate usage. Advanced consumers learn one new noun, workId, when they need tracing, rollups, or artifact joins.
The runtime cost is intentionally small:
- task tool calls;
The feature should not make failed or nested work ambiguous. Failed tasks may omit usage if none is available, but they should still emit timing and work identity. Nested rollups should only add direct child usage at each level so totals do not double count.
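One way to keep failed work unambiguous is to emit task_end from a finally block, so timing and identity are recorded whether or not usage exists. This is a sketch under assumptions: runWithTelemetry, the event shape, and the injectable clock are hypothetical and not part of the current SDK.

```typescript
// Hypothetical task_end shape for this sketch.
interface TaskEndEvent {
  type: 'task_end';
  workId: string;
  parentWorkId?: string;
  durationMs: number;
  isError: boolean;
  usage?: { totalTokens: number };
}

// Run a task and always emit task_end, even when the task throws.
async function runWithTelemetry<T extends { usage?: { totalTokens: number } }>(
  ref: { workId: string; parentWorkId?: string },
  run: () => Promise<T>,
  emit: (event: TaskEndEvent) => void,
  now: () => number = Date.now, // injectable clock, handy for tests
): Promise<T> {
  const startedAt = now();
  let usage: { totalTokens: number } | undefined;
  let isError = false;
  try {
    const response = await run();
    usage = response.usage;
    return response;
  } catch (err) {
    isError = true;
    throw err; // the caller still sees the original failure
  } finally {
    const event: TaskEndEvent = {
      type: 'task_end',
      ...ref,
      durationMs: now() - startedAt,
      isError,
    };
    // Omit usage entirely when none is available, rather than reporting zero.
    if (usage !== undefined) event.usage = usage;
    emit(event);
  }
}
```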
Acceptance Criteria
A v1 implementation is ready when:
- task_start and task_end events include work identity, task identity, timing, model, and usage when available;
- task tool results expose the same work identity and telemetry in TaskToolResultDetails;
- session.task() calls still return child usage without mutating unrelated parent usage;
- flue run renders task start/end lines without dumping large results or file contents.
Open Questions
- description be added to the public session.task() options, or only remain a field from the built-in task tool parameters?
- usage?
- result field to remain freeform, or should it get a narrower text/error shape in a later API cleanup?
Original implementation from #107 by @ketankhairnar