Hi @addu390, +1 on this. Being able to reconstruct and visualize the trace of a single run is really helpful for observing and understanding agent behavior, and it's exactly the gap that neither metrics nor the event log fills today. I'd suggest splitting this into two fairly independent problems: what the recording side has to capture, and how a run is reconstructed and rendered from it.
Once you split it this way, the overhead concern mostly lands on the first part, and I think that part is actually pretty light. We already record every event in the event log. To reconstruct the causal tree, we basically just need one extra field per event: which action emitted the event. With that, the whole run can be rebuilt from the event log. The reverse edges (which actions an event triggered) don't even need to be recorded explicitly. They can be derived from the action trigger rules plus timestamps. So the increment on the recording side is small, and I think it can be kept separate from the concern about full-fidelity tracing being too heavy at streaming QPS.
The second part can be fully on-demand rather than always-on. We only run it when needed, e.g. local debugging, CI/eval, or staging. The rendering form doesn't have to be a tree either; PlantUML or something else would work too, and we can discuss that separately. As long as the recording side captures the necessary info, there's a lot of flexibility in what we do with it later.
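To make the recording-side idea a bit more concrete, here is a rough sketch of how a run might be rebuilt from event-log records that carry the one extra field. Everything in it is an assumption for illustration: `LoggedEvent`, `emitted_by`, `TRIGGER_RULES`, and the event and action names are made up and are not the Flink Agents schema or API. The reverse edges are inferred with a simple heuristic (the latest earlier event whose type the emitting action listens to), which would not be enough if the same action runs concurrently within one run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedEvent:
    """Hypothetical shape of one event-log record (not the real schema)."""
    event_id: str
    event_type: str
    timestamp: float
    emitted_by: Optional[str]  # the one extra field; None for the run's input


def infer_parents(events, trigger_rules):
    """Map each event id to the id of the event that triggered its emitter.

    Heuristic: the cause is the latest earlier event whose type the emitting
    action listens to, according to the trigger rules.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    parents = {}
    for i, ev in enumerate(ordered):
        accepted = trigger_rules.get(ev.emitted_by, set())
        cause = next(
            (p for p in reversed(ordered[:i]) if p.event_type in accepted), None
        )
        parents[ev.event_id] = cause.event_id if cause else None
    return parents


def render_tree(events, trigger_rules):
    """Return the run as indented text lines, one line per event."""
    parents = infer_parents(events, trigger_rules)
    children = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        children.setdefault(parents[ev.event_id], []).append(ev)

    lines = []

    def walk(ev, depth):
        via = f" (via {ev.emitted_by})" if ev.emitted_by else ""
        lines.append("  " * depth + ev.event_type + via)
        for child in children.get(ev.event_id, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return lines


# Made-up trigger rules: action name -> event types that trigger it.
TRIGGER_RULES = {
    "plan": {"Input"},
    "call_tool": {"ToolRequest"},
    "respond": {"ToolResponse"},
}

# Made-up event log for a single run.
EVENT_LOG = [
    LoggedEvent("e1", "Input", 1.0, None),
    LoggedEvent("e2", "ToolRequest", 2.0, "plan"),
    LoggedEvent("e3", "ToolResponse", 3.0, "call_tool"),
    LoggedEvent("e4", "Output", 4.0, "respond"),
]

print("\n".join(render_tree(EVENT_LOG, TRIGGER_RULES)))
```

The rendering step here is plain indented text only to keep the sketch short; the same parent map could feed PlantUML or any other view, which is why the two halves can evolve separately.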
Context
Flink Agents today ships three observability surfaces: metrics, the event log, and event listeners.
That covers aggregates (metrics), audit records (event log), and hooks (listeners). What it doesn't cover is reconstructing a single run as a causal tree.
That tree shape is what LangSmith (and Langfuse, Phoenix, etc.) render for debugging, and where MLflow slots in for batch eval runs (one run = one trace, with metrics, prompts, and outputs tracked across versions of the agent). There are also the OpenTelemetry GenAI semantic conventions. A per-run trace like this makes debugging and battle-testing non-trivial agent workflows tractable.
Scope
I'm not proposing this as a default for production streaming jobs. Full-fidelity per-event tracing at streaming QPS is too heavy; metrics + event log stay the right production defaults.
The question is whether the framework should provide first-class support for tracing where it actually pays off:
In all four, you want a single run rendered end-to-end. Today you piece it together from event-log records.
Open question
Is this worth the framework solving? Curious if others hit this in their own dev loop.