feat(datadog_agent source): accept LLMObs telemetry at /api/v2/llmobs #25636

Open
ronitanilkumar wants to merge 3 commits into vectordotdev:master from ronitanilkumar:feat/llmobs-endpoint

@ronitanilkumar

Summary

Closes #25441

The Datadog LLMObs SDK sends span events to /api/v2/llmobs. The datadog_agent source had no handler for this route, so every request returned a 404 error. This PR registers the route, parses the JSON payload, and emits each span as a Log event.

When multiple_outputs is enabled, events are routed to the llmobs output port and can be referenced as <component_id>.llmobs in downstream transforms and sinks. When multiple_outputs is disabled, events flow to the default output alongside logs.

LLMObs events are modeled as Log events rather than Trace events because Vector's Event::Trace variant is coupled to APM protobuf semantics. Using Log gives users full flexibility to route and transform LLMObs spans with existing sinks and VRL.

A disable_llmobs config field (default: false) follows the same opt-out convention as disable_logs, disable_metrics, and disable_traces.

Vector configuration

sources:
  agent:
    type: datadog_agent
    address: "0.0.0.0:8181"
    store_api_key: true
    multiple_outputs: true

sinks:
  llmobs_out:
    type: console
    inputs:
      - agent.llmobs
    encoding:
      codec: json

How did you test this PR?

  • Added unit tests covering: valid payload parsing, ml_app extraction
    from span._dd.ml_app, API key propagation into event metadata,
    empty span arrays, and invalid JSON rejection.
  • Ran cargo test -p vector --lib sources::datadog_agent. 43 passed,
    0 failed.
  • Ran cargo vdev check events, make check-clippy, make check-fmt,
    and make check-generated-docs locally. All were clean.
  • Manually ran Vector locally and POSTed a sample SDK-format payload to
    /api/v2/llmobs. Confirmed a Log event appeared on the llmobs
    output with all expected fields present (span_id, trace_id,
    ml_app, meta, metrics, tags, status).

Example curl:

curl -X POST http://localhost:8181/api/v2/llmobs \
  -H "Content-Type: application/json" \
  -H "dd-api-key: test1234test1234test1234test1234" \
  -d '[{"event_type":"span","_dd.tracer_version":"2.17.0","spans":[{"span_id":"abc123","trace_id":"xyz789","name":"my.workflow","start_ns":1707763310981223236,"duration":12345678900,"status":"ok","meta":{"span":{"kind":"llm"},"model_name":"gpt-4"},"metrics":{"input_tokens":64},"tags":["env:prod"],"_dd":{"ml_app":"my-llm-app"}}]}]'

Output:

{"_dd":{"tracer_version":"2.17.0"},"duration":12345678900,"meta":{"model_name":"gpt-4","span":{"kind":"llm"}},"metrics":{"input_tokens":64},"ml_app":"my-llm-app","name":"my.workflow","span_id":"abc123","start_ns":1707763310981223236,"status":"ok","tags":["env:prod"],"trace_id":"xyz789"}

Change Type

  • New feature

Is this a breaking change?

  • No

Does this PR include user facing changes?

  • Yes. Changelog fragment added at changelog.d/25441_llmobs_endpoint.feature.md.

References

@ronitanilkumar ronitanilkumar requested review from a team as code owners June 15, 2026 21:17
@github-actions github-actions Bot added the docs review on hold, domain: sources, and domain: external docs labels Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@ronitanilkumar

Author

I have read the CLA Document and I hereby sign the CLA

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca3999197a

}

#[derive(Deserialize)]
struct LLMObsSpan {

P2: Preserve optional LLMObs span fields

When an SDK emits fields such as span_links, config, or collection_errors, Serde silently ignores them because they are absent from this struct; the reconstruction below also retains only ml_app from the _dd object. These fields are part of current LLMObs span events, so linked-span relationships, experiment configuration, and collection errors are irreversibly lost before downstream transforms can inspect them.

.flat_map(|item| {
    let tracer_version = item.dd_tracer_version.clone();
    item.spans.into_iter().map(move |span| {
        let mut log = LogEvent::default();

P2: Add standard source metadata to LLMObs logs

Every emitted LLMObs log is created without calling insert_standard_vector_source_metadata, unlike logs::decode_log_body. Consequently these events lack the configured source_type and ingest timestamp in both legacy and Vector log namespaces, breaking pipelines and observability logic that rely on the standard metadata supplied by this source's other log events.

Comment on lines +154 to +156
.collect();

Ok(events)

P2: Report received LLMObs events

After decoding succeeds, this function returns the events without emitting through source.events_received, while the log, metric, and trace decoders all emit CountByteSize through that registered handle. Deployments receiving LLMObs traffic will therefore under-report component_received_events_total and received-event byte metrics, potentially showing zero received events for an LLMObs-only source despite successfully forwarding data.

Comment on lines +345 to +346
if !self.disable_llmobs {
    output.push(SourceOutput::new_maybe_logs(DataType::Log, llmobs_definition).with_port(LLMOBS))

P2: Define an LLMObs-specific output schema

When multiple_outputs and schema.enabled are both enabled, the llmobs port advertises a clone of the ordinary log decoder schema, which can require fields such as message and does not declare LLMObs fields such as span_id, trace_id, or meta. Schema-aware VRL compilation and sink validation will therefore reason about the wrong event shape; this port needs its own definition matching the events produced by decode_llmobs_body.

    body: Bytes,
    api_key: Option<Arc<str>>,
) -> Result<Vec<Event>, ErrorMessage> {
    let envelope: Vec<LLMObsEnvelopeItem> = serde_json::from_slice(&body).map_err(|error| {

P1: Accept the SDK's object envelope

Real dd-trace-py payloads, including the referenced 2.17 release, serialize one object shaped like {"_dd.stage":"raw","event_type":"span","spans":[...]}, whereas this deserializes only a top-level JSON array. Posting an SDK-generated payload to the newly registered endpoint therefore fails with HTTP 400 before any span is emitted; the array-shaped fixtures and curl example do not match the SDK wire format.
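
One way to tolerate both shapes, sketched with the standard library only: classify the body by its first non-whitespace byte, then deserialize a single object or an array accordingly. `EnvelopeShape` and `envelope_shape` are hypothetical names for illustration and are not part of this PR; in the real decoder, a serde untagged enum over the two shapes might be the more natural fit.

```rust
/// Top-level JSON shape of an LLMObs request body (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum EnvelopeShape {
    /// A single object such as `{"event_type":"span","spans":[...]}`.
    Object,
    /// An array of such objects, as in the PR's fixtures and curl example.
    Array,
    /// Empty input or any other leading byte; a handler would likely answer 400.
    Unknown,
}

/// Classify the body by its first non-whitespace byte.
/// This only inspects the leading byte and does not validate the JSON itself.
fn envelope_shape(body: &[u8]) -> EnvelopeShape {
    match body.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'{') => EnvelopeShape::Object,
        Some(b'[') => EnvelopeShape::Array,
        _ => EnvelopeShape::Unknown,
    }
}
```

A decoder could branch on the result and wrap the single-object case in a one-element list, so the rest of the span loop stays unchanged.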

Comment on lines +19 to +20
warp::post()
    .and(path!("api" / "v2" / "llmobs" / ..))

P1: Register the EVP proxy route used in agent mode

When an LLMObs SDK is configured to send through its Datadog Agent, it posts spans to /evp_proxy/v2/api/v2/llmobs (for example, dd-trace-py defines this as its proxied endpoint), not directly to /api/v2/llmobs. This filter only matches the direct intake path, so standard agent-mode clients pointed at Vector still receive a 404 and the feature only works after a nonstandard endpoint override.
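
The matching rule this implies can be sketched without any framework code: accept the direct intake path, and also accept it behind an optional `evp_proxy/v2` prefix. `is_llmobs_path` is a hypothetical helper written for illustration; the actual fix would presumably add a second warp filter rather than match on strings, and query strings are ignored here.

```rust
/// Hypothetical helper: true when `path` targets the LLMObs intake, either
/// directly or behind the Agent's `evp_proxy/v2` prefix.
fn is_llmobs_path(path: &str) -> bool {
    // Split into non-empty segments so leading and trailing slashes do not matter.
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();

    // Drop the optional proxy prefix used in agent mode.
    let rest = match segments.as_slice() {
        ["evp_proxy", "v2", rest @ ..] => rest,
        all => all,
    };

    // Mirror the `..` tail of the existing filter: extra trailing segments are allowed.
    matches!(rest, ["api", "v2", "llmobs", ..])
}
```

Under this rule `/api/v2/llmobs` and `/evp_proxy/v2/api/v2/llmobs` both match, while `/api/v2/logs` does not.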

Comment on lines +134 to +140
if let Some(ml_app) = span
    .dd
    .as_ref()
    .and_then(|dd| dd.get("ml_app"))
    .and_then(|v| v.as_str())
{
    log.insert("ml_app", ml_app.to_owned());

P2: Extract ml_app from SDK tags

For dd-trace-py 2.17 payloads, ml_app is encoded in the span's tags array as ml_app:<value> and there is no span-level _dd.ml_app object, so this branch never inserts the advertised top-level ml_app field. After accepting that SDK's envelope, otherwise valid Python events will therefore differ from the documented output and from the test fixture unless ml_app is also recovered from the tags.
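
A minimal sketch of the tag fallback, assuming the tags have already been deserialized into a `Vec<String>`. Both function names are hypothetical, and preferring `_dd.ml_app` over the tag value is an assumption made here, not something the PR specifies.

```rust
/// Hypothetical helper: return the value of the first non-empty `ml_app:<value>` tag.
fn ml_app_from_tags(tags: &[String]) -> Option<&str> {
    tags.iter()
        .filter_map(|tag| tag.strip_prefix("ml_app:"))
        // Treat `ml_app:` with no value as absent rather than inserting "".
        .find(|value| !value.is_empty())
}

/// Prefer the span-level `_dd.ml_app` value when present, else fall back to tags.
fn resolve_ml_app<'a>(dd_ml_app: Option<&'a str>, tags: &'a [String]) -> Option<&'a str> {
    dd_ml_app.or_else(|| ml_app_from_tags(tags))
}
```

With tags `["env:prod", "ml_app:my-llm-app"]` and no `_dd.ml_app`, `resolve_ml_app` yields `my-llm-app`, which matches the top-level field shown in the PR's example output.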

Comment on lines +113 to +115
if let Some(v) = span.start_ns {
    log.insert("start_ns", v);
}

P2: Use the span start time as the log timestamp

LLMObs spans carry their actual event time in nanoseconds via start_ns, but the emitted log only stores that value as an ordinary integer and never assigns it the log timestamp meaning. Once standard source metadata is added, these events will be timestamped at Vector ingestion time instead, so delayed or buffered spans are written to time-aware log sinks at the wrong time; convert start_ns and use it as the event timestamp.
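
The conversion itself is small. The sketch below uses only `std::time` and assumes `start_ns` is deserialized as an `i64` count of nanoseconds since the Unix epoch; `start_ns_to_system_time` is a hypothetical name, and the real decoder would presumably convert to whatever timestamp type Vector's log events use.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: interpret `start_ns` as nanoseconds since the Unix epoch.
/// Returns `None` for negative values or if the addition would overflow.
fn start_ns_to_system_time(start_ns: i64) -> Option<SystemTime> {
    if start_ns < 0 {
        return None;
    }
    // The cast is lossless because negative values were rejected above.
    UNIX_EPOCH.checked_add(Duration::from_nanos(start_ns as u64))
}
```

Returning `None` lets the caller fall back to the ingest timestamp when `start_ns` is missing or unusable, instead of rejecting the span.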


Development

Successfully merging this pull request may close these issues.

datadog_agent source should accept LLMObs telemetry
