Add anomaly detection investigator #47824
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 420edba214
    from := lastQueried
    to := time.Now()
    lastQueried = to
Advance poll cursor only after successful fetch
A transient Events API failure will permanently drop data from the failed interval, because lastQueried is advanced to the new upper bound (to) before fetchEvents succeeds. In the current loop, if the call errors once, the next poll starts after that window, so events emitted during the outage are never fetched and can’t be investigated.
    if runBits {
        accumulated = nil
        batchStart = time.Time{}
        seenIDs = make(map[string]struct{})
    }
Reset accumulated events after dry-run flush
The batch is only cleared when runBits is true, so in the default dry-run mode every idle poll re-flushes the same events and keeps growing the same in-memory batch forever. This contradicts the documented/printed “flush + reset” behavior and makes subsequent outputs include stale events instead of only the next batch.
    }

    printEvents(events, "one-shot", !runBits)
    if err := triggerInvestigation(apiKey, appKey, events, from, !runBits); err != nil {
Respect --to for one-shot investigation time bounds
In one-shot mode, the investigation trigger is called with only the from bound, and the payload end time is later derived from time.Now(), so the user-provided --to bound is ignored. As a result, historical replays generate investigations over a much larger window than requested, which can skew analysis and increase noise and cost.
Go Package Import Differences
Baseline: 681899c
Summary
- cmd/anomaly_detection_investigator tool that polls the Datadog Events API for anomaly events emitted by the Q-branch observer (source:agent-q-branch-observer)
- [...] (--from/--to)
- [...] --run_bits; defaults to dry-run display

Test plan
- [...] DD_API_KEY/DD_APP_KEY and run in live mode: go run ./cmd/anomaly_detection_investigator
- [...] -run_bits and confirm an investigation URL is returned
- [...] -run_bits=false and confirm dry-run output only

🤖 Generated with Claude Code