Add ADR-0006: Error monitoring uses Datadog-native capabilities#6

Open
grigarr wants to merge 1 commit into main from adr-0006-error-monitoring-datadog-native

grigarr commented Mar 18, 2026

Summary

Proposes using Datadog Error Tracking, Watchdog, and Workflows for automated error detection and triage instead of deploying the custom lfx-error-monitor service.

CloudWatch logs already flow to Datadog. A separate service that polls CloudWatch, runs logs through Claude/LiteLLM, and maintains its own Postgres state would duplicate that existing pipeline.

Related PRs to discuss before merge:

  • linuxfoundation/lfx-error-monitor#5
  • linuxfoundation/lfx-v2-argocd#405
  • linuxfoundation/lfx-v2-opentofu#131

🤖 Generated with Claude Code

grigarr requested a review from emsearcy as a code owner March 18, 2026 03:14
Copilot AI review requested due to automatic review settings March 18, 2026 03:14

Copilot AI left a comment

Pull request overview

Adds a new Architecture/Engineering Decision Record documenting the intent to standardize error monitoring and automated triage on Datadog-native features (Error Tracking, Watchdog, Workflows) instead of deploying a separate lfx-error-monitor service that duplicates the existing CloudWatch→Datadog log pipeline.

Changes:

  • Introduces ADR-0006 describing Datadog-native error detection, grouping, and workflow automation as the required approach.
  • Explicitly prohibits deploying custom services that independently poll CloudWatch logs when those logs are already forwarded to Datadog.

Proposes using Datadog Error Tracking, Watchdog, and Workflows for
automated error detection and triage instead of deploying a custom
lfx-error-monitor service. CloudWatch logs already flow to Datadog,
making a separate polling service redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Rudy Grigar <rgrigar@linuxfoundation.org>
thelinuxfoundation force-pushed the adr-0006-error-monitoring-datadog-native branch from 8fba5c4 to 4643d0d on March 18, 2026 03:17