Reference pattern for automatically creating AWS DevOps Agent investigations from CloudWatch alarms using EventBridge, Lambda, and CDK/Terraform.
This pattern triggers an AWS DevOps Agent webhook whenever a CloudWatch alarm enters the ALARM state, so an investigation starts automatically. It works with any CloudWatch alarm, including those managed by Application Insights.
- Zero alarm modification - EventBridge captures all alarm state changes
- Application Insights compatible - Works with managed alarms
- HMAC v1 authentication - Production-ready security
- Tag-based configuration - Customize per-alarm behavior
- Dry-run mode - Test without consuming investigation quota
- Comprehensive logging - Full audit trail with payloads
- Dual deployment options - CDK (Python) or Terraform
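The HMAC authentication listed above can be sketched in a few lines. The exact "HMAC v1" scheme the DevOps Agent webhook expects (which fields are signed, how the signature is transmitted) is not described in this README, so the message layout `<timestamp>:<body>` and the function names `sign_payload` and `verify_signature` below are assumptions for illustration only, not the contents of `webhook_client.py`.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, payload: dict, timestamp: str) -> tuple:
    """Return (body_bytes, hex_signature) for a webhook request."""
    # Serialize once and sign exactly these bytes. Re-serializing later
    # could change key order or whitespace and invalidate the signature.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Assumption: the signed message is "<timestamp>:<body>".
    message = timestamp.encode("utf-8") + b":" + body
    signature = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(secret: str, body: bytes, timestamp: str, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    message = timestamp.encode("utf-8") + b":" + body
    expected = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Including a timestamp in the signed message is a common way to let the receiver reject replayed requests; whether the real webhook does this should be checked against the DevOps Agent documentation.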
CloudWatch Alarms → EventBridge → Lambda → DevOps Agent Webhook → Investigation
See Architecture Documentation for detailed diagrams and component descriptions.
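As a rough illustration of the first hop in that flow, the sketch below shows an EventBridge event pattern for CloudWatch alarm state changes and an equivalent check in Python. The `source`, `detail-type`, and `detail.state.value` fields are the standard shape of CloudWatch alarm events; the names `ALARM_EVENT_PATTERN` and `is_alarm_transition` are hypothetical, and the rule actually deployed by this repository may filter differently (for example, capturing every state change and filtering inside the Lambda function).

```python
# Event pattern matching alarms whose new state is ALARM.
# This is an illustrative pattern, not necessarily the one in eventbridge.tf.
ALARM_EVENT_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def is_alarm_transition(event: dict) -> bool:
    """Return True when the event reports an alarm newly entering ALARM."""
    if event.get("source") != "aws.cloudwatch":
        return False
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return False
    detail = event.get("detail", {})
    new_state = detail.get("state", {}).get("value")
    old_state = detail.get("previousState", {}).get("value")
    # Ignore events where the alarm was already in ALARM.
    return new_state == "ALARM" and old_state != "ALARM"
```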
- Setup:
cd cdk
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp cdk.context.json.example cdk.context.json
# Edit cdk.context.json with your values
- Deploy:
cdk bootstrap # First time only
cdk deploy
See CDK README for detailed CDK instructions.
- Configure deployment:
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit with your webhook credentials and deployment context
- Deploy:
terraform init
terraform apply
Done! All alarms automatically trigger investigations when entering ALARM state.
See Deployment Guide for detailed scenarios and Customization Guide for advanced configuration.
Provide context about your deployment:
CDK (cdk.context.json):
{
"deployment_name": "production-api",
"deployment_description": "Production API with ECS, RDS, and DynamoDB",
"default_priority": "MEDIUM"
}
Terraform (terraform.tfvars):
deployment_name = "production-api"
deployment_description = "Production API with ECS, RDS, and DynamoDB"
default_priority = "MEDIUM"
Tag alarms to control behavior:
aws cloudwatch tag-resource \
--resource-arn <ALARM_ARN> \
--tags Key=DevOpsAgentEnabled,Value=true \
Key=DevOpsAgentPriority,Value=HIGH \
Key=DevOpsAgentService,Value=PaymentService
Available Tags:
- DevOpsAgentEnabled: "true|false" - Enable/disable webhook for this alarm
- DevOpsAgentPriority: "HIGH|MEDIUM|LOW" - Override default priority
- DevOpsAgentService: "ServiceName" - Custom service name for investigation
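A minimal sketch of how these three tags might be turned into per-alarm settings follows. The tag keys come from this README; everything else (`AlarmSettings`, `tags_to_dict`, `resolve_settings`, and the choice to treat untagged alarms as enabled) is an assumption for illustration rather than the logic in `context_enricher.py`. The tag list format matches what the CloudWatch `ListTagsForResource` API returns.

```python
from dataclasses import dataclass
from typing import Optional

VALID_PRIORITIES = {"HIGH", "MEDIUM", "LOW"}


@dataclass(frozen=True)
class AlarmSettings:
    enabled: bool
    priority: str
    service: Optional[str]


def tags_to_dict(tag_list: list) -> dict:
    """Convert [{'Key': k, 'Value': v}, ...] into a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


def resolve_settings(tags: dict, default_priority: str = "MEDIUM") -> AlarmSettings:
    """Apply DevOpsAgent* tags on top of the deployment defaults."""
    # Assumption: an alarm without the tag is enabled, and only an
    # explicit "false" opts it out.
    enabled = tags.get("DevOpsAgentEnabled", "true").strip().lower() != "false"
    priority = tags.get("DevOpsAgentPriority", "").strip().upper()
    if priority not in VALID_PRIORITIES:
        # Missing or unrecognized value: keep the configured default.
        priority = default_priority
    service = tags.get("DevOpsAgentService") or None
    return AlarmSettings(enabled=enabled, priority=priority, service=service)
```

For example, the tags set by the `aws cloudwatch tag-resource` command above would resolve to an enabled alarm with priority `HIGH` and service `PaymentService`.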
Default priority rules (customizable):
- HIGH: RDS CPU, DynamoDB SystemErrors, ALB 4XX, Lambda Errors
- MEDIUM: ECS CPU/Memory, ALB 5XX, NAT Gateway errors
- LOW/DEFAULT: Everything else
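The rules above can be pictured as a lookup keyed on the alarm's namespace and metric name. The sketch below is one possible encoding, not the repository's implementation: the specific CloudWatch metric names chosen for "ALB 4XX", "ALB 5XX", and "NAT Gateway errors" are my assumptions (each service exposes several candidate metrics), and `default_priority_for` is a hypothetical helper.

```python
# (namespace, metric name) pairs mapped to HIGH by default.
HIGH_RULES = {
    ("AWS/RDS", "CPUUtilization"),
    ("AWS/DynamoDB", "SystemErrors"),
    ("AWS/ApplicationELB", "HTTPCode_ELB_4XX_Count"),  # assumed metric for "ALB 4XX"
    ("AWS/Lambda", "Errors"),
}

# (namespace, metric name) pairs mapped to MEDIUM by default.
MEDIUM_RULES = {
    ("AWS/ECS", "CPUUtilization"),
    ("AWS/ECS", "MemoryUtilization"),
    ("AWS/ApplicationELB", "HTTPCode_ELB_5XX_Count"),  # assumed metric for "ALB 5XX"
    ("AWS/NATGateway", "ErrorPortAllocation"),  # assumed metric for "NAT Gateway errors"
}


def default_priority_for(namespace: str, metric_name: str, fallback: str = "LOW") -> str:
    """Return the rule-based priority, or the fallback when no rule matches."""
    key = (namespace, metric_name)
    if key in HIGH_RULES:
        return "HIGH"
    if key in MEDIUM_RULES:
        return "MEDIUM"
    return fallback
```

Keeping the rules in plain sets makes them easy to extend for custom namespaces; a `DevOpsAgentPriority` tag on the alarm would still take precedence over whatever this lookup returns.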
Test without creating investigations:
CDK:
{
"dry_run_mode": true
}
Terraform:
dry_run_mode = true
Use sample events without triggering alarms:
aws lambda invoke \
--function-name devops-agent-webhook-handler \
--payload file://test-events/eventbridge-alb-4xx.json \
/tmp/response.json
aws logs filter-log-events \
--log-group-name /aws/lambda/devops-agent-webhook-handler \
--filter-pattern "Full webhook payload"
- Architecture - Detailed architecture and data flow
- Deployment Guide - Step-by-step deployment for different scenarios
- Customization Guide - Customize for your alarm types and priorities
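To make the dry-run mode and the "Full webhook payload" log filter from the testing steps more concrete, here is a minimal sketch of how a handler could log the payload and skip delivery when dry-run is on. The function `deliver`, the logger name, and the injected `send` callable are hypothetical; only the log phrase and the `dry_run_mode` setting come from this README.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("devops_agent_webhook")


def deliver(payload: dict, dry_run: bool, send: Callable[[dict], int]) -> dict:
    """Log the payload, then send it unless dry-run mode is enabled."""
    # Logged in both modes so the payload can be found with the
    # "Full webhook payload" filter pattern.
    logger.info("Full webhook payload: %s", json.dumps(payload, sort_keys=True))
    if dry_run:
        # No request is made, so no investigation quota is consumed.
        return {"sent": False, "dry_run": True}
    status = send(payload)
    return {"sent": True, "dry_run": False, "status": status}
```

Passing the sender in as a parameter keeps the function easy to exercise locally with a stub in place of the real HTTP call.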
├── cdk/ # CDK Python deployment (Option 1)
│ ├── app.py # CDK app entry point
│ ├── stacks/ # CDK stack definitions
│ ├── cdk.json # CDK configuration
│ └── README.md # CDK-specific instructions
├── terraform/ # Terraform deployment (Option 2)
│ ├── main.tf # Provider and data sources
│ ├── lambda.tf # Lambda function and DLQ
│ ├── eventbridge.tf # EventBridge rule for alarm capture
│ ├── sns.tf # SNS topic (optional integration)
│ ├── iam.tf # IAM roles and policies
│ ├── secrets.tf # Secrets Manager for webhook credentials
│ ├── variables.tf # Input variables
│ └── outputs.tf # Stack outputs
├── lambda/ # Lambda function code (shared)
│ ├── handler.py # Main handler (SNS/EventBridge routing)
│ ├── webhook_client.py # HMAC webhook client
│ ├── context_enricher.py # Tag lookup and priority mapping
│ └── alarm_parser.py # Alarm message parser
├── test-events/ # Sample EventBridge events for testing
├── docs/ # Documentation
└── README.md
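The tree notes that `handler.py` routes between SNS and EventBridge inputs. One way such routing could work is sketched below, relying on the standard shapes of the two event types: SNS deliveries wrap the alarm JSON as a string in `Records[0].Sns.Message`, while EventBridge delivers the alarm under `detail`. The function name `extract_alarm` and the normalized output keys are assumptions, not the repository's actual interface.

```python
import json


def extract_alarm(event: dict) -> dict:
    """Normalize an SNS or EventBridge alarm event into one small dict."""
    records = event.get("Records")
    if records and records[0].get("EventSource") == "aws:sns":
        # SNS path: the alarm notification is a JSON string in Sns.Message.
        message = json.loads(records[0]["Sns"]["Message"])
        return {
            "name": message["AlarmName"],
            "state": message["NewStateValue"],
            "reason": message.get("NewStateReason", ""),
            "via": "sns",
        }
    if event.get("detail-type") == "CloudWatch Alarm State Change":
        # EventBridge path: the alarm fields sit directly under "detail".
        detail = event["detail"]
        return {
            "name": detail["alarmName"],
            "state": detail["state"]["value"],
            "reason": detail["state"].get("reason", ""),
            "via": "eventbridge",
        }
    raise ValueError("unsupported event shape")
```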
- Multi-service deployments - ECS, Lambda, RDS, DynamoDB alarms
- Application Insights - Works with managed alarms
- Multi-region - Deploy in each region
- Existing SNS topics - Integrate with current alarm actions
- Custom metrics - Support any CloudWatch namespace
- AWS CLI configured
- Terraform >= 1.0
- Python 3.13 (Lambda runtime)
- DevOps Agent webhook URL and secret
- CloudWatch alarms in your account
Typical monthly cost: about $1
- Lambda: ~$0.20 (assuming 1000 invocations/month)
- EventBridge: Free (included)
- Secrets Manager: $0.40/secret/month
- CloudWatch Logs: ~$0.50 (7-day retention)
MIT-0