Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #102 +/- ##
=======================================
Coverage 99.67% 99.67%
=======================================
Files 153 154 +1
Lines 3070 3108 +38
Branches 737 741 +4
=======================================
+ Hits 3060 3098 +38
Misses 9 9
Partials 1 1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| --env "EDL_PASSWORD=$bamboo_EDL_PASSWORD" \ | ||
| --env "CMR_BASE_URL=$bamboo_CMR_BASE_URL" \ | ||
| --env "BLOCK_PUBLISH_ON_KEYWORD_DIFF_FAILURE=${bamboo_BLOCK_PUBLISH_ON_KEYWORD_DIFF_FAILURE:-false}" \ | ||
| --env "KEYWORD_SYNC_ALARM_EMAILS=${bamboo_KEYWORD_SYNC_ALARM_EMAILS:-}" \ |
There was a problem hiding this comment.
Feels like if we are gonna do this we should use the support email which will go to Zendesk platform. I don't think KMS has a direct one but, we could have one made or otherwise use MMTs at least for prod
There was a problem hiding this comment.
Yes, that makes sense to me.
| namespace: KEYWORD_SYNC_METRIC_NAMESPACE, | ||
| metricName: KEYWORD_EVENT_PUBLISH_FAILURES_METRIC, | ||
| statistic: 'Sum', | ||
| period: cdk.Duration.minutes(5) |
There was a problem hiding this comment.
This might be noisy at 5 minutes
cdk.Duration.days(1)
Shouldn't it match the alarm time anyways?
There was a problem hiding this comment.
Good call. I changed the failure alarm to aggregate over a one-day period instead of five minutes, and pulled the period into a named constant so the intended alarm window is clearer.
| const countKeywordChanges = (keywordChangesMap) => Array.from(keywordChangesMap.values()).reduce( | ||
| (total, changes) => total | ||
| + changes.addedKeywords.size | ||
| + changes.removedKeywords.size | ||
| + changes.changedKeywords.size, | ||
| 0 | ||
| ) | ||
|
|
There was a problem hiding this comment.
| const countKeywordChanges = (keywordChangesMap) => Array.from(keywordChangesMap.values()).reduce( | |
| (total, changes) => total | |
| + changes.addedKeywords.size | |
| + changes.removedKeywords.size | |
| + changes.changedKeywords.size, | |
| 0 | |
| ) | |
| const countKeywordChanges = (keywordChangesMap) => Array.from(keywordChangesMap.values()).reduce( | |
| (total, { addedKeywords, removedKeywords, changedKeywords }) => total | |
| + addedKeywords.size | |
| + removedKeywords.size | |
| + changedKeywords.size, | |
| 0 | |
| ) |
I tried a for of which I thought was easier to read too but, the linter complained
There was a problem hiding this comment.
Good suggestion. Made this change
| await emitPublisherMetricsSafely( | ||
| [ | ||
| { | ||
| metricName: PUBLISHER_METRIC_NAMES.KEYWORD_EVENTS_GENERATED, | ||
| value: keywordEventsGenerated | ||
| } | ||
| ], | ||
| 'keyword event generation' | ||
| ) |
There was a problem hiding this comment.
should we just batch these together to have a single call only?
There was a problem hiding this comment.
Good call. I batched these into a single metric emission after the SNS publish summary is known, so all four keyword sync metrics are sent together in one CloudWatch call.
Overview
What is the feature?
Adds keyword sync observability to publisher by emitting CloudWatch metrics for keyword change detection and SNS keyword event publishing. This also adds a CloudWatch alarm for publish failures and optional email notifications when that alarm fires.
What is the Solution?
This branch adds keyword sync metrics under the
KMS/KeywordSyncnamespace and emits them from the publisher flow during three points in the publish lifecycle:KeywordChangesDetectedKeywordEventsGeneratedKeywordEventsPublishedKeywordEventPublishFailuresThe implementation includes:
cloudwatch:PutMetricDataKeywordEventPublishFailuresKEYWORD_SYNC_ALARM_EMAILSWhat areas of the application does this impact?
Testing
Start LocalStack and local KMS:
npm run localstack:start npm run start-local ./scripts/local/create_unique_keyword.sh curl -X POST 'http://127.0.0.1:3013/publish?name=v1.0.0' aws --endpoint-url=http://localhost:4566 cloudwatch list-metrics --namespace KMS/KeywordSync ./scripts/local/show_keyword_sync_metrics.jsAWS verification
Create or update a draft keyword in the target environment
Publish through the normal AWS flow
In CloudWatch, open KMS/KeywordSync -> Metrics with no dimensions
Verify these metric names appear and receive datapoints:
KeywordChangesDetected
KeywordEventsGenerated
KeywordEventsPublished
KeywordEventPublishFailures
Use Statistic = Sum and a time window that includes the publish run
Alarm verification
If KEYWORD_SYNC_ALARM_EMAILS is configured, confirm SNS email subscriptions are confirmed
You will get a email confirming you want this subscription as well, so look for that.
Checklist