-
Notifications
You must be signed in to change notification settings - Fork 40
Include Stage name in emitted metrics and dashboards #1488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
otlp: | ||
protocols: | ||
grpc: | ||
endpoint: :4317 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears the default here changed from :4317
to localhost:4317
which broke some of our inter-container communication without moving this back.
@@ -1,5 +1,5 @@ | |||
exporters: | |||
logging: | |||
debug: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to upgrade otel, logging
is deprecated/removed
@@ -1,3 +1,3 @@ | |||
FROM grafana/grafana:11.1.0 | |||
FROM grafana/grafana:11.6.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't needed, i can revert if we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all tests are passing with this version update, then let's keep it.
@@ -1,4 +1,4 @@ | |||
FROM public.ecr.aws/aws-observability/aws-otel-collector:v0.38.0 | |||
FROM public.ecr.aws/aws-observability/aws-otel-collector:v0.43.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, does this version update have any breaking changes impact our codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see earlier comment which i adjusted to fix, as well as logging
-> debug
It appears the default here changed from :4317 to localhost:4317 which broke some of our inter-container communication without moving this back.
action: delete | ||
extensions: | ||
health_check: | ||
endpoint: :13133 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will having an endpoint for health checks help us differentiate between an actual request vs health check requests in our kafka logs? If yes, then I think that Capture and Replay Dashboard widget for request/response mismatches should exclude health check requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is otel healthcheck endpoint, will not appear on any dashboards or kafka
'REGION': props.region, | ||
'MA_STAGE': props.stage | ||
'MA_STAGE': props.stage, | ||
'MA_QUALIFIER': props.stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change this to props.qualifier ?? props.stage
? in case in the future we want to diverge the use-cases of qualifier from stage
exporters: | ||
awsemf: | ||
namespace: 'OpenSearchMigrations' | ||
dimension_rollup_option: NoDimensionRollup # Reduce number of metrics by only publishing with all dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this ? Our dashboards will make an assumption that dimensions are guaranteed to be present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only publishes with the most granular dimension, if this was excluded, as an example, we would be publishing the metrics as stage-specific metrics as well as not stage specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR as the changes are consistent and all tests appear to be passing.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1488 +/- ##
============================
============================
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
Add stage name to metrics with trace sampling.
Fix dependency ordering with otel collector and service container, fixes otel sdk connection error during task shutdown / startup when the service is running without the collector.
Issues Resolved
https://opensearch.atlassian.net/browse/MIGRATIONS-1949
Testing
Verified metrics with dimension and dashboards in jenkins environment
C&R Dashboard Example:

Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.