-
Notifications
You must be signed in to change notification settings - Fork 45
Live check metrics #728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Live check metrics #728
Conversation
crates/weaver_live_check/src/lib.rs
Outdated
/// A sample metric | ||
Metric(&'a SampleMetric), | ||
/// A sample number data point | ||
NumberDataPoint(&'a NumberDataPoint), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - Is the convention going to be check at the highest level, but higher-level things don't include lower level details?
E.g. NumberDataPoint is part of a Metric and has metric identifying things, but SampleMetric wouldn't?
Also, can you rename to SampledNumberDataPoint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This area is not quite right yet. A NumberDataPoint without the information from the Metric is not so useful, but I need a way to pass it in around in a consistent way.
Yes, good call on the renaming. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into the consistency of the framework. For each data_point, I want to be able to provide advice at that level (checking attributes required for example), but I need context from the parent group. I'm hoping to get this change done today. This will change rego so the input is a tuple of (sample,semconv).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed this with docs to explain and a rego example.
# This example shows how to use the registry_group provided in the input.
# If the metric's unit is "By" the value in this data-point must be an integer.
deny contains make_advice(advice_type, advice_level, value, message) if {
input.sample.number_data_point
value := input.sample.number_data_point.value
input.registry_group.unit == "By"
value != floor(value) # not a good type check, but serves as an example
advice_type := "invalid_data_point_value"
advice_level := "violation"
message := "Value must be an integer when unit is 'By'"
}
@@ -40,6 +40,16 @@ deny contains make_advice(advice_type, advice_level, value, message) if { | |||
message := "Does not match name formatting rules" | |||
} | |||
|
|||
# checks metric name format | |||
deny contains make_advice(advice_type, advice_level, value, message) if { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a check to make sure attributes on the metric match attributes defined on the group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already done internally in rust rather than in rego since I assumed this was a fundamental thing. See the attribute_required
advice under each data_point in the screenshot above. This is at the data_point level since the attributes are provided with each point.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #728 +/- ##
=======================================
+ Coverage 76.7% 77.1% +0.4%
=======================================
Files 65 66 +1
Lines 5036 5211 +175
=======================================
+ Hits 3863 4019 +156
- Misses 1173 1192 +19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…check and weaver_checker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR. Adding metrics support will significantly enhance the live-check feature. I’ve made several suggestions. There are certain types we can avoid duplicating (especially the leaf enums). For the other structures, I understand that you’re adding at least one custom field for the results. It’s frustrating to see all this duplicated code, but unfortunately, I don’t see a straightforward alternative solution.
It would also be good to remove instances of &Rc<T>
or Rc<Rc<T>>
, as they seem semantically redundant and not very idiomatic.
crates/weaver_live_check/README.md
Outdated
@@ -202,7 +211,9 @@ These should be self-explanatory, but: | |||
- `no_advice_count` is the number of samples that received no advice | |||
- `seen_registry_attributes` is a record of how many times each attribute in the registry was seen in the samples | |||
- `seen_non_registry_attributes` is a record of how many times each non-registry attribute was seen in the samples | |||
- `registry_coverage` is the fraction of seen registry attributes over the total registry attributes | |||
- `seen_registry_metrics` is a record of how many times each metric in the registry was seen in the samples | |||
- `seen_non_registry_attributes` is a record of how many times each non-registry metric was seen in the samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not that be seen_non_registry_metrics
? The comment talks about metrics and not attributes.
registry_attribute: Option<&Rc<Attribute>>, | ||
registry_group: Option<&Rc<ResolvedGroup>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but having a function signature with Option<&Rc<Attribute>>
feels a bit like having a double reference to something. It would seem more idiomatic to me to just have Option<Rc<Attribute>>
.
registry_attribute: Option<&Rc<Attribute>>, | ||
registry_group: Option<&Rc<ResolvedGroup>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment.
for required_attribute in required_attributes { | ||
// Check if the attribute is present in the sample | ||
if !attributes | ||
.iter() | ||
.any(|attribute| attribute.name == required_attribute.name) | ||
{ | ||
advice_list.push(Advice { | ||
advice_type: "attribute_required".to_owned(), | ||
value: Value::String(required_attribute.name.clone()), | ||
message: "Attribute is required".to_owned(), | ||
advice_level: AdviceLevel::Violation, | ||
}); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's n*m complexity when n is the number of required attributes and m is the number of sample attributes. It might be useful for protecting the live-check function to 1) convert one of these two lists into a map, and 2) check the size of the list coming from the network to avoid any issues.
templates_by_length.push((attribute.name.clone(), attribute.clone())); | ||
let _ = semconv_templates.insert(attribute.name.clone(), attribute.clone()); | ||
templates_by_length | ||
.push((attribute.name.clone(), Rc::clone(&attribute_rc))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.push((attribute.name.clone(), Rc::clone(&attribute_rc))); | |
.push((attribute.name.clone(), attribute_rc.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attribute_rc
is already an Rc
, so I don't see the point of creating an Rc<&Rc<Attribute>>
.
pub enum SampleInstrument { | ||
/// An up-down counter metric. | ||
UpDownCounter, | ||
/// A counter metric. | ||
Counter, | ||
/// A gauge metric. | ||
Gauge, | ||
/// A histogram metric. | ||
Histogram, | ||
/// A summary metric. This is no longer used and will cause a violation. | ||
Summary, | ||
/// Unspecified instrument type. | ||
Unspecified, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the type level, do we have a way to keep in-sync the SampleInstrument and the InstrumentSpec enums? Why not using directly InstrumentSpec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstrumentSpec does not support Summary, but OTLP does.
I'd recommend creating a wrapper enum where it's either SupportedMetricType(InstrumentSpec)
or UnsupportedMetricType(String)
match value { | ||
GrpcValue::StringValue(string) => Some(Value::String(string)), | ||
GrpcValue::IntValue(int_value) => Some(Value::Number(int_value.into())), | ||
GrpcValue::DoubleValue(double_value) => Some(json!(double_value)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity. Why json!(double_value)
and not Value::Number(double_value.into())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the trait bound
serde_json::Number: std::convert::From<f64>
is not satisfied
the following other types implement traitstd::convert::From<T>
:
serde_json::Number
implementsstd::convert::From<i16>
serde_json::Number
implementsstd::convert::From<i32>
serde_json::Number
implementsstd::convert::From<i64>
serde_json::Number
implementsstd::convert::From<i8>
serde_json::Number
implementsstd::convert::From<isize>
serde_json::Number
implementsstd::convert::From<serde_json::de::ParserNumber>
serde_json::Number
implementsstd::convert::From<u16>
serde_json::Number
implementsstd::convert::From<u32>
and 3 others
required forf64
to implementstd::convert::Into<serde_json::Number>
src/registry/otlp/conversion.rs
Outdated
pub fn span_kind_from_otlp_kind(kind: i32) -> SpanKindSpec { | ||
match kind { | ||
2 => SpanKindSpec::Server, | ||
3 => SpanKindSpec::Client, | ||
4 => SpanKindSpec::Producer, | ||
5 => SpanKindSpec::Consumer, | ||
_ => SpanKindSpec::Internal, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m surprised that a SpanKind enum doesn’t already exist in the code generated by Prost.
In general, when we can reuse existing definitions, we should do so to simplify the maintenance of the code.
src/registry/otlp/conversion.rs
Outdated
if let Some(status) = status { | ||
let code = match status.code { | ||
1 => StatusCode::Ok, | ||
2 => StatusCode::Error, | ||
_ => StatusCode::Unset, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here.
@lquerel - Thank you for your thorough review. I have addressed all your points. Please take another look so we can get this merged. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jeremy for these updates.
Adds support for Metrics to live-check. The nested nature of this data brought about further refactoring internally but also a change to the rego interface.
input
now hassample
andregistry_attribute
orregistry_group
- this allows the policy to work within the nested context. For example, providing some advice on an exemplar in a data-point in a metric by referring to the metric's defined unit. This is documented in the readme and an example policy is included in the tests.Includes support for Exemplars: