Add capability in query-engine to filter by type of signal#2754
Add capability in query-engine to filter by type of signal#2754albertlockett wants to merge 7 commits intoopen-telemetry:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project check has failed because the head coverage (52.45%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files@@ Coverage Diff @@
## main #2754 +/- ##
===========================================
- Coverage 88.23% 52.45% -35.78%
===========================================
Files 639 89 -550
Lines 242568 11222 -231346
===========================================
- Hits 214018 5887 -208131
+ Misses 28026 4811 -23215
Partials 524 524
🚀 New features to boost your workflow:
|
| // IF there are two rules, we have an expression like (is <Type>) meaning we're checking | ||
| // that an element of the stream is some type | ||
| 2 => { | ||
| let type_check_expr = ScalarExpression::GetType(GetTypeScalarExpression::new( |
There was a problem hiding this comment.
This is really not what GetType was intended for. GetType should return something defined on enum Value. As in the system-type of the thing being accessed. When applied to the "source" it should return Map. We treat the root thing as a "map" even though it is special.
For determining "signal type" why not just introduce a sort of virtual thing? Assuming somewhere you have mapping for "known" things like serverity_text or severity_number couldn't you just return a static if you see signal_type off the root?
There was a problem hiding this comment.
This is really not what GetType was intended for. GetType should return something defined on enum Value. As in the system-type of the thing being accessed. When applied to the "source" it should return Map. We treat the root thing as a "map" even though it is special.
In Open-Telemetry, a map is a different data type than a Log, a Trace, or a Metric. Map would be for something like attributes. I believe in the OTel context we shouldn't just be treating the root as a 'map', so I don't actually think GetType should return 'map' when the source is one of the root telemetry signals.
For determining "signal type" why not just introduce a sort of virtual thing? Assuming somewhere you have mapping for "known" things like serverity_text or severity_number couldn't you just return a static if you see signal_type off the root?
How would we use what you're suggesting wrt to signal_type? e.g. are you suggesting we have a virtual field on the root, and use it like this?
logs | where signal_type == "Log"I see some issues with that::
- a) this seems strange because
signal_typeisn't part of the Open Telemetry data model - b) we'd need to have special handling for this field when it's used outside a filter context (e.g. it can't be assigned).
All that said, I'm open to not using GetType here if it's not being used as you intended. Would you find it acceptable if I just parsed this as into a special external function that the OTAP query engine knew how to interpret?
There was a problem hiding this comment.
Or were you thinking more like signal_type should be like an internal-only virtual column that is only used in the AST produced by the parser? E.g. internally we'd parse something like is Log into something like this:
LogicalExpression::EqualTo(EqualToLogicalExpression::new(
SourceScalarExpression::Source(SourceScalarExpression::new(
query_location,
ScalarExpression::Source(SourceScalarExpression::new(
query_location
ValueAccessor::new_with_selectors(vec![
StaticScalarExpression::String(StringScalarExpression::new(
query_location,
"signal_type"
)),
]),
)),
)),
ScalarExpression::Static(
StaticScalarExpression::String(StringScalarExpression::new(
to_query_location(&type_name_rule),
type_name_rule.as_str(),
)),
)
)And then the query-engine just knows to look for this special expression and treat it like checking the signal type?
From OTel perspective I agree with you. Except expression tree doesn't know anything about OTel 😄
I'm OK with this too. Seems like heavy lifting having to parse the tree and look for it, but not impossible. You could also introduce some new expression in AST if you wanted to. |
Could we not consider that
I'm reticent to add yet another expression type like It's not really too complex to parse this |
The idea is I can take some language. KQL, OPL, OTTL, MyCustomLang and spit out an expression tree. Then I can take that tree and run it using some engine. I could use KQL with RecordSet or OPL with QueryEngine. But what if I want to take KQL and use QueryEngine or take OPL and use RecordSet? The engines shouldn't have different behavior. If it will only ever be possible to use OPL + QueryEngine and never any other combination fine to define a different behavior for GetType in QueryEngine.
If you added |
Since there's a desire to maintain the consistent behaviour between engines, the safest behaviour will be to add this new |
Change Summary
Adds the capability to use a syntax like
is <signal type>in logical expressions in OPL. This can be used for example when we want to have a single program that does different operations to each signal:Alternatively, we can also do things like this to just keep/drop certain signal types:
What issue does this PR close?
How are these changes tested?
Unit tests
Are there any user-facing changes?
Yes this syntax is now available for use in transform processor
Future work:
In a followup, I'll add support for checking where a particular field is of some type.
There are some TODOs in this PR to add support for this in the near future.
Also, when we eventually add more capability to the parser/planner to be type aware, it may have the capability to reject the use of invalid field accesses for certain signal types, and the syntax in this PR will allow users to get around these compile time checks. However, that's not implemented as part of this PR.