
Add capability in query-engine to filter by type of signal #2754

Open
albertlockett wants to merge 7 commits into open-telemetry:main from albertlockett:albert/opl-type-check-part1


@albertlockett
Member

@albertlockett albertlockett commented Apr 24, 2026

Change Summary

Adds the capability to use syntax like is <signal type> in logical expressions in OPL. This can be used, for example, when we want a single program that applies different operations to each signal type:

signals |
if (is Log) {
  // ...
} else if (is Metric) {
  // ...
} else if (is Span) {
  // ...
}
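A minimal sketch of the branching semantics above, assuming each batch carries exactly one signal type and that an if / else if chain runs the first arm whose is <Type> check matches. SignalType, Branch, and select_branch are hypothetical names for illustration only, not types from the query engine.

```rust
/// Hypothetical: the root signal types a batch can contain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignalType {
    Log,
    Metric,
    Span,
}

/// Hypothetical: one `if (is <Type>) { ... }` arm, labelled for illustration.
struct Branch {
    is_type: SignalType,
    label: &'static str,
}

/// Evaluates an `if / else if` chain: returns the label of the first arm whose
/// `is <Type>` check matches the batch, or `None` when no arm matches.
fn select_branch(signal: SignalType, arms: &[Branch]) -> Option<&'static str> {
    arms.iter()
        .find(|arm| arm.is_type == signal)
        .map(|arm| arm.label)
}
```

With arms for Log, Metric, and Span, a metrics batch selects only the Metric arm; a batch whose type has no arm falls through untouched.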

Alternatively, we can use it to simply keep or drop certain signal types:

signals | where is Log
signals | where not(is Log)
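As a rough sketch of how these two filters behave, assuming a stream of single-signal batches: is <Type> is a boolean predicate on the batch's signal type and not(...) inverts it. The Predicate enum and where_filter function are hypothetical illustrations, not the engine's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignalType {
    Log,
    Metric,
    Span,
}

/// Hypothetical logical expression covering `is <Type>` and `not(...)`.
enum Predicate {
    Is(SignalType),
    Not(Box<Predicate>),
}

impl Predicate {
    /// Evaluates the predicate against one batch's signal type.
    fn eval(&self, signal: SignalType) -> bool {
        match self {
            Predicate::Is(t) => *t == signal,
            Predicate::Not(inner) => !inner.eval(signal),
        }
    }
}

/// `where <predicate>`: keeps only the batches for which the predicate is true.
fn where_filter(batches: &[SignalType], predicate: &Predicate) -> Vec<SignalType> {
    batches.iter().copied().filter(|s| predicate.eval(*s)).collect()
}
```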

What issue does this PR close?

How are these changes tested?

Unit tests

Are there any user-facing changes?

Yes. This syntax is now available for use in the transform processor.

Future work:

In a follow-up, I'll add support for checking whether a particular field is of some type:

logs | if (attributes["x"] is String) {
  // ...
}

There are some TODOs in this PR to add support for this in the near future.
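For the planned field-type check, one possible runtime behaviour is sketched below. It assumes a simplified attribute value model (AttrValue and TypeName are hypothetical, loosely modelled on OTel AnyValue variants) and assumes, for illustration only, that a missing key makes the check evaluate to false.

```rust
use std::collections::HashMap;

/// Hypothetical attribute value, loosely modelled on OTel AnyValue variants.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    String(String),
    Int(i64),
    Double(f64),
    Bool(bool),
}

/// Hypothetical type names usable on the right-hand side of `is`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TypeName {
    String,
    Int,
    Double,
    Bool,
}

/// `attributes[key] is <TypeName>`; assumed to be false when the key is missing.
fn attr_is(attrs: &HashMap<String, AttrValue>, key: &str, expected: TypeName) -> bool {
    match attrs.get(key) {
        Some(AttrValue::String(_)) => expected == TypeName::String,
        Some(AttrValue::Int(_)) => expected == TypeName::Int,
        Some(AttrValue::Double(_)) => expected == TypeName::Double,
        Some(AttrValue::Bool(_)) => expected == TypeName::Bool,
        None => false,
    }
}
```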

Also, when we eventually make the parser/planner type-aware, it may be able to reject invalid field accesses for certain signal types, and the syntax in this PR would allow users to get around those compile-time checks. However, that type awareness is not implemented as part of this PR.

@github-actions github-actions Bot added rust Pull requests that update Rust code query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches opl-parser Work items related to OPL Parser labels Apr 24, 2026
@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.45%. Comparing base (9c54c8e) to head (1f1c6bc).
⚠️ Report is 7 commits behind head on main.

❌ Your project check has failed because the head coverage (52.45%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (9c54c8e) and HEAD (1f1c6bc).

HEAD has 7 fewer uploads than BASE:

Flag    BASE (9c54c8e)    HEAD (1f1c6bc)
        8                 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2754       +/-   ##
===========================================
- Coverage   88.23%   52.45%   -35.78%     
===========================================
  Files         639       89      -550     
  Lines      242568    11222   -231346     
===========================================
- Hits       214018     5887   -208131     
+ Misses      28026     4811    -23215     
  Partials      524      524               
Component            Coverage Δ
otap-dataflow        ∅ <ø> (∅)
query_abstraction    ∅ <ø> (∅)
query_engine         ∅ <ø> (∅)
otel-arrow-go        52.45% <ø> (ø)
quiver               ∅ <ø> (∅)

@albertlockett albertlockett changed the title from "Add capability in query-engine to check" to "Add capability in query-engine to filter by type of signal" Apr 24, 2026
// If there are two rules, we have an expression like (is <Type>), meaning we're checking
// that an element of the stream is some type
2 => {
    let type_check_expr = ScalarExpression::GetType(GetTypeScalarExpression::new(
Member


This is really not what GetType was intended for. GetType should return something defined on enum Value, as in the system type of the thing being accessed. When applied to the "source", it should return Map. We treat the root thing as a "map" even though it is special.

For determining "signal type", why not just introduce a sort of virtual thing? Assuming you have a mapping somewhere for "known" things like severity_text or severity_number, couldn't you just return a static value if you see signal_type off the root?

Member Author

@albertlockett albertlockett Apr 24, 2026


> This is really not what GetType was intended for. GetType should return something defined on enum Value, as in the system type of the thing being accessed. When applied to the "source", it should return Map. We treat the root thing as a "map" even though it is special.

In OpenTelemetry, a map is a different data type from a Log, a Trace, or a Metric. Map would be for something like attributes. I believe in the OTel context we shouldn't be treating the root as a 'map', so I don't think GetType should return 'map' when the source is one of the root telemetry signals.

> For determining "signal type", why not just introduce a sort of virtual thing? Assuming you have a mapping somewhere for "known" things like severity_text or severity_number, couldn't you just return a static value if you see signal_type off the root?

How would we use what you're suggesting with respect to signal_type? For example, are you suggesting we have a virtual field on the root and use it like this?

logs | where signal_type == "Log"

I see some issues with that:

  • a) this seems strange because signal_type isn't part of the OpenTelemetry data model
  • b) we'd need to have special handling for this field when it's used outside a filter context (e.g. it can't be assigned).
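To make point b) concrete, here is a rough sketch of the extra planner check a read-only virtual field would need. The Statement forms (including a set <field> = ... assignment) and the validate function are hypothetical simplifications, not the OPL parser's actual types.

```rust
/// Hypothetical, simplified statement forms a parser might produce.
#[allow(dead_code)]
#[derive(Debug)]
enum Statement {
    /// `where <field> == "<literal>"`
    WhereEquals { field: String, literal: String },
    /// `set <field> = "<literal>"` (hypothetical assignment syntax)
    Assign { field: String, literal: String },
}

/// Name of the hypothetical read-only virtual field discussed above.
const VIRTUAL_SIGNAL_TYPE: &str = "signal_type";

/// Rejects any statement that writes to the virtual field; reads are allowed.
fn validate(statements: &[Statement]) -> Result<(), String> {
    for statement in statements {
        if let Statement::Assign { field, .. } = statement {
            if field == VIRTUAL_SIGNAL_TYPE {
                return Err(format!("cannot assign to virtual field '{field}'"));
            }
        }
    }
    Ok(())
}
```

Every place that can write a field would need a guard like this, which is part of why a virtual field is less attractive than a dedicated expression.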

All that said, I'm open to not using GetType here if it's not being used as you intended. Would you find it acceptable if I just parsed this into a special external function that the OTAP query engine knew how to interpret?

Member Author


Or were you thinking that signal_type should be an internal-only virtual column that is only used in the AST produced by the parser? E.g. internally we'd parse something like is Log into something like this:

LogicalExpression::EqualTo(EqualToLogicalExpression::new(
    // left: access of the virtual `signal_type` field off the root
    ScalarExpression::Source(SourceScalarExpression::new(
        query_location.clone(),
        ValueAccessor::new_with_selectors(vec![ScalarExpression::Static(
            StaticScalarExpression::String(StringScalarExpression::new(
                query_location.clone(),
                "signal_type",
            )),
        )]),
    )),
    // right: the type name from the `is <Type>` rule, e.g. "Log"
    ScalarExpression::Static(StaticScalarExpression::String(
        StringScalarExpression::new(
            to_query_location(&type_name_rule),
            type_name_rule.as_str(),
        ),
    )),
))

And then the query-engine just knows to look for this special expression and treat it like checking the signal type?
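A rough sketch of what "looking for this special expression" could mean on the engine side. It uses heavily simplified, hypothetical stand-ins (Scalar, Logical) rather than the real expression types, and just pattern-matches the EqualTo(Source(["signal_type"]), Static(name)) shape.

```rust
/// Hypothetical, simplified stand-ins for the real AST types.
enum Scalar {
    /// Access off the root, e.g. `signal_type` -> `["signal_type"]`.
    Source(Vec<String>),
    /// A static string literal.
    StaticString(String),
}

enum Logical {
    EqualTo(Scalar, Scalar),
}

/// Returns the requested signal type name if `expr` has the shape
/// `EqualTo(Source(["signal_type"]), StaticString(name))`, otherwise `None`.
fn as_signal_type_check(expr: &Logical) -> Option<&str> {
    match expr {
        Logical::EqualTo(Scalar::Source(path), Scalar::StaticString(name))
            if path.len() == 1 && path[0] == "signal_type" =>
        {
            Some(name.as_str())
        }
        _ => None,
    }
}
```

An engine would run a check like this while planning and, on a match, compare the name against the batch's signal type instead of treating signal_type as a real column.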

@CodeBlanch
Member

CodeBlanch commented Apr 24, 2026

> In OpenTelemetry, a map is a different data type from a Log, a Trace, or a Metric. Map would be for something like attributes. I believe in the OTel context we shouldn't be treating the root as a 'map', so I don't think GetType should return 'map' when the source is one of the root telemetry signals.

From an OTel perspective I agree with you, except the expression tree doesn't know anything about OTel 😄 GetType in the AST means "give me the query type of x." GetType(severity_text) should return String. GetType(severity_number) should return Integer. GetType(attributes['attr1']) would be whatever type is resolved at runtime. The idea behind returning Map for GetType(source) is that it best represents the root thing. Not a perfect fit, for sure.
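The engine-agnostic semantics described here can be sketched with a simplified, hypothetical value model (Value and ValueType are illustrations in the spirit of the enum Value mentioned above, not its real definition): GetType reports the system type of whatever is accessed, so the root record, modelled as a map, reports Map.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified value model; the real `enum Value` has more variants.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Value {
    String(String),
    Integer(i64),
    Map(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValueType {
    String,
    Integer,
    Map,
}

/// GetType: the query (system) type of a value, with no notion of OTel signals.
fn get_type(value: &Value) -> ValueType {
    match value {
        Value::String(_) => ValueType::String,
        Value::Integer(_) => ValueType::Integer,
        Value::Map(_) => ValueType::Map,
    }
}
```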

> Would you find it acceptable if I just parsed this into a special external function that the OTAP query engine knew how to interpret?

signal_type is an OPL thing, so if you want to map it to a special function, that's all good with me.

> And then the query-engine just knows to look for this special expression and treat it like checking the signal type?

I'm OK with this too. It seems like heavy lifting to have to parse the tree and look for it, but it's not impossible.

You could also introduce some new expression in the AST if you wanted to. GetRecordType() or something along those lines. It wouldn't accept any parameters; it would just implicitly use the "current context" (aka "source", aka "the thing currently being processed") and return a string identifying some friendly name for the root ("log", "metric", "trace"; plural versions are probably OK too).
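A minimal sketch of the parameterless expression suggested here, assuming the evaluator has a context describing the record currently being processed. RootKind, Context, Expr, and eval are hypothetical names; the friendly names follow the singular "log" / "metric" / "trace" option mentioned above.

```rust
/// Hypothetical root record kinds an engine might be processing.
#[derive(Debug, Clone, Copy)]
enum RootKind {
    Log,
    Metric,
    Trace,
}

/// Hypothetical evaluation context: "the thing currently being processed".
struct Context {
    root: RootKind,
}

/// Hypothetical expression subset; `GetRecordType` takes no arguments.
enum Expr {
    GetRecordType,
    StaticString(&'static str),
}

fn eval(expr: &Expr, ctx: &Context) -> &'static str {
    match expr {
        // Reads the implicit current context instead of an argument.
        Expr::GetRecordType => match ctx.root {
            RootKind::Log => "log",
            RootKind::Metric => "metric",
            RootKind::Trace => "trace",
        },
        Expr::StaticString(s) => *s,
    }
}
```

Under this sketch, is Log could lower to an equality between GetRecordType and the literal "log".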

@albertlockett
Member Author

> From an OTel perspective I agree with you, except the expression tree doesn't know anything about OTel 😄 GetType in the AST means "give me the query type of x." GetType(severity_text) should return String. GetType(severity_number) should return Integer. GetType(attributes['attr1']) would be whatever type is resolved at runtime. The idea behind returning Map for GetType(source) is that it best represents the root thing. Not a perfect fit, for sure.

Could we not consider that GetType is meant to "give me the query type of x." in the context in which this expression is evaluated? If the record set engine considers that returning Map for GetType(source) is the best thing, then that's what it can do. If the OTAP query engine knows that its source has a definite type (Log/Span/Metric), why could it not return that?

>> And then the query-engine just knows to look for this special expression and treat it like checking the signal type?

> I'm OK with this too. It seems like heavy lifting to have to parse the tree and look for it, but it's not impossible.

> You could also introduce some new expression in the AST if you wanted to. GetRecordType() or something along those lines.

I'm reluctant to add yet another expression type like GetRecordType just for this purpose. Also, would this not have the same issue where, if the record set engine considers the source a Map, this GetRecordType expression would also return Map when it is evaluated over there?

It's not really too complex to parse this signal_type into a special LogicalExpression. It's basically the same as what this PR currently does; I would just be looking for this virtual field name instead of the GetType variant of ScalarExpression. I'll probably just move forward with this solution so we don't have to change what GetType(source) philosophically represents.

@CodeBlanch
Member

> Could we not consider that GetType is meant to "give me the query type of x." in the context in which this expression is evaluated? If the record set engine considers that returning Map for GetType(source) is the best thing, then that's what it can do. If the OTAP query engine knows that its source has a definite type (Log/Span/Metric), why could it not return that?

The idea is I can take some language (KQL, OPL, OTTL, MyCustomLang) and spit out an expression tree. Then I can take that tree and run it using some engine. I could use KQL with RecordSet or OPL with QueryEngine. But what if I want to take KQL and use QueryEngine, or take OPL and use RecordSet? The engines shouldn't have different behavior. If it will only ever be possible to use OPL + QueryEngine and never any other combination, it's fine to define a different behavior for GetType in QueryEngine.
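The portability argument can be expressed as a simple conformance check: the same shared tree is evaluated by every engine, and any engine whose answer differs is flagged. Everything below (Expr, Engine, RecordSetEngine, ColumnarEngine, disagreeing_engines) is a hypothetical illustration, not code from either engine.

```rust
/// Hypothetical, minimal expression tree shared by every front-end (KQL, OPL, ...).
#[derive(Debug, Clone, Copy)]
enum Expr {
    GetTypeOfSource,
}

/// Hypothetical engine interface: every engine evaluates the same shared tree.
trait Engine {
    fn name(&self) -> &'static str;
    fn eval(&self, expr: Expr) -> String;
}

struct RecordSetEngine;
struct ColumnarEngine;

impl Engine for RecordSetEngine {
    fn name(&self) -> &'static str {
        "RecordSet"
    }
    fn eval(&self, expr: Expr) -> String {
        match expr {
            // The root is treated as a map, per the semantics described above.
            Expr::GetTypeOfSource => "Map".to_string(),
        }
    }
}

impl Engine for ColumnarEngine {
    fn name(&self) -> &'static str {
        "Columnar"
    }
    fn eval(&self, expr: Expr) -> String {
        match expr {
            Expr::GetTypeOfSource => "Map".to_string(),
        }
    }
}

/// Returns the names of engines whose result differs from the first engine's.
fn disagreeing_engines(engines: &[&dyn Engine], expr: Expr) -> Vec<&'static str> {
    let Some((first, rest)) = engines.split_first() else {
        return Vec::new();
    };
    let expected = first.eval(expr);
    rest.iter()
        .filter(|e| e.eval(expr) != expected)
        .map(|e| e.name())
        .collect()
}
```

An engine that returned "Log" for GetType(source) would show up in this check, which is the inconsistency the comment is warning about.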

> I'm reluctant to add yet another expression type like GetRecordType just for this purpose. Also, would this not have the same issue where, if the record set engine considers the source a Map, this GetRecordType expression would also return Map when it is evaluated over there?

If you added GetRecordType, I assume it would throw not_supported/todo in the RecordSet engine. If we wanted to add support, it wouldn't be too difficult. There is a trait Record (IIRC) in RecordSet; it would need some API (fn get_record_type(&self) -> &str) which the engine would use to retrieve the value when it sees a GetRecordType() in the tree.
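A sketch of that two-phase plan, with one labelled deviation: the hypothetical Record trait below returns Option<&str> instead of the &str signature suggested above, so its default implementation can model the "not supported yet" phase. Record, EvalError, eval_get_record_type, LegacyRecord, and LogRecord are all illustrative names; the actual RecordSet trait may look quite different.

```rust
/// Hypothetical stand-in for the RecordSet record trait. `Option` is used here
/// (an assumption) so the default implementation can signal "not supported".
trait Record {
    fn get_record_type(&self) -> Option<&str> {
        None
    }
}

#[derive(Debug, PartialEq)]
enum EvalError {
    NotSupported(&'static str),
}

/// Evaluates a `GetRecordType()` node against the current record.
fn eval_get_record_type<R: Record>(record: &R) -> Result<String, EvalError> {
    record
        .get_record_type()
        .map(str::to_string)
        .ok_or(EvalError::NotSupported("GetRecordType"))
}

/// A record type that has not implemented the new API yet.
struct LegacyRecord;
impl Record for LegacyRecord {}

/// A record type that reports its friendly name.
struct LogRecord;
impl Record for LogRecord {
    fn get_record_type(&self) -> Option<&str> {
        Some("log")
    }
}
```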

@albertlockett
Member Author

>> Could we not consider that GetType is meant to "give me the query type of x." in the context in which this expression is evaluated? If the record set engine considers that returning Map for GetType(source) is the best thing, then that's what it can do. If the OTAP query engine knows that its source has a definite type (Log/Span/Metric), why could it not return that?

> The idea is I can take some language (KQL, OPL, OTTL, MyCustomLang) and spit out an expression tree. Then I can take that tree and run it using some engine. I could use KQL with RecordSet or OPL with QueryEngine. But what if I want to take KQL and use QueryEngine, or take OPL and use RecordSet? The engines shouldn't have different behavior. If it will only ever be possible to use OPL + QueryEngine and never any other combination, it's fine to define a different behavior for GetType in QueryEngine.

>> I'm reluctant to add yet another expression type like GetRecordType just for this purpose. Also, would this not have the same issue where, if the record set engine considers the source a Map, this GetRecordType expression would also return Map when it is evaluated over there?

> If you added GetRecordType, I assume it would throw not_supported/todo in the RecordSet engine. If we wanted to add support, it wouldn't be too difficult. There is a trait Record (IIRC) in RecordSet; it would need some API (fn get_record_type(&self) -> &str) which the engine would use to retrieve the value when it sees a GetRecordType() in the tree.

Since there's a desire to maintain consistent behaviour between engines, the safest option is to add this new GetRecordType as you've described, and that's what I will do. Otherwise, the record set engine would also need logic for the virtual signal_type field.

@github-actions github-actions Bot added query-engine-kql KQL usage of Query Engine query-engine-recordset Reference query engine implementation processing over a set of records labels Apr 27, 2026
@albertlockett albertlockett marked this pull request as ready for review April 27, 2026 17:18
@albertlockett albertlockett requested a review from a team as a code owner April 27, 2026 17:18

Labels

opl-parser Work items related to OPL Parser query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches query-engine-kql KQL usage of Query Engine query-engine-recordset Reference query engine implementation processing over a set of records rust Pull requests that update Rust code
