-
Notifications
You must be signed in to change notification settings - Fork 90
enhancement(match_datadog_query): add is_phrase flag to equals method #1334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Update the Filter trait's equals signature to include an `is_phrase` boolean flag. Without that information there is no way to distinguish between phrased and non-phrased queries.
src/datadog/filter/filter.rs
Outdated
&self, | ||
field: Field, | ||
to_match: &str, | ||
is_phrase: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which implementation needs this bool
?
Also, please avoid passing flags if possible e.g. we can introduce a new phrase_equals
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for matching on default fields (typically message
) in datadog. The matching behavior for phrases is different than non-phrased. In a phrase the tokens need to be in the same order. E.g.
Hello nice world
=> "Hello world"
no match
Hello nice world
=> Hello world
matches
Btw. there is an alternative solution where the tokenizer would emit multiple tokens.
Hello world
would expand to the equivalent of_default_:Hello
AND _default_:World
.
"Hello world"
would expand to the equivalent of_default_:"Hello World"
I considered adding a new method phrase_equals, but wasn't sure, since this behavior applies only to default fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to introduce phrase
fn callback
Thanks @PSeitz, this looks better now. Correct me if I am wrong, but this a breaking change? Since the filter matcher is now stricter. |
Field::Default(_) => { | ||
let re = word_regex(phrase); | ||
Ok(resolve_value( | ||
buf, | ||
Run::boxed(move |value| match value { | ||
Value::Bytes(val) => re.is_match(&String::from_utf8_lossy(val)), | ||
_ => false, | ||
}), | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from the condition in equals
below? It looks word-for-word identical but maybe I missed something. I'd also like to see some unit test cases that demonstrate the different behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same, since I didn't know the impact of the change, so currently it's a non-breaking behavior.
Currently the non-phrase behavior is also behaving incorrectly, they both would need to be adjusted to include tokenization (and probably some adaption on the query parser)
I implemented it partially here https://github.com/DataDog/pomsky/pull/79. For full compatibility I'd need to have a closer look on how percolation tokenizes.
I think yes, but nothing disruptive with what's done in datadog (we want to get more in line with the datadog log explorer behaviour) |
Summary
Update the Filter trait's equals signature to include an
is_phrase
boolean flag.Without that information there is no way to distinguish between phrased and non-phrased queries. They behave differently in datadog on default fields
This is for matching on default fields (typically
message
) in datadog. The matching behavior for phrases is different than non-phrased. In a phrase the tokens need to be in the same order. E.g.Hello nice world
=>"Hello world"
no matchHello nice world
=>Hello world
matchesAlternative Options
There is an alternative solution where the tokenizer would emit multiple tokens (only for default fields):
Hello world
would expand to the equivalent of_default_:Hello
AND_default_:World
."Hello world"
would expand to the equivalent of_default_:"Hello World"
Change Type
Is this a breaking change?
Does this PR include user facing changes?
our guidelines.
Checklist
run
dd-rust-license-tool write
and commit the changes. More details here.