enhancement(match_datadog_query): add is_phrase flag to equals method #1334

Open. Wants to merge 2 commits into base: main

@PSeitz PSeitz commented Mar 11, 2025

Summary

Update the `Filter` trait's `equals` signature to include an `is_phrase` boolean flag.

Without that information there is no way to distinguish between phrased and non-phrased queries, which behave differently in Datadog on default fields.

This is for matching on default fields (typically `message`) in Datadog. Phrased and non-phrased queries match differently: in a phrase, the tokens need to be in the same order. E.g.
`Hello nice world` => `"Hello world"`: no match
`Hello nice world` => `Hello world`: matches
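
To make the two behaviors concrete, here is a minimal Rust sketch using only the standard library. It is not the code in this PR: the function names `tokenize`, `matches_terms`, and `matches_phrase` are hypothetical, and the tokenizer (split on non-alphanumeric characters, lowercase everything) is an assumption rather than Datadog's actual tokenization. The phrase check treats the tokens as one contiguous, ordered run, which is consistent with the example above.

```rust
/// Split text into lowercase word tokens.
/// Assumption: real Datadog tokenization is more involved than this.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Non-phrased match: every query token occurs somewhere in the message,
/// in any position.
fn matches_terms(message: &str, query: &str) -> bool {
    let msg = tokenize(message);
    tokenize(query).iter().all(|q| msg.contains(q))
}

/// Phrased match: the query tokens occur as one contiguous run, in order.
fn matches_phrase(message: &str, phrase: &str) -> bool {
    let msg = tokenize(message);
    let want = tokenize(phrase);
    if want.is_empty() {
        // `windows(0)` would panic, so an empty phrase never matches here.
        return false;
    }
    msg.windows(want.len()).any(|w| w == want.as_slice())
}
```

With these definitions, `matches_terms("Hello nice world", "Hello world")` holds while `matches_phrase("Hello nice world", "Hello world")` does not.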

Alternative Options

There is an alternative solution where the tokenizer would emit multiple tokens (only for default fields):
`Hello world` would expand to the equivalent of `_default_:Hello AND _default_:World`.
`"Hello world"` would expand to the equivalent of `_default_:"Hello World"`.
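
A rough sketch of that expansion idea, for illustration only. The `Expanded` type and the `expand_default` and `render` functions are hypothetical and do not exist in the query parser; the sketch splits on whitespace, treats a fully double-quoted query as a single phrase, and leaves the casing of the input untouched.

```rust
/// Hypothetical expanded form of a query on the default field.
#[derive(Debug, PartialEq)]
enum Expanded {
    Term(String),
    Phrase(String),
    And(Vec<Expanded>),
}

/// Expand a default-field query: a quoted query stays one phrase,
/// an unquoted query becomes an AND of its whitespace-separated terms.
fn expand_default(query: &str) -> Expanded {
    let trimmed = query.trim();
    if trimmed.len() >= 2 && trimmed.starts_with('"') && trimmed.ends_with('"') {
        // The quote character is one byte, so this slice is on char boundaries.
        return Expanded::Phrase(trimmed[1..trimmed.len() - 1].to_string());
    }
    let mut terms: Vec<Expanded> = trimmed
        .split_whitespace()
        .map(|t| Expanded::Term(t.to_string()))
        .collect();
    if terms.len() == 1 {
        terms.remove(0)
    } else {
        Expanded::And(terms)
    }
}

/// Render the expansion in the `_default_:` notation used above.
fn render(e: &Expanded) -> String {
    match e {
        Expanded::Term(t) => format!("_default_:{}", t),
        Expanded::Phrase(p) => format!("_default_:\"{}\"", p),
        Expanded::And(parts) => parts
            .iter()
            .map(render)
            .collect::<Vec<_>>()
            .join(" AND "),
    }
}
```

Under these assumptions, `render(&expand_default("Hello world"))` yields `_default_:Hello AND _default_:world`, and the quoted form yields `_default_:"Hello world"`.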

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on
    our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Our CONTRIBUTING.md is a good starting place.
  • If this PR introduces changes to LICENSE-3rdparty.csv, please
    run dd-rust-license-tool write and commit the changes. More details here.
  • For new VRL functions, please also create a sibling PR in Vector to document the new function.

Update the Filter trait's equals signature to include an `is_phrase` boolean flag.

Without that information there is no way to distinguish between phrased and non-phrased queries.
    &self,
    field: Field,
    to_match: &str,
    is_phrase: bool,

Member

Which implementation needs this bool?
Also, please avoid passing flags if possible; e.g., we could introduce a new `phrase_equals` method.

Author

@PSeitz PSeitz Mar 12, 2025

This is for matching on default fields (typically `message`) in Datadog. Phrased and non-phrased queries match differently: in a phrase, the tokens need to be in the same order. E.g.
`Hello nice world` => `"Hello world"`: no match
`Hello nice world` => `Hello world`: matches

Btw. there is an alternative solution where the tokenizer would emit multiple tokens.
`Hello world` would expand to the equivalent of `_default_:Hello AND _default_:World`.
`"Hello world"` would expand to the equivalent of `_default_:"Hello World"`.

I considered adding a new method `phrase_equals`, but wasn't sure, since this behavior applies only to default fields.

Author

Updated to introduce a `phrase` fn callback.
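
For readers following the flag-versus-method discussion, a simplified sketch of the "separate method" shape. `SimpleFilter` and `SubstringFilter` are hypothetical stand-ins, not the real `Filter` trait or any implementation in this PR; fields are plain `&str` here, the matching is naive substring matching, and giving `phrase` a default body that delegates to `equals` is an assumption made for illustration.

```rust
/// Hypothetical, simplified stand-in for the real `Filter` trait.
trait SimpleFilter {
    /// Non-phrased match on a field.
    fn equals(&self, field: &str, to_match: &str) -> Box<dyn Fn(&str) -> bool>;

    /// Phrased match. Defaulting to `equals` keeps implementors that do not
    /// care about phrases unchanged (an assumption, not the PR's design).
    fn phrase(&self, field: &str, phrase: &str) -> Box<dyn Fn(&str) -> bool> {
        self.equals(field, phrase)
    }
}

/// Toy implementor that distinguishes the two cases.
struct SubstringFilter;

impl SimpleFilter for SubstringFilter {
    fn equals(&self, _field: &str, to_match: &str) -> Box<dyn Fn(&str) -> bool> {
        // Every whitespace-separated word must appear somewhere in the value.
        let words: Vec<String> = to_match.split_whitespace().map(str::to_owned).collect();
        Box::new(move |value: &str| words.iter().all(|w| value.contains(w.as_str())))
    }

    fn phrase(&self, _field: &str, phrase: &str) -> Box<dyn Fn(&str) -> bool> {
        // The whole phrase must appear as one contiguous substring.
        let phrase = phrase.to_owned();
        Box::new(move |value: &str| value.contains(phrase.as_str()))
    }
}
```

Compared with an `is_phrase: bool` parameter, each call site names the behavior it wants, and implementors only override `phrase` where it actually differs.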

@PSeitz PSeitz requested a review from pront March 17, 2025 08:27
@pront pront changed the title from "enhancement(filter): add is_phrase flag to equals method" to "enhancement(match_datadog_query): add is_phrase flag to equals method" Mar 17, 2025
Member

pront commented Mar 17, 2025

Thanks @PSeitz, this looks better now. Correct me if I am wrong, but is this a breaking change? Since the filter matcher is now stricter.

Comment on lines +173 to +182
    Field::Default(_) => {
        let re = word_regex(phrase);
        Ok(resolve_value(
            buf,
            Run::boxed(move |value| match value {
                Value::Bytes(val) => re.is_match(&String::from_utf8_lossy(val)),
                _ => false,
            }),
        ))
    }

Member

@bruceg bruceg Mar 17, 2025

How is this different from the condition in equals below? It looks word-for-word identical but maybe I missed something. I'd also like to see some unit test cases that demonstrate the different behavior.

Author

It is the same: since I didn't know the impact of the change, the behavior is currently unchanged (non-breaking).
The non-phrase behavior is currently also incorrect; both would need to be adjusted to include tokenization (and probably some adaptation of the query parser).

I implemented it partially here: https://github.com/DataDog/pomsky/pull/79. For full compatibility I'd need to have a closer look at how percolation tokenizes.

@20agbekodo

> Thanks @PSeitz, this looks better now. Correct me if I am wrong, but is this a breaking change? Since the filter matcher is now stricter.

I think yes, but nothing disruptive relative to what's done in Datadog (we want to get more in line with the Datadog Log Explorer behaviour).

5 participants