-
Notifications
You must be signed in to change notification settings - Fork 59
feat: Filtering by the presence of a value in an array #873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I'm not sure if I understand the functionality of this issue correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds a new array_contains filter to VictoriaLogs, enabling users to filter log entries where a JSON array field contains a specific value. The implementation follows the established patterns in the codebase for filter implementations.
Key Changes
- Adds new
array_containsfilter with support for matching values within JSON arrays - Implements efficient filtering using dictionary-based optimization for repeated values
- Includes comprehensive unit tests covering various edge cases (empty arrays, mixed types, non-array values)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
lib/logstorage/parser.go |
Integrates the new array_contains filter into the parser, adding it to both the filter selection logic and reserved keywords list |
lib/logstorage/filter_array_contains.go |
Implements the core filter logic with methods for row matching, block result filtering, and block search optimization using JSON parsing |
lib/logstorage/filter_array_contains_test.go |
Provides unit tests for the matchArrayContains function and filter behavior across different column types (const, dict, strings) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Tricky cases | ||
| f(`["foo bar"]`, "foo", false) // partial match | ||
| f(`["foobar"]`, "foo", false) // partial match | ||
| f(`["foo"]`, "fo", false) // partial match | ||
| } |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests should include edge cases for nested arrays and objects to document the expected behavior. Currently, the implementation skips array and object elements (line 166 uses continue for default case), but this behavior isn't explicitly tested.
Consider adding test cases like:
// Nested structures
f(`[["foo"]]`, "foo", false) // nested array - should not match
f(`[{"key":"foo"}]`, "foo", false) // object in array - should not match
f(`[null, "foo", null]`, "foo", true) // null values - should match "foo"This helps document the intended behavior and prevents regressions.
| // TODO: optimize by avoiding full JSON parsing. | ||
| // Maybe using fastjson.Parser pool? | ||
| // For now, use a temporary parser. | ||
| var p fastjson.Parser | ||
| v, err := p.Parse(s) | ||
| if err != nil { | ||
| return false | ||
| } |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The matchArrayContains function creates a new fastjson.Parser on every call, which can cause performance issues when filtering large datasets. The codebase already uses fastjson.ParserPool (see lib/logstorage/pipe_unroll.go:288) for efficient parser reuse.
Consider creating a global fastjson.ParserPool variable similar to the pattern used in pipe_unroll.go:
var jsonParserPool fastjson.ParserPoolThen modify matchArrayContains to use the pool:
p := jsonParserPool.Get()
defer jsonParserPool.Put(p)
v, err := p.Parse(s)This avoids repeated allocations and improves performance for frequently-called filter operations.
| // TODO: optimize by avoiding full JSON parsing. | ||
| // Maybe using fastjson.Parser pool? | ||
| // For now, use a temporary parser. |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment should be removed or addressed before merging, as it refers to the optimization already suggested (using fastjson.ParserPool). If the pool optimization is implemented, this TODO becomes obsolete. If not implemented now, consider creating a GitHub issue to track this improvement instead of leaving a TODO in the code.
| // TODO: optimize by avoiding full JSON parsing. | |
| // Maybe using fastjson.Parser pool? | |
| // For now, use a temporary parser. |
| if !strings.Contains(s, value) { | ||
| return false | ||
| } | ||
|
|
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an extra blank line at line 130 that should be removed to maintain consistency with the codebase style.
| case lex.isKeyword("array_contains"): | ||
| return parseFilterArrayContains(lex, fieldName) |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new array_contains filter is missing parser tests. All similar filters (e.g., contains_any, contains_all, ipv4_range) have corresponding parser tests in parser_test.go (see lines 866-924). These tests verify that the parser correctly creates the filter with expected field names and values.
Consider adding a test function like:
func TestParseFilterArrayContains(t *testing.T) {
f := func(s, fieldNameExpected, valueExpected string) {
t.Helper()
q, err := ParseQuery(s)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
fa, ok := q.f.(*filterArrayContains)
if !ok {
t.Fatalf("unexpected filter type; got %T; want *filterArrayContains; filter: %s", q.f, q.f)
}
if fa.fieldName != fieldNameExpected {
t.Fatalf("unexpected fieldName; got %q; want %q", fa.fieldName, fieldNameExpected)
}
if fa.value != valueExpected {
t.Fatalf("unexpected value; got %q; want %q", fa.value, valueExpected)
}
}
f(`array_contains("foo")`, `_msg`, "foo")
f(`tags:array_contains("prod")`, `tags`, "prod")
f(`array_contains(123)`, `_msg`, "123")
}| // functions | ||
| "contains_all", | ||
| "contains_any", | ||
| "array_contains", |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The array_contains filter is missing serialization tests in the main query parsing test. All similar filters have String() serialization tests (see lines 1579-1608 for contains_any and contains_all). These tests verify that the filter correctly serializes back to LogsQL format.
Consider adding tests like:
// array_contains filter
f(`array_contains(foo)`, `array_contains(foo)`)
f(`array_contains("foo bar")`, `array_contains("foo bar")`)
f(`tags:array_contains(prod)`, `tags:array_contains(prod)`)
f(`array_contains(123)`, `array_contains(123)`)These should be added in the appropriate section of the test that starts around line 1579.
006f77b to
8940cc1
Compare
Signed-off-by: cancaicai <[email protected]>
8940cc1 to
2ee8354
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Describe Your Changes
close: #853
Checklist
The following checks are mandatory: