Skip to content

Conversation

@caicancai
Copy link
Contributor

@caicancai caicancai commented Dec 8, 2025

Describe Your Changes

close: #853

image

Checklist

The following checks are mandatory:

@caicancai
Copy link
Contributor Author

I'm not sure if I understand the functionality of this issue correctly.

@caicancai caicancai marked this pull request as draft December 8, 2025 15:51
@caicancai caicancai marked this pull request as ready for review December 8, 2025 16:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a new array_contains filter to VictoriaLogs, enabling users to filter log entries where a JSON array field contains a specific value. The implementation follows the established patterns in the codebase for filter implementations.

Key Changes

  • Adds new array_contains filter with support for matching values within JSON arrays
  • Implements efficient filtering using dictionary-based optimization for repeated values
  • Includes comprehensive unit tests covering various edge cases (empty arrays, mixed types, non-array values)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
lib/logstorage/parser.go Integrates the new array_contains filter into the parser, adding it to both the filter selection logic and reserved keywords list
lib/logstorage/filter_array_contains.go Implements the core filter logic with methods for row matching, block result filtering, and block search optimization using JSON parsing
lib/logstorage/filter_array_contains_test.go Provides unit tests for the matchArrayContains function and filter behavior across different column types (const, dict, strings)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 42 to 51
// Tricky cases
f(`["foo bar"]`, "foo", false) // partial match
f(`["foobar"]`, "foo", false) // partial match
f(`["foo"]`, "fo", false) // partial match
}
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests should include edge cases for nested arrays and objects to document the expected behavior. Currently, the implementation skips array and object elements (line 166 uses continue for default case), but this behavior isn't explicitly tested.

Consider adding test cases like:

// Nested structures
f(`[["foo"]]`, "foo", false)           // nested array - should not match
f(`[{"key":"foo"}]`, "foo", false)     // object in array - should not match
f(`[null, "foo", null]`, "foo", true)  // null values - should match "foo"

This helps document the intended behavior and prevents regressions.

Copilot uses AI. Check for mistakes.
Comment on lines 136 to 143
// TODO: optimize by avoiding full JSON parsing.
// Maybe using fastjson.Parser pool?
// For now, use a temporary parser.
var p fastjson.Parser
v, err := p.Parse(s)
if err != nil {
return false
}
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matchArrayContains function creates a new fastjson.Parser on every call, which can cause performance issues when filtering large datasets. The codebase already uses fastjson.ParserPool (see lib/logstorage/pipe_unroll.go:288) for efficient parser reuse.

Consider creating a global fastjson.ParserPool variable similar to the pattern used in pipe_unroll.go:

var jsonParserPool fastjson.ParserPool

Then modify matchArrayContains to use the pool:

p := jsonParserPool.Get()
defer jsonParserPool.Put(p)
v, err := p.Parse(s)

This avoids repeated allocations and improves performance for frequently-called filter operations.

Copilot uses AI. Check for mistakes.
Comment on lines 136 to 138
// TODO: optimize by avoiding full JSON parsing.
// Maybe using fastjson.Parser pool?
// For now, use a temporary parser.
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment should be removed or addressed before merging, as it refers to the optimization already suggested (using fastjson.ParserPool). If the pool optimization is implemented, this TODO becomes obsolete. If not implemented now, consider creating a GitHub issue to track this improvement instead of leaving a TODO in the code.

Suggested change
// TODO: optimize by avoiding full JSON parsing.
// Maybe using fastjson.Parser pool?
// For now, use a temporary parser.

Copilot uses AI. Check for mistakes.
if !strings.Contains(s, value) {
return false
}

Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra blank line at line 130 that should be removed to maintain consistency with the codebase style.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +2010 to +2011
case lex.isKeyword("array_contains"):
return parseFilterArrayContains(lex, fieldName)
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new array_contains filter is missing parser tests. All similar filters (e.g., contains_any, contains_all, ipv4_range) have corresponding parser tests in parser_test.go (see lines 866-924). These tests verify that the parser correctly creates the filter with expected field names and values.

Consider adding a test function like:

func TestParseFilterArrayContains(t *testing.T) {
	f := func(s, fieldNameExpected, valueExpected string) {
		t.Helper()
		q, err := ParseQuery(s)
		if err != nil {
			t.Fatalf("unexpected error: %s", err)
		}
		fa, ok := q.f.(*filterArrayContains)
		if !ok {
			t.Fatalf("unexpected filter type; got %T; want *filterArrayContains; filter: %s", q.f, q.f)
		}
		if fa.fieldName != fieldNameExpected {
			t.Fatalf("unexpected fieldName; got %q; want %q", fa.fieldName, fieldNameExpected)
		}
		if fa.value != valueExpected {
			t.Fatalf("unexpected value; got %q; want %q", fa.value, valueExpected)
		}
	}

	f(`array_contains("foo")`, `_msg`, "foo")
	f(`tags:array_contains("prod")`, `tags`, "prod")
	f(`array_contains(123)`, `_msg`, "123")
}

Copilot uses AI. Check for mistakes.
// functions
"contains_all",
"contains_any",
"array_contains",
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array_contains filter is missing serialization tests in the main query parsing test. All similar filters have String() serialization tests (see lines 1579-1608 for contains_any and contains_all). These tests verify that the filter correctly serializes back to LogsQL format.

Consider adding tests like:

// array_contains filter
f(`array_contains(foo)`, `array_contains(foo)`)
f(`array_contains("foo bar")`, `array_contains("foo bar")`)
f(`tags:array_contains(prod)`, `tags:array_contains(prod)`)
f(`array_contains(123)`, `array_contains(123)`)

These should be added in the appropriate section of the test that starts around line 1579.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filtering by the presence of a value in an array

1 participant