[perf-improver] perf: reduce allocations in GroupedAnd/Or/Not and search result field loading by github-actions[bot] · Pull Request #462 · Shazwazza/Examine

github-actions · 2026-06-05T04:56:41Z

🤖 This is an automated PR from Perf Improver, an AI assistant focused on performance.

Goal

Reduce unnecessary heap allocations on hot search paths — lower GC pressure in high-throughput scenarios.

Changes

Three targeted micro-optimisations across three files:

1. `LuceneSearchExecutor.cs` — eliminate per-document `HashSet<string>` allocation

Before: CreateSearchResult allocated a new HashSet<string>() per document to track which field names had already been processed.

After: Uses resultVals.ContainsKey(fieldName) for the same deduplication logic — same O(1) semantics, one fewer object allocation per accessed search result.

Also pre-sizes resultVals dictionary with fields.Count to reduce internal resize operations when a document has many fields.
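The before/after shape described above can be sketched without a Lucene dependency. This is a minimal illustration only: the `FieldLoadingSketch` class, its method names, and the use of `KeyValuePair<string, string>` as a stand-in for stored fields are assumptions, and the helper `GetValues` merely imitates what `doc.GetValues(fieldName)` is described as doing. The exact form of the removed `HashSet` check is also assumed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FieldLoadingSketch
{
    // Before (assumed shape): a companion HashSet tracks processed field names.
    public static Dictionary<string, List<string>> LoadWithHashSet(
        IList<KeyValuePair<string, string>> fields)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var processedFields = new HashSet<string>(); // extra allocation per document
        foreach (var field in fields)
        {
            if (!processedFields.Add(field.Key))
            {
                continue; // this field name was already handled
            }
            resultVals[field.Key] = GetValues(fields, field.Key);
        }
        return resultVals;
    }

    // After: the dictionary itself answers "already seen?", and it is
    // pre-sized with fields.Count as a capacity hint.
    public static Dictionary<string, List<string>> LoadWithContainsKey(
        IList<KeyValuePair<string, string>> fields)
    {
        var resultVals = new Dictionary<string, List<string>>(fields.Count);
        foreach (var field in fields)
        {
            if (!resultVals.ContainsKey(field.Key))
            {
                resultVals[field.Key] = GetValues(fields, field.Key);
            }
        }
        return resultVals;
    }

    // Hypothetical stand-in for doc.GetValues(fieldName): every value stored
    // under the given field name, in document order.
    private static List<string> GetValues(
        IList<KeyValuePair<string, string>> fields, string name)
    {
        return fields.Where(f => f.Key == name).Select(f => f.Value).ToList();
    }
}
```

Both methods return the same keys and values for the same input; the second simply allocates one object fewer per call.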

2. `LuceneQuery.cs` — avoid redundant array copy + remove `.Cast<IExamineValue>()`

fields.ToArray() → fields as string[] ?? fields.ToArray(): when the caller already passes a string[] (the common case — e.g. new[] { "title", "body" }), the cast short-circuits and skips the allocation entirely.
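Removed redundant .Cast<IExamineValue>() calls: the Select projection already casts with (IExamineValue), so the sequence is already an IEnumerable<IExamineValue> and the extra Cast call adds nothing. (Enumerable.Cast returns its source unchanged in that case, so this removes a redundant call rather than an allocation.)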
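A small sketch of both points, under stated assumptions: `IValue` and `Value` are hypothetical stand-ins for `IExamineValue` and its implementation, and `FieldArraySketch` is an illustrative helper, not code from the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IValue { } // hypothetical stand-in for IExamineValue

public sealed class Value : IValue
{
    public Value(string text) { Text = text; }
    public string Text { get; }
}

public static class FieldArraySketch
{
    // Reuses the caller's array when one was passed; copies only otherwise.
    public static string[] ToFieldArray(IEnumerable<string> fields)
    {
        return fields as string[] ?? fields.ToArray();
    }

    // The projection already yields IEnumerable<IValue>, so Cast<IValue>()
    // hands back the very same sequence object.
    public static bool CastReturnsSameSequence(IEnumerable<string> raw)
    {
        IEnumerable<IValue> projected = raw.Select(s => (IValue)new Value(s));
        return ReferenceEquals(projected, projected.Cast<IValue>());
    }
}
```

Note that when the input is a `string[]`, the returned array is the caller's own instance, so the pattern is only appropriate while downstream code treats the array as read-only.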

3. `LuceneSearchQueryBase.cs` — same array-copy avoidance

Applied the same fields as string[] ?? fields.ToArray() pattern in the public-facing GroupedAnd/Or/Not methods and their INestedQuery counterparts.

Performance Evidence

Methodology: code-path analysis plus allocation counting. Verified that Lucene.Net's Document.Fields returns IList<IIndexableField>, which supports .Count.

| Site | Before | After |
| --- | --- | --- |
| `CreateSearchResult` (per document) | `Dictionary` + `HashSet` allocated | `Dictionary` only (+ capacity hint) |
| `GroupedAnd/Or/Not` with `string[]` input | Array copy always | Array copy skipped (0 allocs for the field array) |
| `LuceneQuery.GroupedAnd/Or/Not` (string overload) | `Select` + `Cast` + `ToArray` | `Select` + `ToArray` (one fewer LINQ call) |

For a typical search returning 10 results with 20 fields each:

Saved: 10 HashSet allocations per page of results when AllValues/GetValues is accessed.
Saved: 1 array allocation per GroupedAnd/Or/Not call where the caller passes string[].
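The array-copy saving can also be checked empirically rather than by code-path analysis alone. The sketch below is one possible way to do that, assuming a runtime that exposes `GC.GetAllocatedBytesForCurrentThread` (such as the net8.0 target used in the test command); `AllocationCountSketch` and its method names are illustrative and not part of the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllocationCountSketch
{
    // Returns the managed bytes allocated on this thread while running
    // the action the given number of times.
    public static long MeasureBytes(Func<string[]> action, int iterations)
    {
        action(); // warm-up call so one-time work is not attributed to the loop
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            GC.KeepAlive(action());
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Old behaviour: always copy the incoming fields.
    public static long BytesWithCopy(IEnumerable<string> fields, int iterations)
    {
        return MeasureBytes(() => fields.ToArray(), iterations);
    }

    // New behaviour: reuse the caller's array when there is one.
    public static long BytesWithReuse(IEnumerable<string> fields, int iterations)
    {
        return MeasureBytes(() => fields as string[] ?? fields.ToArray(), iterations);
    }
}
```

For a `string[]` input, the copy variant should report noticeably more allocated bytes than the reuse variant. A benchmark harness with a memory diagnoser would give more rigorous numbers than this loop.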

Trade-offs

fields as string[] ?? fields.ToArray() adds a type-check instruction (negligible overhead; faster than the allocation it skips).
Pre-sizing the dictionary with fields.Count may over-allocate capacity by a few slots when field names repeat — this is a net positive since it avoids resizes.

Reproducibility

dotnet test src/Examine.Test/Examine.Test.csproj --configuration Release --filter "TestCategory!=Benchmarks" -f net8.0

Test Status

✅ Build clean (0 errors, 0 new warnings). All 147 tests passed (2 skipped as expected).


… loading

- Avoid redundant string[] copy in GroupedAnd/Or/Not when caller already passes a string[]: use 'fields as string[] ?? fields.ToArray()' pattern in LuceneSearchQueryBase (public and INestedQuery paths) and LuceneQuery.
- Remove redundant .Cast<IExamineValue>() in LuceneQuery: explicit cast in the Select projection is sufficient.
- Eliminate the per-document HashSet<string> allocation in LuceneSearchExecutor.CreateSearchResult: use resultVals.ContainsKey() for duplicate-field-name detection (same O(1) semantics, one fewer object).
- Pre-size the resultVals Dictionary with fields.Count to reduce resizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

greptile-apps · 2026-06-10T13:42:27Z

Greptile Summary

This PR applies three targeted allocation-reduction micro-optimisations on hot search paths: eliminating the per-document HashSet<string> in CreateSearchResult, pre-sizing the result dictionary, and short-circuiting the fields.ToArray() copy when the caller already supplies a string[].

LuceneSearchExecutor.cs: Removes the HashSet<string> processedFields companion by reusing ContainsKey on the result dictionary for duplicate-field detection; pre-sizes the dictionary with fields.Count to avoid internal rehash. Logic is semantically identical to the original.
LuceneQuery.cs / LuceneSearchQueryBase.cs: Replaces unconditional fields.ToArray() with fields as string[] ?? fields.ToArray() across all GroupedAnd/Or/Not public and INestedQuery overloads. GetMultiFieldQuery receives IReadOnlyList<string> and never mutates the array, so sharing the caller's reference is safe. The redundant .Cast<IExamineValue>() calls are also removed since the Select projection already casts inline.

Confidence Score: 4/5

Safe to merge; all three optimisations correctly preserve the original semantics and no array passed to the internal methods is mutated.

The changes are correct and well-scoped. The ContainsKey guard correctly avoids redundant doc.GetValues() calls on duplicate fields, GetMultiFieldQuery never mutates the fields array so bypassing the defensive copy is safe, and the Cast removal is valid since the inline cast in Select is sufficient. The single comment flags a minor double-lookup pattern.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Examine.Lucene/Search/LuceneSearchExecutor.cs | Replaces per-document HashSet with ContainsKey on the dictionary and pre-sizes with fields.Count; logic is semantically identical to the original. |
| src/Examine.Lucene/Search/LuceneQuery.cs | Replaces fields.ToArray() with fields as string[] ?? fields.ToArray() and removes redundant .Cast<IExamineValue>(); safe because GetMultiFieldQuery never mutates the array. |
| src/Examine.Lucene/Search/LuceneSearchQueryBase.cs | Applies the same array-cast pattern across all public and INestedQuery GroupedAnd/Or/Not overloads; null guards are correctly preserved. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["GroupedAnd/Or/Not\nIEnumerable fields"] --> B{Cast to string-array?}
    B -->|Success - zero alloc| C["Pass caller array directly"]
    B -->|Fail - allocate| D["fields.ToArray()"]
    C --> E["GroupedAndInternal / OrInternal / NotInternal"]
    D --> E
    E --> F["GetMultiFieldQuery\nread-only iteration, no mutation"]

    G["CreateSearchResult per document"] --> H["Dictionary pre-sized with fields.Count"]
    H --> I["foreach field in doc.Fields"]
    I --> J{ContainsKey fieldName?}
    J -->|No - first occurrence| K["Load doc.GetValues and store"]
    J -->|Yes - duplicate| L["Skip"]
    K --> I
    L --> I


greptile-apps · 2026-06-10T13:42:31Z

+                    if (!resultVals.ContainsKey(fieldName))
                    {
-                        continue;
+                        resultVals[fieldName] = doc.GetValues(fieldName).ToList();
                    }


The ContainsKey + indexer pattern performs two dictionary lookups for every new field. TryAdd (available since .NET Core 2.0 / .NET Standard 2.1) accomplishes the same in a single lookup — though note that unlike the current guard, TryAdd evaluates the value argument eagerly, so doc.GetValues(fieldName).ToList() would execute even for duplicate field entries.

Suggested change

-                    if (!resultVals.ContainsKey(fieldName))
-                    {
-                        resultVals[fieldName] = doc.GetValues(fieldName).ToList();
-                    }
+                    resultVals.TryAdd(fieldName, doc.GetValues(fieldName).ToList());
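The eager-evaluation caveat in this comment can be made concrete with a counter. The following is a minimal sketch in which `TryAddSketch` and the local `GetValues` function are hypothetical stand-ins for the real `doc.GetValues(fieldName)` call; only `Dictionary.TryAdd` and `ContainsKey` are real APIs.

```csharp
using System.Collections.Generic;

public static class TryAddSketch
{
    // Guarded form: the value is only computed for unseen field names.
    public static int CountCallsWithContainsKey(IEnumerable<string> fieldNames)
    {
        int calls = 0;
        List<string> GetValues(string name)
        {
            calls++; // simulates the cost of doc.GetValues(name).ToList()
            return new List<string> { name + "-value" };
        }

        var resultVals = new Dictionary<string, List<string>>();
        foreach (var name in fieldNames)
        {
            if (!resultVals.ContainsKey(name))
            {
                resultVals[name] = GetValues(name);
            }
        }
        return calls;
    }

    // TryAdd form: one lookup, but the argument is evaluated on every
    // iteration, including iterations for duplicate names.
    public static int CountCallsWithTryAdd(IEnumerable<string> fieldNames)
    {
        int calls = 0;
        List<string> GetValues(string name)
        {
            calls++;
            return new List<string> { name + "-value" };
        }

        var resultVals = new Dictionary<string, List<string>>();
        foreach (var name in fieldNames)
        {
            resultVals.TryAdd(name, GetValues(name));
        }
        return calls;
    }
}
```

For the names `title, tag, tag, tag`, the guarded form computes values twice and the `TryAdd` form four times, so the single-lookup saving has to be weighed against extra value loads and list allocations on documents with repeated field names.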


github-actions Bot added the automation and performance labels on Jun 5, 2026

Shazwazza marked this pull request as ready for review June 10, 2026 13:38

greptile-apps Bot reviewed Jun 10, 2026

github-actions[bot] wants to merge 1 commit into support/3.x from perf-assist/reduce-allocations-8e6daa65e14b9fc8
