[perf-improver] perf: reduce allocations in GroupedAnd/Or/Not and search result field loading #462

Open
github-actions[bot] wants to merge 1 commit into support/3.x from perf-assist/reduce-allocations-8e6daa65e14b9fc8

@github-actions github-actions Bot commented Jun 5, 2026

🤖 This is an automated PR from Perf Improver, an AI assistant focused on performance.

Goal

Reduce unnecessary heap allocations on hot search paths — lower GC pressure in high-throughput scenarios.

Changes

Three targeted micro-optimisation changes across three files:

1. LuceneSearchExecutor.cs — eliminate per-document HashSet<string> allocation

Before: CreateSearchResult allocated a new HashSet<string>() per document to track which field names had already been processed.

After: Uses resultVals.ContainsKey(fieldName) for the same deduplication logic — same O(1) semantics, one fewer object allocation per accessed search result.

Also pre-sizes resultVals dictionary with fields.Count to reduce internal resize operations when a document has many fields.
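
The deduplication described above can be sketched as follows. This is a minimal illustration, not the code in LuceneSearchExecutor.cs: the document is modelled as a list of name/value pairs, and `FieldDedupSketch`, `Collect`, and the LINQ filter standing in for `doc.GetValues(fieldName)` are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FieldDedupSketch
{
    // Hypothetical stand-in for a Lucene document: (name, value) pairs
    // in stored order, where a field name may appear more than once.
    public static Dictionary<string, List<string>> Collect(
        IReadOnlyList<KeyValuePair<string, string>> fields)
    {
        // fields.Count is an upper bound on the number of distinct names,
        // so using it as the capacity hint avoids resizes while filling.
        var resultVals = new Dictionary<string, List<string>>(fields.Count);

        foreach (var field in fields)
        {
            var fieldName = field.Key;

            // The dictionary itself records which names were already handled,
            // so no companion HashSet<string> is needed.
            if (!resultVals.ContainsKey(fieldName))
            {
                // Assumption: mimics doc.GetValues(fieldName).ToList(),
                // which returns every value stored under that name.
                resultVals[fieldName] = fields
                    .Where(f => f.Key == fieldName)
                    .Select(f => f.Value)
                    .ToList();
            }
        }

        return resultVals;
    }
}
```

With the fields `title=a`, `body=b`, `title=c`, the result has two keys and `title` maps to both of its values; the second `title` entry is skipped by the `ContainsKey` guard.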

2. LuceneQuery.cs — avoid redundant array copy + remove .Cast<IExamineValue>()

  • fields.ToArray() → fields as string[] ?? fields.ToArray(): when the caller already passes a string[] (the common case — e.g. new[] { "title", "body" }), the cast short-circuits and skips the allocation entirely.
  • Removed redundant .Cast<IExamineValue>() calls: the Select projection already casts with (IExamineValue), so the intermediate LINQ iterator is unnecessary.
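
A small sketch of both points, under stated assumptions: `IValue` and `TextValue` are hypothetical stand-ins for `IExamineValue` and its implementation, and `ArrayReuseSketch` is an illustrative helper, not a type in the PR.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for IExamineValue and a concrete value type.
public interface IValue { string Text { get; } }

public sealed class TextValue : IValue
{
    public TextValue(string text) { Text = text; }
    public string Text { get; }
}

public static class ArrayReuseSketch
{
    // If the caller already holds a string[], hand that same instance back;
    // otherwise fall back to materialising a new array.
    public static string[] AsArray(IEnumerable<string> fields)
        => fields as string[] ?? fields.ToArray();

    // The cast inside the projection already yields IEnumerable<IValue>,
    // so a following .Cast<IValue>() would only add another iterator.
    public static IValue[] ToValues(IEnumerable<string> raw)
        => raw.Select(v => (IValue)new TextValue(v)).ToArray();
}
```

Passing `new[] { "title", "body" }` to `AsArray` returns the same array instance, while passing a `List<string>` still produces a fresh copy.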

3. LuceneSearchQueryBase.cs — same array-copy avoidance

Applied the same fields as string[] ?? fields.ToArray() pattern in the public-facing GroupedAnd/Or/Not methods and their INestedQuery counterparts.

Performance Evidence

Methodology: Code-path analysis + allocation counting. Verified with Lucene.Net's Document.Fields returning IList<IIndexableField> which supports .Count.

| Site | Before | After |
| --- | --- | --- |
| CreateSearchResult (per document) | Dictionary + HashSet allocated | Dictionary only (+ capacity hint) |
| GroupedAnd/Or/Not with string[] input | Array copy always | Array copy skipped (0 allocs for the field array) |
| LuceneQuery.GroupedAnd/Or/Not (string overload) | Select + Cast + ToArray (3 enumerator objects) | Select + ToArray (2 objects) |

For a typical search returning 10 results with 20 fields each:

  • Saved: 10 HashSet allocations per page of results when AllValues/GetValues is accessed.
  • Saved: 1 array allocation per GroupedAnd/Or/Not call where the caller passes string[].

Trade-offs

  • fields as string[] ?? fields.ToArray() adds a type-check instruction (negligible overhead; faster than the allocation it skips).
  • Pre-sizing the dictionary with fields.Count may over-allocate capacity by a few slots when field names repeat — this is a net positive since it avoids resizes.

Reproducibility

dotnet test src/Examine.Test/Examine.Test.csproj --configuration Release --filter "TestCategory!=Benchmarks" -f net8.0

Test Status

✅ Build clean (0 errors, 0 new warnings). All 147 tests passed (2 skipped as expected).

Generated by Perf Improver · sonnet46 4.9M ·
Comment /perf-assist to run again

Add this agentic workflow to your repo

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/perf-improver.md@dcdf09723d42ef9b6c75335e4612fd145d4ccdaa

- Avoid redundant string[] copy in GroupedAnd/Or/Not when caller already
  passes a string[]: use 'fields as string[] ?? fields.ToArray()' pattern
  in LuceneSearchQueryBase (public and INestedQuery paths) and LuceneQuery.
- Remove redundant .Cast<IExamineValue>() in LuceneQuery — explicit cast in
  the Select projection is sufficient.
- Eliminate the per-document HashSet<string> allocation in
  LuceneSearchExecutor.CreateSearchResult: use resultVals.ContainsKey()
  for duplicate-field-name detection (same O(1) semantics, one fewer object).
- Pre-size the resultVals Dictionary with fields.Count to reduce resizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Shazwazza Shazwazza marked this pull request as ready for review June 10, 2026 13:38
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Greptile Summary

This PR applies three targeted allocation-reduction micro-optimisations on hot search paths: eliminating the per-document HashSet<string> in CreateSearchResult, pre-sizing the result dictionary, and short-circuiting the fields.ToArray() copy when the caller already supplies a string[].

  • LuceneSearchExecutor.cs: Removes the HashSet<string> processedFields companion by reusing ContainsKey on the result dictionary for duplicate-field detection; pre-sizes the dictionary with fields.Count to avoid internal rehash. Logic is semantically identical to the original.
  • LuceneQuery.cs / LuceneSearchQueryBase.cs: Replaces unconditional fields.ToArray() with fields as string[] ?? fields.ToArray() across all GroupedAnd/Or/Not public and INestedQuery overloads. GetMultiFieldQuery receives IReadOnlyList<string> and never mutates the array, so sharing the caller's reference is safe. The redundant .Cast<IExamineValue>() calls are also removed since the Select projection already casts inline.
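
The safety argument rests on the consumer only reading the array. The sketch below illustrates why that matters: once the copy is skipped, the callee and the caller hold the same instance, so a later write by the caller is visible through the captured reference. `SharedArraySketch`, `Capture`, and `Demo` are hypothetical names for this example and do not appear in the PR.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SharedArraySketch
{
    // Same pattern as in the PR: reuse the caller's array when possible.
    // Exposing it as IReadOnlyList<string> keeps the callee from writing to it.
    public static IReadOnlyList<string> Capture(IEnumerable<string> fields)
        => fields as string[] ?? fields.ToArray();

    // Shows that the captured list aliases the caller's array.
    public static string Demo()
    {
        var callerArray = new[] { "title", "body" };
        var captured = Capture(callerArray);

        // The caller mutates its own array after the call...
        callerArray[0] = "changed";

        // ...and the captured list observes the change, because no copy was made.
        return captured[0];
    }
}
```

`Demo()` returns `"changed"`. This is harmless when the array is consumed immediately to build a query, which is the situation the review describes; it would only matter if the reference were stored and read after the caller had modified the array.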

Confidence Score: 4/5

Safe to merge; all three optimisations correctly preserve the original semantics and no array passed to the internal methods is mutated.

The changes are correct and well-scoped. The ContainsKey guard correctly avoids redundant doc.GetValues() calls on duplicate fields, GetMultiFieldQuery never mutates the fields array so bypassing the defensive copy is safe, and the Cast removal is valid since the inline cast in Select is sufficient. The single comment flags a minor double-lookup pattern.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Examine.Lucene/Search/LuceneSearchExecutor.cs | Replaces per-document HashSet with ContainsKey on the dictionary and pre-sizes with fields.Count; logic is semantically identical to the original. |
| src/Examine.Lucene/Search/LuceneQuery.cs | Replaces fields.ToArray() with fields as string[] ?? fields.ToArray() and removes redundant .Cast<IExamineValue>(); safe because GetMultiFieldQuery never mutates the array. |
| src/Examine.Lucene/Search/LuceneSearchQueryBase.cs | Applies the same array-cast pattern across all public and INestedQuery GroupedAnd/Or/Not overloads; null guards are correctly preserved. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["GroupedAnd/Or/Not\nIEnumerable fields"] --> B{Cast to string-array?}
    B -->|Success - zero alloc| C["Pass caller array directly"]
    B -->|Fail - allocate| D["fields.ToArray()"]
    C --> E["GroupedAndInternal / OrInternal / NotInternal"]
    D --> E
    E --> F["GetMultiFieldQuery\nread-only iteration, no mutation"]

    G["CreateSearchResult per document"] --> H["Dictionary pre-sized with fields.Count"]
    H --> I["foreach field in doc.Fields"]
    I --> J{ContainsKey fieldName?}
    J -->|No - first occurrence| K["Load doc.GetValues and store"]
    J -->|Yes - duplicate| L["Skip"]
    K --> I
    L --> I

Reviews (1): Last reviewed commit: "perf: reduce allocations in GroupedAnd/O..." | Re-trigger Greptile

Comment on lines +260 to 263
if (!resultVals.ContainsKey(fieldName))
{
    resultVals[fieldName] = doc.GetValues(fieldName).ToList();
}

P2 The ContainsKey + indexer pattern performs two dictionary lookups for every new field. TryAdd (available since .NET Core 2.0 / .NET Standard 2.1) accomplishes the same in a single lookup — though note that unlike the current guard, TryAdd evaluates the value argument eagerly, so doc.GetValues(fieldName).ToList() would execute even for duplicate field entries.

Suggested change
- if (!resultVals.ContainsKey(fieldName))
- {
-     resultVals[fieldName] = doc.GetValues(fieldName).ToList();
- }
+ resultVals.TryAdd(fieldName, doc.GetValues(fieldName).ToList());
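
The eager-evaluation caveat in the comment above can be made concrete with a counter. In this sketch, `Load` is a hypothetical stand-in for `doc.GetValues(fieldName).ToList()`, and `TryAddSketch` is an illustrative class, not code from the PR.

```csharp
using System.Collections.Generic;

public static class TryAddSketch
{
    private static int _loadCount;

    // Hypothetical stand-in for doc.GetValues(fieldName).ToList();
    // counts how often the value is actually computed.
    private static List<string> Load(string name)
    {
        _loadCount++;
        return new List<string> { name + "-value" };
    }

    // Guarded form: two lookups for a new key, but Load runs only for new keys.
    public static int LoadsWithContainsKey(IEnumerable<string> names)
    {
        _loadCount = 0;
        var resultVals = new Dictionary<string, List<string>>();
        foreach (var name in names)
        {
            if (!resultVals.ContainsKey(name))
            {
                resultVals[name] = Load(name);
            }
        }
        return _loadCount;
    }

    // TryAdd form: one lookup, but the argument is evaluated before the call,
    // so Load runs for every entry, duplicates included.
    public static int LoadsWithTryAdd(IEnumerable<string> names)
    {
        _loadCount = 0;
        var resultVals = new Dictionary<string, List<string>>();
        foreach (var name in names)
        {
            resultVals.TryAdd(name, Load(name));
        }
        return _loadCount;
    }
}
```

For the names `title`, `body`, `title`, the guarded form computes the value twice and the `TryAdd` form three times. Whether the saved lookup outweighs the extra value computation and list allocation depends on how often field names repeat in real documents, which this sketch does not measure.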
