
[efficiency-improver] perf: eliminate redundant field processing in CreateSearchResult #438

Merged
Shazwazza merged 1 commit into support/3.x from efficiency/create-search-result-dedup-74955ea26c7e6a4c
May 25, 2026

Conversation

@github-actions

Contributor

🤖 This is a draft PR from Efficiency Improver, an automated AI assistant focused on reducing the energy consumption and computational footprint of this repository.


Goal and Rationale

LuceneSearchExecutor.CreateSearchResult is in the hot path of every search result materialisation. When a document has multi-valued fields (fields indexed with multiple values), Lucene's doc.Fields enumerates the same field name once per stored value. The previous implementation called doc.GetValues(fieldName) on each occurrence and performed O(n) List.Contains checks to avoid duplicates.

Since doc.GetValues(fieldName) already returns all values for a field in a single call, subsequent iterations for the same field name are entirely redundant. Eliminating them reduces CPU work and memory allocation proportional to the degree of multi-valued fields.

Focus Area

Code-Level Efficiency — removing redundant computation in a hot search path.

Approach

Track processed field names with a HashSet<string>. For each field instance in doc.Fields, if the name has already been processed, continue. Otherwise, record it and call doc.GetValues() exactly once.

Before:

foreach (var field in fields)
{
    var fieldName = field.Name;
    var values = doc.GetValues(fieldName);          // called once per field *instance*

    if (resultVals.TryGetValue(fieldName, out var resultFieldVals))
    {
        foreach (var value in values)
        {
            if (!resultFieldVals.Contains(value))   // O(n) Contains on List
                resultFieldVals.Add(value);
        }
    }
    else
    {
        resultVals[fieldName] = values.ToList();
    }
}

After:

var processedFields = new HashSet<string>();

foreach (var field in fields)
{
    var fieldName = field.Name;
    if (!processedFields.Add(fieldName))
        continue;                                    // skip: all values already captured

    resultVals[fieldName] = doc.GetValues(fieldName).ToList();
}
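
The skip in the After snippet relies on HashSet<string>.Add returning false when the name is already in the set. A minimal, self-contained sketch of that loop follows. StandInDocument, FieldCollector and Demo are hypothetical names used only for illustration and are not part of Examine or Lucene.NET; the stand-in only mimics the behaviour described above (one field entry per stored value, GetValues returning every value for a name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Lucene document: one (name, value) entry per stored value.
public sealed class StandInDocument
{
    private readonly List<KeyValuePair<string, string>> _fields = new();

    public StandInDocument Add(string name, string value)
    {
        _fields.Add(new KeyValuePair<string, string>(name, value));
        return this;
    }

    // One entry per stored value, so the name of a multi-valued field repeats.
    public IEnumerable<string> FieldNames => _fields.Select(f => f.Key);

    // Every value stored under the given name, in insertion order.
    public string[] GetValues(string name) =>
        _fields.Where(f => f.Key == name).Select(f => f.Value).ToArray();
}

public static class FieldCollector
{
    public static Dictionary<string, List<string>> Collect(StandInDocument doc)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var processedFields = new HashSet<string>();

        foreach (var fieldName in doc.FieldNames)
        {
            // HashSet<T>.Add returns false when the name is already present,
            // so repeated instances of the same field are skipped.
            if (!processedFields.Add(fieldName))
                continue;

            resultVals[fieldName] = doc.GetValues(fieldName).ToList();
        }

        return resultVals;
    }
}

public static class Demo
{
    public static void Main()
    {
        var doc = new StandInDocument()
            .Add("tags", "lucene")
            .Add("title", "Hello")
            .Add("tags", "search");

        foreach (var pair in FieldCollector.Collect(doc))
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
    }
}
```

In this sketch the second "tags" entry is skipped, yet both of its values are still collected because the first GetValues call already returned them.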

Energy Efficiency Evidence

Proxy metric used: CPU cycles / allocations (a proxy for energy draw: fewer instructions and less GC pressure mean lower power consumption per query).

For a document with field F having N stored values, the previous code:

  • Called doc.GetValues("F") N times (each call is O(fields))
  • Ran N(N−1) inner Contains checks, each a linear scan of the list → O(N²) checks and up to O(N³) string comparisons for that field

After the change:

  • doc.GetValues("F") called once
  • Zero Contains checks
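
The counts above can be checked with a small instrumented sketch. CountingDocument and WorkCounter are hypothetical stand-ins written for this illustration, not Examine or Lucene.NET types; the two loops reproduce only the shape of the Before and After snippets and count calls rather than measuring time or energy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a document holding one multi-valued field.
// It counts how often GetValues is invoked.
public sealed class CountingDocument
{
    private readonly string _name;
    private readonly string[] _values;

    public int GetValuesCalls { get; private set; }

    public CountingDocument(string name, params string[] values)
    {
        _name = name;
        _values = values;
    }

    // The field name repeats once per stored value.
    public IEnumerable<string> FieldNames => _values.Select(_ => _name);

    public string[] GetValues(string name)
    {
        GetValuesCalls++;
        return name == _name ? _values.ToArray() : Array.Empty<string>();
    }
}

public static class WorkCounter
{
    // Shape of the previous loop; returns the number of List.Contains calls.
    public static int OldLoopContainsCalls(CountingDocument doc)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var containsCalls = 0;

        foreach (var fieldName in doc.FieldNames)
        {
            var values = doc.GetValues(fieldName);
            if (resultVals.TryGetValue(fieldName, out var existing))
            {
                foreach (var value in values)
                {
                    containsCalls++;
                    if (!existing.Contains(value))
                        existing.Add(value);
                }
            }
            else
            {
                resultVals[fieldName] = values.ToList();
            }
        }

        return containsCalls;
    }

    // Shape of the new loop; it never calls Contains on a list.
    public static void NewLoop(CountingDocument doc)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var processedFields = new HashSet<string>();

        foreach (var fieldName in doc.FieldNames)
        {
            if (!processedFields.Add(fieldName))
                continue;
            resultVals[fieldName] = doc.GetValues(fieldName).ToList();
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var oldDoc = new CountingDocument("F", "a", "b", "c", "d");
        var containsCalls = WorkCounter.OldLoopContainsCalls(oldDoc);
        // prints "old: 4 GetValues calls, 12 Contains calls"
        Console.WriteLine($"old: {oldDoc.GetValuesCalls} GetValues calls, {containsCalls} Contains calls");

        var newDoc = new CountingDocument("F", "a", "b", "c", "d");
        WorkCounter.NewLoop(newDoc);
        // prints "new: 1 GetValues call"
        Console.WriteLine($"new: {newDoc.GetValuesCalls} GetValues call");
    }
}
```

With N = 4 stored values the old loop shape makes N = 4 GetValues calls and N(N−1) = 12 Contains calls, while the new shape makes a single GetValues call.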

The saving grows with the number of values per field, so it matters most for documents with multi-valued fields, which are common in full-text search scenarios (e.g., tags, categories, multi-select facets).

Green Software Foundation context: Hardware Efficiency — making better use of CPU cycles by eliminating provably unnecessary work.

Trade-offs

  • Readability: The new code is actually simpler and shorter (−5 lines net).
  • Correctness: No change in output — doc.GetValues(fieldName) returns the complete set of stored values for a field regardless of iteration order; no dedup needed.
  • HashSet overhead: A small HashSet<string> allocation per result. For documents with few unique fields (typical), this is negligible and less than the allocations it eliminates.
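
The correctness point can be illustrated with a sketch that runs both loop shapes over the same input, including a duplicate stored value, and compares the results. EquivalenceCheck and its local GetValues helper are hypothetical stand-ins for this example only; the helper assumes, as stated above, that a GetValues call returns every stored value for a name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EquivalenceCheck
{
    // Hypothetical stand-in for doc.GetValues: every value stored under a name, in order.
    private static string[] GetValues(IReadOnlyList<(string Name, string Value)> fields, string name) =>
        fields.Where(f => f.Name == name).Select(f => f.Value).ToArray();

    // Shape of the previous loop, including the Contains-based merge path.
    public static Dictionary<string, List<string>> OldCollect(IReadOnlyList<(string Name, string Value)> fields)
    {
        var resultVals = new Dictionary<string, List<string>>();
        foreach (var field in fields)
        {
            var values = GetValues(fields, field.Name);
            if (resultVals.TryGetValue(field.Name, out var existing))
            {
                foreach (var value in values)
                {
                    if (!existing.Contains(value))
                        existing.Add(value);
                }
            }
            else
            {
                resultVals[field.Name] = values.ToList();
            }
        }
        return resultVals;
    }

    // Shape of the new loop, guarded by a HashSet of processed names.
    public static Dictionary<string, List<string>> NewCollect(IReadOnlyList<(string Name, string Value)> fields)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var processedFields = new HashSet<string>();
        foreach (var field in fields)
        {
            if (!processedFields.Add(field.Name))
                continue;
            resultVals[field.Name] = GetValues(fields, field.Name).ToList();
        }
        return resultVals;
    }

    // True when both dictionaries hold the same keys with the same ordered values.
    public static bool SameOutput(
        Dictionary<string, List<string>> a,
        Dictionary<string, List<string>> b) =>
        a.Count == b.Count &&
        a.All(p => b.TryGetValue(p.Key, out var other) && p.Value.SequenceEqual(other));
}

public static class Demo
{
    public static void Main()
    {
        var fields = new (string Name, string Value)[]
        {
            ("tags", "a"), ("title", "t"), ("tags", "a"), ("tags", "b"),
        };

        var same = EquivalenceCheck.SameOutput(
            EquivalenceCheck.OldCollect(fields),
            EquivalenceCheck.NewCollect(fields));
        Console.WriteLine(same); // prints "True"
    }
}
```

Both shapes keep the duplicate "a" under "tags" here: the old merge path only adds values that are missing, and after the first GetValues call none are.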

Reproducibility

dotnet test src/Examine.Test/Examine.Test.csproj --configuration Release -f net8.0 --filter "TestCategory!~Benchmarks"

For benchmarking: src/Examine.Benchmarks/ contains ConcurrentSearchBenchmarks.cs and SearchVersionComparison.cs using BenchmarkDotNet.

Test Status

✅ Build: 0 errors, 3 warnings (pre-existing net6.0 EOL warning)
✅ Tests: 98 passed, 1 skipped, 0 failed (net8.0)

Note

🔒 Integrity filter blocked 5 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #367 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #366 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #364 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #338 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #334 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Efficiency Improver · ● 29.4M ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/efficiency-improver.md@dcdf09723d42ef9b6c75335e4612fd145d4ccdaa

For documents with multi-valued fields, doc.Fields lists each field
instance once per stored value. The previous code called doc.GetValues()
on every field instance and did O(n) Contains checks to dedup.

Since doc.GetValues() already returns all values for a field in one call,
tracking processed field names with a HashSet lets us call GetValues()
exactly once per unique field name and skip all redundant iterations.

This reduces allocations and CPU work proportional to the number of
multi-valued fields per document in every search result materialisation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Shazwazza Shazwazza marked this pull request as ready for review May 25, 2026 19:32
@greptile-apps

greptile-apps Bot commented May 25, 2026

Contributor

Greptile Summary

This PR simplifies the field-processing loop in LuceneSearchExecutor.CreateSearchResult by using a HashSet<string> to skip redundant iterations for multi-valued fields. Since doc.GetValues(fieldName) already returns all stored values for a field in a single call, the previous code was calling it once per field instance (N times for N stored values) and running unnecessary O(N²) Contains checks in the merge path.

  • The optimization is provably correct: doc.GetValues returns the same complete value set regardless of which field instance triggers the call, so the old merge path was always a no-op (every value was already present from the first call).
  • The net code change is −5 lines and eliminates redundant allocations and CPU work proportional to the multi-value cardinality of fields in each result document.

Confidence Score: 5/5

The change is a straightforward, logically equivalent simplification of the field-iteration loop — safe to merge.

The old merge path (the if branch on repeated field names) was always a no-op because doc.GetValues returns the full value set on every call; no values from a second call could be absent from the first. The new code produces identical output for all realistic inputs, including documents with duplicate stored values, while calling GetValues exactly once per unique field name.

No files require special attention.

Important Files Changed

Filename: src/Examine.Lucene/Search/LuceneSearchExecutor.cs
Overview: Replaces redundant per-instance GetValues calls and O(N²) Contains dedup with a single GetValues call per unique field name, guarded by a HashSet. Logically equivalent to the original for all inputs; simpler and faster.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["foreach field in doc.Fields"] --> B{"processedFields.Add(fieldName)?"}
    B -- "false (already seen)" --> C["continue — skip"]
    B -- "true (first occurrence)" --> D["doc.GetValues(fieldName).ToList()"]
    D --> E["resultVals[fieldName] = values"]
    E --> A
    C --> A
    A -- "done" --> F["return resultVals"]

Reviews (1): Last reviewed commit: "perf: eliminate redundant field processi..."

@Shazwazza Shazwazza merged commit d94393a into support/3.x May 25, 2026
@Shazwazza Shazwazza deleted the efficiency/create-search-result-dedup-74955ea26c7e6a4c branch May 25, 2026 19:37