[efficiency-improver] perf: eliminate redundant field processing in CreateSearchResult #438
For documents with multi-valued fields, doc.Fields lists each field instance once per stored value. The previous code called doc.GetValues() on every field instance and did O(n) Contains checks to dedup. Since doc.GetValues() already returns all values for a field in one call, tracking processed field names with a HashSet lets us call GetValues() exactly once per unique field name and skip all redundant iterations. This reduces allocations and CPU work proportional to the number of multi-valued fields per document in every search result materialisation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Greptile Summary
This PR simplifies the field-processing loop in
Confidence Score: 5/5
The change is a straightforward, logically equivalent simplification of the field-iteration loop and is safe to merge. The old merge path (the if branch on repeated field names) was always a no-op because doc.GetValues returns the full value set on every call; no values from a second call could be absent from the first. The new code produces identical output for all realistic inputs, including documents with duplicate stored values, while calling GetValues exactly once per unique field name. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["foreach field in doc.Fields"] --> B{"processedFields.Add(fieldName)?"}
B -- "false (already seen)" --> C["continue — skip"]
B -- "true (first occurrence)" --> D["doc.GetValues(fieldName).ToList()"]
D --> E["resultVals[fieldName] = values"]
E --> A
C --> A
A -- "done" --> F["return resultVals"]
Reviews (1): Last reviewed commit: "perf: eliminate redundant field processi..."
🤖 This is a draft PR from Efficiency Improver, an automated AI assistant focused on reducing the energy consumption and computational footprint of this repository.
Goal and Rationale
LuceneSearchExecutor.CreateSearchResult is in the hot path of every search result materialisation. When a document has multi-valued fields (fields indexed with multiple values), Lucene's doc.Fields enumerates the same field name once per stored value. The previous implementation called doc.GetValues(fieldName) on each occurrence and performed O(n) List.Contains checks to avoid duplicates.

Since doc.GetValues(fieldName) already returns all values for a field in a single call, subsequent iterations for the same field name are entirely redundant. Eliminating them reduces CPU work and memory allocation proportional to the degree of multi-valued fields.

Focus Area
Code-Level Efficiency — removing redundant computation in a hot search path.
Approach
Track processed field names with a HashSet<string>. For each field instance in doc.Fields, if the name has already been processed, continue. Otherwise, record it and call doc.GetValues() exactly once.

Before:
After:
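As a rough illustration of the approach described above (not the PR's actual diff), the loop can be sketched against a stand-in document type. `FakeDocument`, `FieldNames`, `GetValuesCalls`, and `FieldCollector` are hypothetical names introduced only for this example; the real code operates on a Lucene document via doc.Fields and doc.GetValues.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Lucene document: one (name, value) entry per stored value.
public sealed class FakeDocument
{
    private readonly List<KeyValuePair<string, string>> _fields =
        new List<KeyValuePair<string, string>>();

    // Counts GetValues calls so the effect of the change can be observed.
    public int GetValuesCalls { get; private set; }

    public void Add(string name, string value) =>
        _fields.Add(new KeyValuePair<string, string>(name, value));

    // Mirrors doc.Fields: a multi-valued field name appears once per stored value.
    public IEnumerable<string> FieldNames => _fields.Select(f => f.Key);

    // Mirrors doc.GetValues(name): returns all stored values for the field in one call.
    public string[] GetValues(string name)
    {
        GetValuesCalls++;
        return _fields.Where(f => f.Key == name).Select(f => f.Value).ToArray();
    }
}

public static class FieldCollector
{
    public static Dictionary<string, List<string>> Collect(FakeDocument doc)
    {
        var resultVals = new Dictionary<string, List<string>>();
        var processedFields = new HashSet<string>();

        foreach (var fieldName in doc.FieldNames)
        {
            // HashSet<T>.Add returns false when the name is already present,
            // so repeated instances of a multi-valued field are skipped.
            if (!processedFields.Add(fieldName))
            {
                continue;
            }

            // First occurrence: fetch every value for this field exactly once.
            resultVals[fieldName] = doc.GetValues(fieldName).ToList();
        }

        return resultVals;
    }
}
```

With a document holding two `tags` values and one `title` value, `Collect` calls `GetValues` twice (once per unique field name) rather than three times (once per field instance).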
Energy Efficiency Evidence
Proxy metric used: CPU cycles / allocations (proxy for energy draw — fewer instructions and GC pressure = lower power consumption per query).
For a document with field F having N stored values, the previous code:
- Called doc.GetValues("F") N times (O(fields) each)
- Performed Contains checks: O(N²) total comparisons for that field

After the change:
- doc.GetValues("F") called once
- No Contains checks

This is measurably better for documents with multi-valued fields, which are common in full-text search scenarios (e.g., tags, categories, multi-select facets).
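To make the counts concrete, here is a small self-contained sketch that tallies the work done by each loop shape. The `Previous` method is an assumed reconstruction of the old behaviour based only on the description (a lookup on every field instance plus a `List.Contains` check per value on repeated names); it is not the original code, and `GetValuesCallCount` and its helper are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GetValuesCallCount
{
    // Hypothetical stand-in for doc.GetValues: scans all stored entries for the name.
    private static List<string> GetValues(
        IReadOnlyList<(string Name, string Value)> fields, string name) =>
        fields.Where(f => f.Name == name).Select(f => f.Value).ToList();

    // Assumed shape of the previous loop: one lookup per field instance,
    // then a Contains check for each returned value when the name repeats.
    public static (int Calls, int ContainsChecks) Previous(
        IReadOnlyList<(string Name, string Value)> fields)
    {
        var result = new Dictionary<string, List<string>>();
        int calls = 0, checks = 0;
        foreach (var field in fields)
        {
            var values = GetValues(fields, field.Name);
            calls++;
            if (result.TryGetValue(field.Name, out var existing))
            {
                foreach (var value in values)
                {
                    checks++;
                    if (!existing.Contains(value)) existing.Add(value);
                }
            }
            else
            {
                result[field.Name] = values;
            }
        }
        return (calls, checks);
    }

    // Loop shape after the change: skip names that were already processed.
    public static (int Calls, int ContainsChecks) Current(
        IReadOnlyList<(string Name, string Value)> fields)
    {
        var result = new Dictionary<string, List<string>>();
        var processed = new HashSet<string>();
        int calls = 0;
        foreach (var field in fields)
        {
            if (!processed.Add(field.Name)) continue;
            result[field.Name] = GetValues(fields, field.Name);
            calls++;
        }
        return (calls, 0);
    }
}
```

For a single field F with four distinct stored values, `Previous` reports 4 lookups and 12 `Contains` checks, while `Current` reports 1 lookup and none. These are operation counts from the stand-in, not benchmark measurements.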
Green Software Foundation context: Hardware Efficiency — making better use of CPU cycles by eliminating provably unnecessary work.
Trade-offs
- doc.GetValues(fieldName) returns the complete set of stored values for a field regardless of iteration order; no dedup needed.
- HashSet<string> allocation per result. For documents with few unique fields (typical), this is negligible and less than the allocations it eliminates.

Reproducibility
For benchmarking: src/Examine.Benchmarks/ contains ConcurrentSearchBenchmarks.cs and SearchVersionComparison.cs using BenchmarkDotNet.

Test Status
✅ Build: 0 errors, 3 warnings (pre-existing net6.0 EOL warning)
✅ Tests: 98 passed, 1 skipped, 0 failed (net8.0)
Note
🔒 Integrity filter blocked 5 items
The following items were blocked because they don't meet the GitHub integrity level.
- list_pull_requests (blocked 5 times): has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter: