Fix GROUPBY to implicitly load group key fields without explicit LOAD#997
Fix GROUPBY to implicitly load group key fields without explicit LOAD#997cnuthalapati wants to merge 2 commits into
Conversation
|
I wonder if this is a more generic problem. Will fields used in an expression (like in an APPLY) also require being in the LOAD? |
|
Yes, APPLY expressions referencing fields not in LOAD will evaluate against empty values. However, the two cases warrant different treatment. GROUPBY keys are required to interpret the data. They define what each output row represents. Without the grouping key in output, results are uninterpretable: you get aggregates with no labels. Implicitly loading them is the correct behavior because the user cannot make sense of the output otherwise. APPLY inputs are only computational. The user asked for the computed result (the AS field), not the source fields. Whether source fields also appear in output is an explicit choice via LOAD. This PR fixes the GROUPBY case. The APPLY behavior (requiring explicit LOAD for expression inputs) is by design. We should not implicitly load computation fields. |
When no LOAD clause covers GROUPBY key fields, the serializer skips them because ManipulateReturnsClause sets no_content=true. This adds GROUPBY key fields to the load list automatically, matching expected behavior. Only schema fields are implicitly loaded, so chained GROUPBYs using derived fields (reducer outputs) are handled safely. Fixes valkey-io#919 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
3311b48 to
70ba8f4
Compare
Greptile SummaryThis PR fixes a bug where
Confidence Score: 5/5Safe to merge — the change is narrowly scoped to ManipulateReturnsClause and correctly handles all described edge cases. The fix is logically sound: deduplication via std::find prevents double-processing when a field appears in both LOAD and GROUPBY; the GetIndex guard correctly skips reducer outputs in chained GROUPBYs; and AddRecordAttribute's idempotency check ensures fields already registered during parsing are not re-registered with conflicting state. The existing processing loop at line 90 already calls GetIndex unconditionally on every entry in loads_to_process, so the implicit entries added by the new code go through the same validation path as explicit LOAD entries. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[ManipulateReturnsClause called] --> B{loadall_?}
B -->|yes| C[Return OK — all fields fetched]
B -->|no| D[loads_to_process = copy of params.loads_]
D --> E[Iterate all pipeline stages]
E --> F{Stage is GroupBy?}
F -->|no| G[Skip stage]
G --> E
F -->|yes| H[Iterate group key attributes]
H --> I{name == __key or score_as?}
I -->|yes| J[Skip attribute]
J --> H
I -->|no| K{index_schema→GetIndex ok?}
K -->|no — derived/reducer field| L[Skip attribute]
L --> H
K -->|yes — real indexed field| M{Already in loads_to_process?}
M -->|yes| N[Skip — no duplicate]
N --> H
M -->|no| O[Append name to loads_to_process]
O --> H
H -->|done| E
E -->|done| P[Process loads_to_process loop]
P --> Q[For each load: GetIndex + build return_attributes]
Q --> R[Set params.no_content accordingly]
R --> S[Return OK]
Reviews (2): Last reviewed commit: "Merge branch 'main' into bugfix/groupby-..." | Re-trigger Greptile |
| if (!params.index_schema->GetIndex(name).ok()) continue; | ||
| if (std::find(loads_to_process.begin(), loads_to_process.end(), name) == | ||
| loads_to_process.end()) { | ||
| loads_to_process.push_back(name); | ||
| } |
There was a problem hiding this comment.
Redundant
GetIndex lookup per implicitly-added field
GetIndex(name) is called here to guard against non-schema fields, but it is called a second time unconditionally at line 90 (VMSDK_ASSIGN_OR_RETURN(auto indexer, params.index_schema->GetIndex(load))) for the same field. Because every name added to loads_to_process in this block has already passed the ok() check, the second lookup is guaranteed to succeed and is wasted work. Consider caching the result from the first call or restructuring so the lookup is performed only once.
|
Can we add a test in |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR fixes a bug where ChangesGROUPBY without LOAD returns
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Summary
Test plan
FT.AGGREGATE idx * GROUPBY 1 @field REDUCE COUNT 0 AS countreturns bothfieldandcountFT.AGGREGATE idx * LOAD 1 @price GROUPBY 1 @category REDUCE COUNT 0 AS countreturns all three fieldsFT.AGGREGATE idx * GROUPBY 1 @category REDUCE COUNT 0 AS count GROUPBY 1 @count REDUCE COUNT 0 AS numdoesn't errorFixes #919