Fix GROUPBY to implicitly load group key fields without explicit LOAD by cnuthalapati · Pull Request #997 · valkey-io/valkey-search

cnuthalapati · 2026-04-29T21:03:28Z

Summary

Fix FT.AGGREGATE GROUPBY to implicitly load group key fields into the output when no explicit LOAD clause covers them
Only implicitly loads fields that exist in the index schema, so chained GROUPBYs using derived fields (reducer outputs) don't break
Without this fix, GROUPBY results omit the group key fields unless the user redundantly specifies them in LOAD

Test plan

Run existing compatibility tests with regenerated answers from the issue's test additions
Verify FT.AGGREGATE idx * GROUPBY 1 @field REDUCE COUNT 0 AS count returns both field and count
Verify partial LOAD case: FT.AGGREGATE idx * LOAD 1 @price GROUPBY 1 @category REDUCE COUNT 0 AS count returns all three fields
Verify chained GROUPBY: FT.AGGREGATE idx * GROUPBY 1 @category REDUCE COUNT 0 AS count GROUPBY 1 @count REDUCE COUNT 0 AS num doesn't error
Verify explicit LOAD of GROUPBY key still works (no duplication)

Fixes #919

allenss-amazon · 2026-04-29T21:15:02Z

I wonder if this is a more generic problem. Do fields used in an expression (such as in an APPLY) also need to appear in the LOAD?

cnuthalapati · 2026-04-29T22:19:50Z

Yes, APPLY expressions referencing fields not in LOAD will evaluate against empty values. However, the two cases warrant different treatment.

GROUPBY keys are required to interpret the data. They define what each output row represents. Without the grouping key in output, results are uninterpretable: you get aggregates with no labels. Implicitly loading them is the correct behavior because the user cannot make sense of the output otherwise.

APPLY inputs are only computational. The user asked for the computed result (the AS field), not the source fields. Whether source fields also appear in output is an explicit choice via LOAD.

This PR fixes the GROUPBY case. The APPLY behavior (requiring explicit LOAD for expression inputs) is by design. We should not implicitly load computation fields.

When no LOAD clause covers GROUPBY key fields, the serializer skips them because ManipulateReturnsClause sets no_content=true. This adds GROUPBY key fields to the load list automatically, matching expected behavior.

Only schema fields are implicitly loaded, so chained GROUPBYs using derived fields (reducer outputs) are handled safely.

Fixes valkey-io#919

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>

greptile-apps · 2026-05-11T18:28:26Z

Greptile Summary

This PR fixes a bug where FT.AGGREGATE ... GROUPBY 1 @field would silently drop the group key field from the output unless the user also specified an explicit LOAD @field clause. The fix pre-populates loads_to_process with any GROUPBY group key that (a) exists in the index schema and (b) is not already covered by an explicit LOAD.

Iterates all pipeline stages, collects group key names from every GroupBy stage, and appends them to loads_to_process, a copy of the existing loads_ vector (with deduplication via std::find), before the attribute-loading loop runs.
Guards against chained-GROUPBY reducer outputs (e.g., AS count) by checking index_schema->GetIndex(name).ok() — derived fields not in the schema are correctly skipped, preventing parse errors on the second GROUPBY.
Explicit LOAD of a GROUPBY key is handled without duplication by the std::find deduplication check.
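
The three points above can be sketched as a standalone function. This is a simplified illustration and not the code in src/commands/ft_aggregate.cc: `Stage`, `CollectLoads`, `schema_fields`, and `score_alias` are hypothetical names, and a plain set of field names stands in for the `index_schema->GetIndex(name).ok()` check.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one FT.AGGREGATE pipeline stage.
struct Stage {
  bool is_group_by = false;
  std::vector<std::string> group_keys;  // field names without the leading '@'
};

// Returns the explicit loads followed by every GROUPBY key that is a real
// schema field and is not already listed.
// Assumption: `schema_fields` plays the role of GetIndex(name).ok().
std::vector<std::string> CollectLoads(
    const std::vector<std::string>& explicit_loads,
    const std::vector<Stage>& stages,
    const std::unordered_set<std::string>& schema_fields,
    const std::string& score_alias) {
  std::vector<std::string> loads_to_process = explicit_loads;
  for (const Stage& stage : stages) {
    if (!stage.is_group_by) continue;
    for (const std::string& name : stage.group_keys) {
      // The key and the score alias are not treated as loadable fields.
      if (name == "__key" || name == score_alias) continue;
      // Reducer outputs such as "count" are not in the schema: skip them.
      if (schema_fields.count(name) == 0) continue;
      // Deduplicate against explicit LOAD entries and earlier group keys.
      if (std::find(loads_to_process.begin(), loads_to_process.end(), name) ==
          loads_to_process.end()) {
        loads_to_process.push_back(name);
      }
    }
  }
  // Explicit loads are only ever extended, never dropped.
  assert(loads_to_process.size() >= explicit_loads.size());
  return loads_to_process;
}
```

In this sketch, a chained GROUPBY on `category` and then on the reducer output `count` yields only `category`, because `count` is not a schema field.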

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to ManipulateReturnsClause and correctly handles all described edge cases.

The fix is logically sound: deduplication via std::find prevents double-processing when a field appears in both LOAD and GROUPBY; the GetIndex guard correctly skips reducer outputs in chained GROUPBYs; and AddRecordAttribute's idempotency check ensures fields already registered during parsing are not re-registered with conflicting state. The existing processing loop at line 90 already calls GetIndex unconditionally on every entry in loads_to_process, so the implicit entries added by the new code go through the same validation path as explicit LOAD entries.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/commands/ft_aggregate.cc | Adds implicit loading of GROUPBY group key fields in ManipulateReturnsClause; logic is correct, deduplication and schema-guard work as intended. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ManipulateReturnsClause called] --> B{loadall_?}
    B -->|yes| C[Return OK — all fields fetched]
    B -->|no| D[loads_to_process = copy of params.loads_]
    D --> E[Iterate all pipeline stages]
    E --> F{Stage is GroupBy?}
    F -->|no| G[Skip stage]
    G --> E
    F -->|yes| H[Iterate group key attributes]
    H --> I{name == __key or score_as?}
    I -->|yes| J[Skip attribute]
    J --> H
    I -->|no| K{index_schema→GetIndex ok?}
    K -->|no — derived/reducer field| L[Skip attribute]
    L --> H
    K -->|yes — real indexed field| M{Already in loads_to_process?}
    M -->|yes| N[Skip — no duplicate]
    N --> H
    M -->|no| O[Append name to loads_to_process]
    O --> H
    H -->|done| E
    E -->|done| P[Process loads_to_process loop]
    P --> Q[For each load: GetIndex + build return_attributes]
    Q --> R[Set params.no_content accordingly]
    R --> S[Return OK]

Reviews (2): Last reviewed commit: "Merge branch 'main' into bugfix/groupby-..."

greptile-apps · 2026-05-11T18:28:30Z

+        if (!params.index_schema->GetIndex(name).ok()) continue;
+        if (std::find(loads_to_process.begin(), loads_to_process.end(), name) ==
+            loads_to_process.end()) {
+          loads_to_process.push_back(name);
+        }


Redundant GetIndex lookup per implicitly-added field

GetIndex(name) is called here to guard against non-schema fields, but it is called a second time unconditionally at line 90 (VMSDK_ASSIGN_OR_RETURN(auto indexer, params.index_schema->GetIndex(load))) for the same field. Because every name added to loads_to_process in this block has already passed the ok() check, the second lookup is guaranteed to succeed and is wasted work. Consider caching the result from the first call or restructuring so the lookup is performed only once.
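
One way the suggested restructuring could look is to resolve each group key once and carry the result forward to the processing loop. The sketch below is only an illustration under assumed types: `Indexer`, `FakeSchema`, `ResolvedLoad`, `ResolveGroupKeys`, and `IndexerFor` are hypothetical and do not come from valkey-search, and `FakeSchema::Find` merely plays the role of `GetIndex`. Explicit LOAD entries and deduplication are left out to keep the focus on the single lookup.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever a schema lookup returns.
struct Indexer {
  int id;
};

// Hypothetical schema that counts lookups so the saving is observable.
class FakeSchema {
 public:
  explicit FakeSchema(std::unordered_map<std::string, Indexer> fields)
      : fields_(std::move(fields)) {}

  // Plays the role of index_schema->GetIndex(name); nullptr means "not ok".
  const Indexer* Find(const std::string& name) const {
    ++lookups_;
    auto it = fields_.find(name);
    return it == fields_.end() ? nullptr : &it->second;
  }

  int lookups() const { return lookups_; }

 private:
  std::unordered_map<std::string, Indexer> fields_;
  mutable int lookups_ = 0;
};

// A load entry that already carries its resolved indexer.
struct ResolvedLoad {
  std::string name;
  const Indexer* indexer;
};

// Looks each group key up exactly once and keeps the result.
std::vector<ResolvedLoad> ResolveGroupKeys(
    const FakeSchema& schema, const std::vector<std::string>& group_keys) {
  std::vector<ResolvedLoad> resolved;
  for (const std::string& name : group_keys) {
    const Indexer* indexer = schema.Find(name);
    if (indexer == nullptr) continue;  // derived or reducer field
    resolved.push_back({name, indexer});
  }
  return resolved;
}

// The later loop reads the cached indexer instead of looking it up again.
const Indexer& IndexerFor(const ResolvedLoad& load) {
  assert(load.indexer != nullptr);  // guaranteed by ResolveGroupKeys
  return *load.indexer;
}
```

The trade-off is that the load list stops being a plain list of names, so code that consumes explicit LOAD entries would need the same resolved form.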

KarthikSubbarao · 2026-05-12T23:37:57Z

Can we add a test in test_non_vector.py with the commands we want to run here? @cnuthalapati

coderabbitai · 2026-05-17T18:31:58Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between da3872f and 5ddb98c.

📒 Files selected for processing (1)

src/commands/ft_aggregate.cc

Walkthrough

This PR fixes a bug where FT.AGGREGATE ... GROUPBY without an explicit LOAD clause omits group key fields from output. The fix augments ManipulateReturnsClause to implicitly load GROUPBY key attributes by scanning aggregation stages and adding group field names to the return set, matching Redis Stack behavior.

Changes

GROUPBY without LOAD returns

| Layer / File(s) | Summary |
| --- | --- |
| Implicit GROUPBY key field loading `src/commands/ft_aggregate.cc` | Added `<algorithm>` and `<vector>` headers. Enhanced `ManipulateReturnsClause` to construct a `loads_to_process` list from `params.loads_` and augment it with group attributes from `GroupBy` stages (excluding `__key`, score alias, and attributes without available indices), then deduplicate and use it for return attribute population. |

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: fixing GROUPBY to implicitly load group key fields without explicit LOAD, which directly matches the code modification in the raw summary. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear context about what was fixed, test scenarios, and the referenced issue `#919`. |
| Linked Issues check | ✅ Passed | The code changes fully implement the proposed fix from issue `#919`: detecting GROUPBY stages, extracting group key field names, and adding them to loads_to_process when no LOAD clause exists, with proper exclusions for __key and score fields. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope of issue `#919`: adding headers for container operations, extracting group keys from GROUPBY stages, and deduplicating the loads list. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


github-actions Bot assigned cnuthalapati Apr 29, 2026

madolson requested review from KarthikSubbarao and allenss-amazon April 29, 2026 21:08

cnuthalapati force-pushed the bugfix/groupby-implicit-load-keys branch from 3311b48 to 70ba8f4 Compare April 29, 2026 22:22

greptile-apps Bot reviewed May 11, 2026

Merge branch 'main' into bugfix/groupby-implicit-load-keys

5ddb98c

allenss-amazon added this to Valkey-search 1.2 and Valkey-search 1.3 Jun 10, 2026

github-project-automation Bot moved this to Backlog in Valkey-search 1.2 Jun 10, 2026

cnuthalapati wants to merge 2 commits into valkey-io:main from cnuthalapati:bugfix/groupby-implicit-load-keys
