perf(inline-agg): add BoolAnd and BoolOr accumulator types (#6984)
## Summary
Implements BoolAnd and BoolOr accumulators from #6585 (item 7) for the
inline grouped aggregation path. Each accumulator holds a per-group
`Option<bool>` state: the first non-null value seeds the state, and
subsequent non-null values combine via `&&` (BoolAnd) or `||` (BoolOr).
A group with no non-null values stays `None`. The output dtype is
Boolean. Grouping semantics and final query results are unchanged.
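
The state transition described above can be sketched as follows. This is a minimal illustration, not the macro-generated code from this PR: `BoolOp`, `BoolAccum`, `update`, and `finalize` are hypothetical names, and values are passed as `Option<bool>` instead of being read from an Arrow array.

```rust
/// Which reducer the accumulator applies (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BoolOp {
    And,
    Or,
}

/// Hypothetical stand-in for the BoolAnd / BoolOr accumulators:
/// one `Option<bool>` slot per group, `None` until a non-null value arrives.
pub struct BoolAccum {
    op: BoolOp,
    states: Vec<Option<bool>>,
}

impl BoolAccum {
    pub fn new(op: BoolOp) -> Self {
        Self { op, states: Vec::new() }
    }

    /// Fold one row into the state of `group_idx`. Null inputs leave the state untouched.
    pub fn update(&mut self, group_idx: usize, value: Option<bool>) {
        // Make sure the group has a slot even if all of its values turn out to be null.
        if group_idx >= self.states.len() {
            self.states.resize(group_idx + 1, None);
        }
        let Some(v) = value else { return };
        let slot = &mut self.states[group_idx];
        *slot = Some(match (*slot, self.op) {
            // First non-null value seeds the state.
            (None, _) => v,
            // Later non-null values combine with the running result.
            (Some(acc), BoolOp::And) => acc && v,
            (Some(acc), BoolOp::Or) => acc || v,
        });
    }

    /// One output per group; `None` means the group saw no non-null value.
    pub fn finalize(self) -> Vec<Option<bool>> {
        self.states
    }
}
```

Seeding from the first value, instead of starting at the identity element (`true` for AND, `false` for OR), is what lets an all-null group finalize to null.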
## Why
`AggExpr::BoolAnd` and `AggExpr::BoolOr` already exist in the DSL and
are wired in the fallback path (`src/daft-recordbatch/src/lib.rs` →
`Series::bool_and(groups)` / `Series::bool_or(groups)`), but queries
that use them currently fall back to `make_groups + eval_agg_expression`
even when the rest of the query qualifies for the inline path. Adding them to the inline
accumulator framework completes inline coverage of the standard
reducer-style aggregates (Count / Sum / Min / Max / Product / BoolAnd /
BoolOr) that all share the same `Vec<Option<T>>` per-group state shape.
## Changes Made
- `src/daft-recordbatch/src/ops/inline_agg.rs`:
  - New `define_bool_and_accum!` and `define_bool_or_accum!` macros (kept
    separate per the Sum/Product precedent: these are semantically distinct
    ops with different identity and absorbing elements).
  - `define_agg_accumulator_enum!` extended with `BoolAnd` and `BoolOr`
    variants.
  - `try_create_accumulator` dispatches `AggExpr::BoolAnd(expr)` and
    `AggExpr::BoolOr(expr)` on `DataType::Boolean`.
  - `can_inline_agg` adds a separate Boolean-only dtype arm; existing
    numeric arm for Sum/Min/Max/Product is unchanged.
  - 5 new tests + 4 helpers.
**Implementation note:** `BooleanArray::values()` doesn't expose a
`&[bool]` slice because Arrow stores bools bit-packed. The null-free
tight loop uses `self.source.to_bitmap()` + `bitmap.value(row_idx)`
instead of the `.zip(values().iter())` pattern Sum/Product use over
primitive slices. Functionally equivalent, just a different access
pattern forced by the storage layout.
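
To illustrate why a `&[bool]` slice is not available, here is a self-contained sketch of bit-packed storage with per-index access. `PackedBools` and `bool_and_no_nulls` are hypothetical names written for this note; they stand in for the bitmap returned by `to_bitmap()` and for the null-free loop, and are not the types used in the PR. The sketch assumes Arrow's layout of one bit per value, least-significant bit first within each byte.

```rust
/// Hypothetical minimal bitmap: value `i` lives in byte `i / 8`, at bit `i % 8`
/// (least-significant bit first).
pub struct PackedBools {
    bytes: Vec<u8>,
    len: usize,
}

impl PackedBools {
    /// Pack a slice of bools into one bit per value.
    pub fn from_bools(values: &[bool]) -> Self {
        let mut bytes = vec![0u8; (values.len() + 7) / 8];
        for (i, &v) in values.iter().enumerate() {
            if v {
                bytes[i / 8] |= 1 << (i % 8);
            }
        }
        Self { bytes, len: values.len() }
    }

    /// Number of logical values (not bytes).
    pub fn len(&self) -> usize {
        self.len
    }

    /// Read value `i` by masking its bit; there is no `bool` in memory to borrow.
    pub fn value(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }
}

/// Null-free BoolAnd loop: `group_ids[row]` names the group of each row, and
/// values are fetched by row index instead of zipping over a primitive slice.
pub fn bool_and_no_nulls(
    group_ids: &[usize],
    values: &PackedBools,
    states: &mut [Option<bool>],
) {
    for (row_idx, &g) in group_ids.iter().enumerate() {
        let v = values.value(row_idx);
        states[g] = Some(states[g].map_or(v, |acc| acc && v));
    }
}
```

Ten values occupy two bytes here, which is why indexing by `row_idx` replaces the slice iterator in the hot loop.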
## Behavior
- Queries with `BoolAnd` / `BoolOr` over Boolean columns now take the
inline path instead of falling back to `make_groups +
eval_agg_expression`.
- Output values identical to the fallback path (verified by
inline-vs-fallback tests).
- All other agg types and dispatch paths are unchanged.
- **Not implemented (deferred):** short-circuit optimization (stop
scanning a group once BoolAnd hits `false` / BoolOr hits `true`). Adding
a per-row branch to the hot loop would regress non-short-circuiting
groups; Sum/Min/Max have analogous opportunities and intentionally don't
take them. Revisit if benchmarks show it matters.
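
For reference, the deferred short-circuit variant would differ from the unconditional fold roughly as sketched below. Both functions are hypothetical and operate on plain slices for brevity; they only show where the extra per-row branch would sit, and say nothing about its measured cost.

```rust
/// Unconditional fold: every row is combined into its group's state.
pub fn bool_and_plain(
    group_ids: &[usize],
    values: &[bool],
    num_groups: usize,
) -> Vec<Option<bool>> {
    let mut states: Vec<Option<bool>> = vec![None; num_groups];
    for (&g, &v) in group_ids.iter().zip(values) {
        states[g] = Some(states[g].map_or(v, |acc| acc && v));
    }
    states
}

/// Short-circuit variant (not implemented in this PR): skip rows whose group
/// has already reached the absorbing element `false`.
pub fn bool_and_short_circuit(
    group_ids: &[usize],
    values: &[bool],
    num_groups: usize,
) -> Vec<Option<bool>> {
    let mut states: Vec<Option<bool>> = vec![None; num_groups];
    for (&g, &v) in group_ids.iter().zip(values) {
        // Extra per-row branch: this is the check the PR chose not to add.
        if states[g] == Some(false) {
            continue;
        }
        states[g] = Some(states[g].map_or(v, |acc| acc && v));
    }
    states
}
```

Both variants produce the same output; with interleaved groups the scan cannot stop early, so the branch only skips individual updates.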
## Test Plan
- `cargo test -p daft-recordbatch --release inline_agg` — 37 passed (32
pre-existing + 5 new).
- `cargo fmt -p daft-recordbatch --check` — clean.
- `cargo clippy -p daft-recordbatch --release --features python` —
clean, no `#[allow]`s added.
New test cases:
- `test_inline_bool_and_matches_fallback` — Utf8 keys + Boolean vals
(no-null tight loop).
- `test_inline_bool_or_matches_fallback` — Utf8 keys + Boolean vals
(no-null tight loop, OR semantics).
- `test_inline_int_key_bool_and_matches_fallback` — Int64 keys + Boolean
vals (FNV int-key fast path).
- `test_inline_bool_and_with_nulls_matches_fallback` — Boolean vals with
`None` interspersed (exercises null-value branch).
- `test_inline_all_null_bool_or_matches_fallback` — all-null vals
(exercises empty `Option<bool>` finalize path).
## Related Issues
- Part of #6585 (item 7).