Skip to content

Add Case-When Expression and tests#12

Merged
lukekim merged 9 commits into
spiceai-51from
spiceai-51-patches
Jan 22, 2026
Merged

Add Case-When Expression and tests#12
lukekim merged 9 commits into
spiceai-51from
spiceai-51-patches

Conversation

@lukekim
Copy link
Copy Markdown

@lukekim lukekim commented Jan 22, 2026

This pull request adds support for the SQL CASE WHEN expression to the Vortex engine, enabling both conversion from DataFusion's CaseExpr and correct pushdown analysis. The implementation ensures only the "searched CASE" form is supported (i.e., CASE WHEN ... THEN ... [ELSE ...] END), not the "simple CASE" form (CASE expr WHEN ...). Additionally, the pull request introduces related protocol options and improves expression conversion and pushdown logic.

Support for CASE WHEN expressions:

  • Added new case_when and case_when_no_else modules and functions to handle CASE WHEN expressions in the Vortex expression system (vortex-array/src/expr/exprs/case_when.rs, mod.rs, session.rs). [1] [2] [3] [4]
  • Implemented conversion from DataFusion's CaseExpr to Vortex expressions, supporting only the "searched CASE" form and handling both with and without ELSE branches (vortex-datafusion/src/convert/exprs.rs). [1] [2] [3]
  • Added pushdown support for CASE WHEN expressions, including logic to check the pushability of all WHEN, THEN, and ELSE branches and to reject unsupported forms (vortex-datafusion/src/convert/exprs.rs).

Improvements to expression conversion and pushdown logic:

  • Refined cast expression pushdown logic to ensure child expressions are convertible and added a helper to check if an expression type is convertible (vortex-datafusion/src/convert/exprs.rs).
  • Updated scalar function pushdown logic to check that all arguments are themselves pushable (vortex-datafusion/src/convert/exprs.rs).

Protocol and miscellaneous updates:

  • Added a new CaseWhenOpts message to the protocol buffer definitions for CASE WHEN options (vortex-proto/proto/expr.proto).
  • Minor formatting improvement in the debug implementation for WriteStrategyBuilder (vortex-file/src/strategy.rs).

… in can_be_pushed_down

The can_be_pushed_down function was returning true for CastExpr and CastColumnExpr
without checking if their child expressions are convertible. This caused runtime
errors when the child contained expressions like CaseExpr that convert() cannot
handle.

Also fixed ScalarFunctionExpr to recursively check its arguments.

Fixes spiceai/spiceai#9037
@lukekim lukekim self-assigned this Jan 22, 2026
@lukekim lukekim added the enhancement New feature or request label Jan 22, 2026
Vortex's CaseWhen expression evaluates all branches before checking
conditions, which causes divide-by-zero errors when division is
protected by a CASE WHEN guard (e.g. CASE WHEN x > 0 THEN y/x END).

This is a regression workaround - the proper fix would be to implement
lazy evaluation in CaseWhen, evaluating the THEN branch only for rows
where the condition is true.
…nevaluated branches

This implements proper lazy evaluation in CaseWhen expression to ensure
that THEN/ELSE branches are only evaluated for rows where they apply.
This is critical for correctness when expressions have side effects like
divide-by-zero panics.

The implementation:
1. Evaluates conditions in order, tracking which rows have been matched
2. For each condition, computes an effective mask (condition AND NOT matched)
3. Uses filter() to create a scoped array with only matching rows
4. Evaluates THEN expression only on the filtered scope
5. Uses scatter_with_mask() to expand results back to original positions
6. Short-circuits when all rows are matched or all conditions fail

This fixes TPC-DS Q73 which has a pattern like:
  CASE WHEN hd_vehicle_count > 0
       THEN hd_dep_count/hd_vehicle_count
       ELSE NULL END

Previously, the division would be evaluated for all rows including those
where hd_vehicle_count=0, causing a divide-by-zero panic. Now the division
is only evaluated for rows where the condition is true.

Added test: test_evaluate_divide_by_zero_protected_by_case_when
@lukekim lukekim merged commit 4e9896b into spiceai-51 Jan 22, 2026
13 of 41 checks passed
@lukekim lukekim deleted the spiceai-51-patches branch January 22, 2026 03:52
Comment on lines +50 to +55
pub struct CaseWhenOptions {
/// Number of WHEN/THEN pairs (each pair contributes 2 children)
pub num_when_then_pairs: u32,
/// Whether an ELSE clause is present (contributes 1 child at the end)
pub has_else: bool,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you compose these instead of flattening a tree? case COND then EXPR else EXPR.

e.g. case a then col else (case b then col2 else col3)

lukekim added a commit that referenced this pull request Feb 6, 2026
* fix: Ensure CastExpr/CastColumnExpr/ScalarFunctionExpr check children in can_be_pushed_down

The can_be_pushed_down function was returning true for CastExpr and CastColumnExpr
without checking if their child expressions are convertible. This caused runtime
errors when the child contained expressions like CaseExpr that convert() cannot
handle.

Also fixed ScalarFunctionExpr to recursively check its arguments.

Fixes spiceai/spiceai#9037

* Add Case-When Expression and tests

* Implement execute()

* Add additional tests and fix type issue

* Fix toml lint

* feat(case_when): implement lazy evaluation to avoid side effects in unevaluated branches

This implements proper lazy evaluation in CaseWhen expression to ensure
that THEN/ELSE branches are only evaluated for rows where they apply.
This is critical for correctness when expressions have side effects like
divide-by-zero panics.

The implementation:
1. Evaluates conditions in order, tracking which rows have been matched
2. For each condition, computes an effective mask (condition AND NOT matched)
3. Uses filter() to create a scoped array with only matching rows
4. Evaluates THEN expression only on the filtered scope
5. Uses scatter_with_mask() to expand results back to original positions
6. Short-circuits when all rows are matched or all conditions fail

This fixes TPC-DS Q73 which has a pattern like:
  CASE WHEN hd_vehicle_count > 0
       THEN hd_dep_count/hd_vehicle_count
       ELSE NULL END

Previously, the division would be evaluated for all rows including those
where hd_vehicle_count=0, causing a divide-by-zero panic. Now the division
is only evaluated for rows where the condition is true.

Added test: test_evaluate_divide_by_zero_protected_by_case_when

* Formatting
lukekim added a commit that referenced this pull request Feb 27, 2026
* fix: Ensure CastExpr/CastColumnExpr/ScalarFunctionExpr check children in can_be_pushed_down

The can_be_pushed_down function was returning true for CastExpr and CastColumnExpr
without checking if their child expressions are convertible. This caused runtime
errors when the child contained expressions like CaseExpr that convert() cannot
handle.

Also fixed ScalarFunctionExpr to recursively check its arguments.

Fixes spiceai/spiceai#9037

* Add Case-When Expression and tests

* Implement execute()

* Add additional tests and fix type issue

* Fix toml lint

* feat(case_when): implement lazy evaluation to avoid side effects in unevaluated branches

This implements proper lazy evaluation in CaseWhen expression to ensure
that THEN/ELSE branches are only evaluated for rows where they apply.
This is critical for correctness when expressions have side effects like
divide-by-zero panics.

The implementation:
1. Evaluates conditions in order, tracking which rows have been matched
2. For each condition, computes an effective mask (condition AND NOT matched)
3. Uses filter() to create a scoped array with only matching rows
4. Evaluates THEN expression only on the filtered scope
5. Uses scatter_with_mask() to expand results back to original positions
6. Short-circuits when all rows are matched or all conditions fail

This fixes TPC-DS Q73 which has a pattern like:
  CASE WHEN hd_vehicle_count > 0
       THEN hd_dep_count/hd_vehicle_count
       ELSE NULL END

Previously, the division would be evaluated for all rows including those
where hd_vehicle_count=0, causing a divide-by-zero panic. Now the division
is only evaluated for rows where the condition is true.

Added test: test_evaluate_divide_by_zero_protected_by_case_when

* Formatting
lukekim added a commit that referenced this pull request Feb 27, 2026
…Fusion demuxer (#23)

* Disable child metric registration under parent to prevent OOM (#10)

* Use tokio::sync::oneshot to prevent FuturesUnordered reentrant drop crash (#9)

* Don't return an error when we have an unsupported node, bubble up "TRUE" as in keep for that node, up to any `and` or `or` node and handle empty IN list. (#8)

* feat: Support retrieving WriterStrategyBuilder from VortexSession (#6)

* Fix session get-or-default (vortex-data#5662)

The comments described this get-or-default, but instead it was a panic

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>

* feat: Support retrieving writer strategy builder from vortex session

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-authored-by: Nicholas Gates <gatesn@users.noreply.github.com>

* fix: Handle UncompressedSizeInBytes statistic correctly (#3)

* fix: Add log dependency to vortex-io

* fix: Add Debug impl for WriteStrategyBuilder and log dep for vortex-io

* fix: Update persistent module to use simplified expression handling from PR #8

* revert: Restore spiceai-51 versions of persistent module (incompatible with DF51)

* fix: Restore format.rs to spiceai-51 version

* Add Case-When Expression and tests (#12)

* fix: Ensure CastExpr/CastColumnExpr/ScalarFunctionExpr check children in can_be_pushed_down

The can_be_pushed_down function was returning true for CastExpr and CastColumnExpr
without checking if their child expressions are convertible. This caused runtime
errors when the child contained expressions like CaseExpr that convert() cannot
handle.

Also fixed ScalarFunctionExpr to recursively check its arguments.

Fixes spiceai/spiceai#9037

* Add Case-When Expression and tests

* Implement execute()

* Add additional tests and fix type issue

* Fix toml lint

* feat(case_when): implement lazy evaluation to avoid side effects in unevaluated branches

This implements proper lazy evaluation in CaseWhen expression to ensure
that THEN/ELSE branches are only evaluated for rows where they apply.
This is critical for correctness when expressions have side effects like
divide-by-zero panics.

The implementation:
1. Evaluates conditions in order, tracking which rows have been matched
2. For each condition, computes an effective mask (condition AND NOT matched)
3. Uses filter() to create a scoped array with only matching rows
4. Evaluates THEN expression only on the filtered scope
5. Uses scatter_with_mask() to expand results back to original positions
6. Short-circuits when all rows are matched or all conditions fail

This fixes TPC-DS Q73 which has a pattern like:
  CASE WHEN hd_vehicle_count > 0
       THEN hd_dep_count/hd_vehicle_count
       ELSE NULL END

Previously, the division would be evaluated for all rows including those
where hd_vehicle_count=0, causing a divide-by-zero panic. Now the division
is only evaluated for rows where the condition is true.

Added test: test_evaluate_divide_by_zero_protected_by_case_when

* Formatting

* feat(datafusion): Export DefaultExpressionConvertor (#19)

* feat: Add option for writing to target vortex file size

* fix: Actually respect target file size

* fix: Set default file size to 16mb

* fix: Update use of unwrap in tests

* test: Update tests

* fix: Always write with custom sink

* fix: Stream directly into target file

* fix: Improve safety arounded ended writer streams with remaining source

* feat: add target file size configuration for Vortex file output

- Introduced `target_file_size_mb` option to `VortexFormat` for controlling the size of output files in megabytes.
- Updated `VortexSink` to handle writing files based on the specified target size, bypassing DataFusion's default row count-based splitting.
- Implemented logic to split output files when the buffered data exceeds the target file size during the write process.
- Added tests to verify the new target file size configuration.

* refactor: clean up case expression handling and improve file size configuration logic

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: Nicholas Gates <gatesn@users.noreply.github.com>
lukekim pushed a commit that referenced this pull request May 18, 2026
## Summary

Fix for the second part of: vortex-data#7808 

```
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x000076a38cc4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000076a38cc288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x000076a38cc297b6 in __libc_message_impl (fmt=fmt@entry=0x76a38cdce8d7 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:134
#6  0x000076a38cca8ff5 in malloc_printerr (
    str=str@entry=0x76a38cdd1bf0 "free(): double free detected in tcache 2")
    at ./malloc/malloc.c:5775
#7  0x000076a38ccab55f in _int_free (av=0x76a38ce03ac0 <main_arena>, p=<optimized out>, 
    have_lock=0) at ./malloc/malloc.c:4541
#8  0x000076a38ccaddce in __GI___libc_free (mem=0x5be5cd9632c0) at ./malloc/malloc.c:3398
#9  0x000076a38eb6807e in alloc::alloc::dealloc (ptr=0x5be5cd9632c0, layout=...)
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:114
#10 alloc::alloc::{impl#1}::deallocate (self=0x5be5cd95f708, ptr=..., layout=...)
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:271
#11 0x000076a38ead9a64 in alloc::boxed::{impl#8}::drop<dyn vortex_scan::Partition, alloc::alloc::Global> (self=0x5be5cd95f6f8)
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1666
#12 0x000076a38ead349e in core::ptr::drop_in_place<alloc::boxed::Box<dyn vortex_scan::Partition, alloc::alloc::Global>> ()
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:804
#13 0x000076a38e8764de in core::ptr::drop_in_place<vortex_ffi::scan::VxPartitionScan> ()
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:804
#14 0x000076a38e876fb8 in core::ptr::drop_in_place<alloc::boxed::Box<vortex_ffi::scan::VxPartitionScan, alloc::alloc::Global>> ()
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:804
#15 0x000076a38e87f2f5 in core::mem::drop<alloc::boxed::Box<vortex_ffi::scan::VxPartitionScan, alloc::alloc::Global>> (_x=0x5be5cd95f6f0)
    at /home/ubuntu/.rustup/toolchains/1.91.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:961
#16 0x000076a38e84efa7 in vortex_ffi::scan::vx_partition_free (ptr=0x5be5cd95f6f0)
    at vortex-ffi/src/macros.rs:295
#17 0x00005be5b0c81126 in operator() (__closure=0x7fff2208a8b0)
    at /home/ubuntu/vortex/vortex-ffi/test/scan.cpp:940
```

## Testing

Verifying existing behavior is maintained.

Signed-off-by: Dergousov Maksim <dergousovmaxim99@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants