Commit df23c37
feat[expr]: N-ary CASE WHEN expression (vortex-data#6786)
## Summary
Introduces CASE WHEN expression support in Vortex as a scalar function,
implementing a true n-ary `CASE WHEN cond1 THEN val1 WHEN cond2 THEN val2
... ELSE default END` expression.
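
As a reference for the semantics described above, here is a minimal row-wise sketch in plain Rust. It is not the Vortex implementation, which operates on arrays; `case_when_row` is a hypothetical helper, and treating a NULL condition as "not matched" is assumed from standard SQL rather than taken from this change.

```rust
/// Row-wise reference semantics for an n-ary CASE WHEN (illustrative only).
/// `branches` holds (condition, value) pairs in WHEN order; `None` models SQL NULL.
fn case_when_row<T: Clone>(branches: &[(Option<bool>, T)], else_value: Option<T>) -> Option<T> {
    for (cond, value) in branches {
        // Assumption: a NULL condition does not match, as in standard SQL.
        if *cond == Some(true) {
            // The first matching WHEN wins; later branches are ignored.
            return Some(value.clone());
        }
    }
    // No branch matched: use ELSE if present, otherwise the result is NULL.
    else_value
}

fn main() {
    // CASE WHEN x > 10 THEN 'high' WHEN x > 5 THEN 'mid' ELSE 'low' END
    let label = |x: i32| {
        case_when_row(&[(Some(x > 10), "high"), (Some(x > 5), "mid")], Some("low"))
    };
    println!("{:?} {:?} {:?}", label(12), label(7), label(1));

    // Without ELSE, an unmatched row evaluates to NULL (None here).
    let no_else: Option<&str> = case_when_row(&[(Some(false), "never")], None);
    println!("{:?}", no_else);
}
```

The two forms here correspond to a CASE WHEN with and without an ELSE clause: the first always yields a value, while the second yields NULL for rows that match no branch.
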
## Changes included
This pull request introduces support for SQL-style CASE WHEN expressions
in Vortex, including both the expression logic and integration with
DataFusion. The changes add new expression constructors, conversion
logic, and pushdown support for CASE WHEN, as well as benchmarks and an
equivalence test against DataFusion to check performance and correctness.
**CASE WHEN expression support:**
* Added new constructors `case_when`, `case_when_no_else`, and
`nested_case_when` to build CASE WHEN expressions in `exprs.rs`,
leveraging the new `CaseWhen` scalar function and its options.
* Introduced `CaseWhenOpts` protobuf message to encode CASE WHEN options
for serialization.
* Registered the new `case_when` scalar function module.
**DataFusion integration:**
* Implemented conversion from DataFusion's `CaseExpr` to Vortex's CASE
WHEN expressions, supporting the "searched CASE" form and mapping
WHEN/THEN/ELSE clauses.
* Enhanced pushdown logic to support CASE WHEN expressions, including
recursive checks for child expressions and ELSE clauses.
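
The recursive pushdown check described above can be sketched on a toy expression tree. Everything here is hypothetical: the `Expr` enum and `can_push_down` function are illustrative stand-ins, not the DataFusion or Vortex types, and the rule that a CASE WHEN is pushable only if every WHEN, THEN, and ELSE child is pushable is an assumption drawn from the description.

```rust
/// Toy expression tree (hypothetical; not the DataFusion or Vortex types).
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    /// Stands in for any expression the storage layer cannot evaluate.
    Unsupported(String),
    CaseWhen {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// Returns true only if the expression and all of its children can be pushed down.
fn can_push_down(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Gt(lhs, rhs) => can_push_down(lhs) && can_push_down(rhs),
        Expr::Unsupported(_) => false,
        Expr::CaseWhen { when_then, else_expr } => {
            // Every WHEN condition and THEN value must be pushable...
            when_then.iter().all(|(w, t)| can_push_down(w) && can_push_down(t))
                // ...and so must the ELSE clause, when one is present.
                && else_expr.as_deref().map_or(true, can_push_down)
        }
    }
}

fn main() {
    // CASE WHEN a > 10 THEN 1 ELSE 0 END
    let case = Expr::CaseWhen {
        when_then: vec![(
            Expr::Gt(Box::new(Expr::Column("a".into())), Box::new(Expr::Literal(10))),
            Expr::Literal(1),
        )],
        else_expr: Some(Box::new(Expr::Literal(0))),
    };
    println!("{:?} -> pushable: {}", case, can_push_down(&case));
}
```

A single unsupported child anywhere in the tree, including inside the ELSE clause, makes the whole expression non-pushable in this sketch, so evaluation would fall back to the query engine.
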
**Benchmarking and testing:**
* Added comprehensive benchmarks for various CASE WHEN scenarios in
`case_when_bench.rs` and registered them in `Cargo.toml`.
* Added an equivalence test to ensure that DataFusion CASE WHEN results
match Vortex's results for the same input data.
**Other improvements:**
* Refined scalar function pushdown checks and utility logic for
expression convertibility.
* Minor code-quality and error-handling improvements in the expression
module.
## Testing
* Added a comprehensive test to verify equivalence between DataFusion
and Vortex results for a `CASE WHEN` expression applied to an Arrow
`RecordBatch`.
## Benchmarking
* Introduced a new benchmark for various `CASE WHEN` scenarios,
including simple, nested, all-true, and all-false cases, in
`benches/expr/case_when_bench.rs`, and registered it in `Cargo.toml`.
| Benchmark | 1K rows | 10K rows | 100K rows |
| -- | -- | -- | -- |
| case_when_simple | 5.4 µs | 8.4 µs | 29.5 µs |
| case_when_without_else | 5.5 µs | 8.7 µs | 29.4 µs |
| case_when_all_true | 4.2 µs | 6.3 µs | 20.7 µs |
| case_when_all_false | 4.3 µs | 6.4 µs | 20.1 µs |
| case_when_nary_3_conditions | 15.1 µs | 25.8 µs | 87.3 µs |
| case_when_nary_equality_lookup (5) | 25.2 µs | 39.7 µs | 125.4 µs |
| case_when_nary_10_conditions | 45.5 µs | 73.1 µs | 257.5 µs |
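
To compare runs across input sizes, the timings can be normalized to a per-row cost. The sketch below only does arithmetic on three rows of the table above; `ns_per_row` is a hypothetical helper and no new measurements are introduced. For `case_when_simple`, for example, 5.4 µs over 1K rows is 5.4 ns per row, while 29.5 µs over 100K rows is 0.295 ns per row, which suggests that a fixed per-call overhead dominates the smallest inputs.

```rust
/// Average cost per row in nanoseconds, given a total time in microseconds.
fn ns_per_row(total_us: f64, rows: u64) -> f64 {
    total_us * 1_000.0 / rows as f64
}

fn main() {
    // (benchmark, [time in µs at 1K, 10K, 100K rows]), copied from the table.
    let results = [
        ("case_when_simple", [5.4, 8.4, 29.5]),
        ("case_when_nary_3_conditions", [15.1, 25.8, 87.3]),
        ("case_when_nary_10_conditions", [45.5, 73.1, 257.5]),
    ];
    let sizes = [1_000u64, 10_000, 100_000];

    for (name, times) in results {
        for (us, rows) in times.iter().zip(sizes) {
            println!("{name}: {rows} rows -> {:.3} ns/row", ns_per_row(*us, rows));
        }
    }
}
```
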
---------
Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

1 parent 4a0f17d commit df23c37
10 files changed
Lines changed: 1572 additions & 5 deletions
File tree
- vortex-array
- benches/expr
- src
- expr
- scalar_fn/fns
- vortex-datafusion/src/convert
- vortex-proto
- proto
- src/generated