Commit 0b34e6b
apacheGH-46777: [C++] Use SimplifyIsIn only when the value_set of the expression is lower than a threshold (apache#46859)
### Rationale for this change
Using `SimplifyIsIn` when the value set is large has a substantial performance penalty.
### What changes are included in this PR?
Skip the simplification when the size of the expression's `value_set` exceeds a threshold (50).
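The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from this PR: `IsInFilter`, `RangeGuarantee`, `MaybeSimplifyIsIn`, and `kMaxValueSetSizeForSimplification` are hypothetical names, the "simplification" is reduced to pruning values outside a guaranteed range, and treating exactly 50 values as still eligible is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical constant; 50 is the threshold stated in the PR description.
constexpr std::size_t kMaxValueSetSizeForSimplification = 50;

// Hypothetical stand-in for an is_in expression over integer values.
struct IsInFilter {
  std::vector<int> value_set;
};

// Hypothetical guarantee: the column is known to lie in [min, max].
struct RangeGuarantee {
  int min;
  int max;
};

// Prune the value set against the guarantee, but only when the set is small.
// Large sets are returned unchanged, so no per-value work is done for them.
IsInFilter MaybeSimplifyIsIn(const IsInFilter& filter,
                             const RangeGuarantee& guarantee) {
  if (filter.value_set.size() > kMaxValueSetSizeForSimplification) {
    return filter;  // above the threshold: leave the expression as-is
  }
  IsInFilter simplified;
  std::copy_if(filter.value_set.begin(), filter.value_set.end(),
               std::back_inserter(simplified.value_set),
               [&](int v) { return v >= guarantee.min && v <= guarantee.max; });
  return simplified;
}
```

In this sketch a value set of 100 elements comes back untouched, while a set such as `{1, 5, 9}` under a `[4, 10]` guarantee is reduced to `{5, 9}`.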
### Are these changes tested?
I've tested locally that the reproducer goes back to pre-change levels.
```
$ python read.py
=== PYARROW VERSION 20 ===
Retrieved 10,000,000 rows in 3.08 seconds.
```
I have added a test for large sets that validates the expression is not modified.
### Are there any user-facing changes?
No
* GitHub Issue: apache#46777
Lead-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>

1 parent 1e01c52 commit 0b34e6b
2 files changed: 29 additions, 0 deletions.