perf: Optimize `lower`, `upper` for ASCII inputs by neilconway · Pull Request #21980 · apache/datafusion

neilconway · 2026-05-01T22:27:02Z

Which issue does this PR close?

Closes #21813 (lower, upper could be further optimized for ASCII-only inputs).

Rationale for this change

This PR implements two optimizations for lower and upper on ASCII strings:

For the Utf8/LargeUtf8 code path, we previously did the case conversion via str::to_uppercase or str::to_lowercase. For ASCII inputs, it is a bit faster to map u8::to_ascii_lowercase (or u8::to_ascii_uppercase) over the bytes of the string directly and collect the result: although the stdlib functions are well-optimized, they have to re-check whether each string is ASCII. Since we already know the input is all-ASCII, we can skip that check.
The Utf8View code path previously had no ASCII-specific optimization; this PR adds one.
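The idea behind the first optimization can be illustrated with a minimal, standalone sketch. This is not the code from this PR: the function names `lower_ascii_known` and `lower_batch` are hypothetical, and a plain slice of `&str` stands in for an Arrow string array. It assumes the ASCII check is done once for the whole batch, after which each value is converted byte by byte.

```rust
/// Lowercase a string that the caller already knows is ASCII-only.
/// (Hypothetical helper for illustration; not DataFusion's API.)
fn lower_ascii_known(s: &str) -> String {
    debug_assert!(s.is_ascii());
    // Map each byte through the ASCII-only conversion; no per-string
    // "is this ASCII?" check happens here.
    let bytes: Vec<u8> = s.as_bytes().iter().map(u8::to_ascii_lowercase).collect();
    // ASCII case conversion maps ASCII bytes to ASCII bytes, so the
    // result is always valid UTF-8.
    String::from_utf8(bytes).expect("ASCII bytes are valid UTF-8")
}

/// Lowercase a batch of values, checking for ASCII once up front.
/// (Hypothetical; a slice of &str stands in for an Arrow array.)
fn lower_batch(values: &[&str]) -> Vec<String> {
    if values.iter().all(|v| v.is_ascii()) {
        // Fast path: every value is ASCII.
        values.iter().map(|v| lower_ascii_known(v)).collect()
    } else {
        // General path: full Unicode-aware conversion.
        values.iter().map(|v| v.to_lowercase()).collect()
    }
}
```

For example, `lower_batch(&["Hello", "WORLD"])` takes the fast path, while a batch containing `"ÀB"` falls back to `str::to_lowercase`.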

Benchmarks (ARM64):

upper

upper_all_values_are_ascii: 5.4 → 4.1 µs (−24.1%)

lower — all-ASCII (the optimized paths)

lower_all_values_are_ascii: 1024: 5.4 → 4.0 µs (−25.9%)
lower_all_values_are_ascii: 4096: 22.6 → 15.6 µs (−31.0%)
lower_all_values_are_ascii: 8192: 41.9 → 30.8 µs (−26.5%)
string_views size:4096 str_len:10 null:0 mixed:false: 151.0 → 75.3 µs (−50.1%)
string_views size:4096 str_len:10 null:0 mixed:true: 175.9 → 134.6 µs (−23.5%)
string_views size:4096 str_len:10 null:0.1 mixed:false: 143.5 → 76.8 µs (−46.5%)
string_views size:4096 str_len:10 null:0.1 mixed:true: 166.6 → 125.0 µs (−25.0%)
string_views size:4096 str_len:64 null:0 mixed:false: 150.1 → 92.7 µs (−38.2%)
string_views size:4096 str_len:64 null:0 mixed:true: 185.2 → 140.1 µs (−24.4%)
string_views size:4096 str_len:64 null:0.1 mixed:false: 136.7 → 97.0 µs (−29.0%)
string_views size:4096 str_len:64 null:0.1 mixed:true: 173.7 → 131.2 µs (−24.5%)
string_views size:4096 str_len:128 null:0 mixed:false: 190.3 → 141.7 µs (−25.5%)
string_views size:4096 str_len:128 null:0 mixed:true: 197.0 → 153.7 µs (−22.0%)
string_views size:4096 str_len:128 null:0.1 mixed:false: 173.3 → 141.7 µs (−18.2%)
string_views size:4096 str_len:128 null:0.1 mixed:true: 184.0 → 142.8 µs (−22.4%)
string_views size:8192 str_len:10 null:0 mixed:false: 302.9 → 150.2 µs (−50.4%)
string_views size:8192 str_len:10 null:0 mixed:true: 352.9 → 279.0 µs (−20.9%)
string_views size:8192 str_len:10 null:0.1 mixed:false: 285.0 → 154.3 µs (−45.9%)
string_views size:8192 str_len:10 null:0.1 mixed:true: 334.2 → 266.4 µs (−20.3%)
string_views size:8192 str_len:64 null:0 mixed:false: 295.6 → 184.4 µs (−37.6%)
string_views size:8192 str_len:64 null:0 mixed:true: 371.4 → 290.7 µs (−21.7%)
string_views size:8192 str_len:64 null:0.1 mixed:false: 273.7 → 195.1 µs (−28.7%)
string_views size:8192 str_len:64 null:0.1 mixed:true: 347.0 → 279.6 µs (−19.4%)
string_views size:8192 str_len:128 null:0 mixed:false: 379.6 → 285.6 µs (−24.8%)
string_views size:8192 str_len:128 null:0 mixed:true: 397.1 → 317.4 µs (−20.1%)
string_views size:8192 str_len:128 null:0.1 mixed:false: 364.1 → 285.1 µs (−21.7%)
string_views size:8192 str_len:128 null:0.1 mixed:true: 379.3 → 302.3 µs (−20.3%)
lower_sliced_ascii parent=65536 slice=128 str_len=32: 980.2 → 797.9 ns (−18.6%)

lower — some non-ASCII string_views (mostly noise)

size:4096 str_len:10 null:0 mixed:false: 374.5 → 362.2 µs (−3.3%)
size:4096 str_len:10 null:0 mixed:true: 374.6 → 380.5 µs (+1.6%)
size:4096 str_len:10 null:0.1 mixed:false: 340.8 → 356.5 µs (+4.6%)
size:4096 str_len:10 null:0.1 mixed:true: 344.0 → 352.5 µs (+2.5%)
size:4096 str_len:64 null:0 mixed:false: 377.5 → 373.5 µs (−1.1%)
size:4096 str_len:64 null:0 mixed:true: 380.6 → 375.0 µs (−1.5%)
size:4096 str_len:64 null:0.1 mixed:false: 330.7 → 341.8 µs (+3.4%)
size:4096 str_len:64 null:0.1 mixed:true: 341.8 → 354.2 µs (+3.6%)
size:4096 str_len:128 null:0 mixed:false: 371.8 → 356.2 µs (−4.2%)
size:4096 str_len:128 null:0 mixed:true: 378.9 → 386.0 µs (+1.9%)
size:4096 str_len:128 null:0.1 mixed:false: 350.5 → 350.3 µs (−0.1%)
size:4096 str_len:128 null:0.1 mixed:true: 351.0 → 337.9 µs (−3.7%)
size:8192 str_len:10 null:0 mixed:false: 740.0 → 757.2 µs (+2.3%)
size:8192 str_len:10 null:0 mixed:true: 781.3 → 750.2 µs (−4.0%)
size:8192 str_len:10 null:0.1 mixed:false: 693.7 → 693.7 µs (0.0%)
size:8192 str_len:10 null:0.1 mixed:true: 681.5 → 705.2 µs (+3.5%)
size:8192 str_len:64 null:0 mixed:false: 755.5 → 768.6 µs (+1.7%)
size:8192 str_len:64 null:0 mixed:true: 759.6 → 754.3 µs (−0.7%)
size:8192 str_len:64 null:0.1 mixed:false: 711.5 → 667.8 µs (−6.1%)
size:8192 str_len:64 null:0.1 mixed:true: 682.1 → 688.2 µs (+0.9%)
size:8192 str_len:128 null:0 mixed:false: 771.5 → 765.9 µs (−0.7%)
size:8192 str_len:128 null:0 mixed:true: 747.7 → 792.6 µs (+6.0%)
size:8192 str_len:128 null:0.1 mixed:false: 687.1 → 701.3 µs (+2.1%)
size:8192 str_len:128 null:0.1 mixed:true: 679.2 → 696.8 µs (+2.6%)

lower — first/middle non-ASCII (flat)

lower_the_first_value_is_nonascii: 1024: 42.1 → 42.4 µs (+0.7%)
lower_the_first_value_is_nonascii: 4096: 173.9 → 173.3 µs (−0.3%)
lower_the_first_value_is_nonascii: 8192: 350.8 → 349.3 µs (−0.4%)
lower_the_middle_value_is_nonascii: 1024: 42.9 → 42.8 µs (−0.2%)
lower_the_middle_value_is_nonascii: 4096: 175.1 → 176.3 µs (+0.7%)
lower_the_middle_value_is_nonascii: 8192: 353.6 → 354.6 µs (+0.3%)

What changes are included in this PR?

Implement optimizations
Share StringViewArray buffer size constants with the bulk-NULL builders

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

comphead · 2026-05-02T04:08:53Z

+            continue;
+        }
+        let mut bytes = view.to_le_bytes();
+        if len <= 12 {


Let's add some comments. I assume len <= 12 is the fast path for German strings?

comphead

Thanks @neilconway
I believe the optimization is tightly tied to the specifics of German strings, so it would be nice to comment the byte-level work.
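For readers unfamiliar with the layout under discussion, here is a rough sketch of byte-level work on an inline view. It relies on the Arrow view layout: each view is 16 bytes, the first 4 bytes hold the length, and for strings of at most 12 bytes the remaining 12 bytes hold the data inline. The code is an illustration only, not the code from this PR; the names `lower_inline_ascii_view`, `make_inline_view`, and `inline_view_data` are hypothetical, and the input is assumed to be ASCII.

```rust
/// Lowercase an ASCII-only inline view (length <= 12) without touching
/// any data buffer. Returns None for longer strings, whose bytes live
/// in a separate buffer. (Hypothetical helper for illustration.)
fn lower_inline_ascii_view(view: u128) -> Option<u128> {
    let mut bytes = view.to_le_bytes();
    // Bytes 0..4: string length, little-endian.
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if len > 12 {
        // Not inline: bytes 4..16 hold a prefix, buffer index and offset.
        return None;
    }
    // Bytes 4..4+len: the string data itself, stored inside the view.
    bytes[4..4 + len].make_ascii_lowercase();
    Some(u128::from_le_bytes(bytes))
}

/// Build an inline view from at most 12 bytes (hypothetical helper).
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12);
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Extract the inline data from a view with length <= 12
/// (hypothetical helper).
fn inline_view_data(view: u128) -> Vec<u8> {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    bytes[4..4 + len].to_vec()
}
```

Because short strings live entirely inside the view, converting them only rewrites the 16-byte view; longer strings would need their bytes converted in a data buffer instead.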


e6c07fc

github-actions bot added the "functions" (Changes to functions implementation) label on May 1, 2026

neilconway wants to merge 1 commit into apache:main from neilconway:neilc/perf-case-conv
