perf: Optimize lower, upper for ASCII inputs #21980

Open
neilconway wants to merge 1 commit into apache:main from neilconway:neilc/perf-case-conv

@neilconway commented May 1, 2026

Which issue does this PR close?

Rationale for this change

This PR implements two optimizations for lower and upper on ASCII strings:

  1. For the Utf8/LargeUtf8 code path, we previously did the case conversion via str::to_uppercase or str::to_lowercase. For ASCII inputs, it is a bit faster to iterate over the string's bytes directly with map(u8::to_ascii_lowercase).collect() (or u8::to_ascii_uppercase for upper): although the stdlib functions are well-optimized, they must re-check every string to see whether it is ASCII. Since we already know the input is all-ASCII, we can skip that check.
  2. The Utf8View code path previously had no ASCII-specific optimization; this PR adds one.
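Item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `convert_ascii`, `lower_all`, and `upper_all` are hypothetical names, plain `&str` slices stand in for Arrow string arrays, and the all-ASCII check is assumed to happen once per batch before the fast path is chosen.

```rust
/// Hypothetical helper: case-convert a string the caller has already verified
/// is all-ASCII. For ASCII, a byte-by-byte mapping is a complete case conversion.
fn convert_ascii(s: &str, upper: bool) -> String {
    debug_assert!(s.is_ascii());
    let bytes: Vec<u8> = if upper {
        s.as_bytes().iter().map(u8::to_ascii_uppercase).collect()
    } else {
        s.as_bytes().iter().map(u8::to_ascii_lowercase).collect()
    };
    // ASCII in, ASCII out: the result is always valid UTF-8.
    String::from_utf8(bytes).expect("ASCII case mapping preserves ASCII")
}

/// Hypothetical batch driver: one ASCII check for the whole batch, then either
/// the byte-wise fast path or the general Unicode-aware stdlib conversion.
fn lower_all(values: &[&str]) -> Vec<String> {
    if values.iter().all(|s| s.is_ascii()) {
        values.iter().map(|s| convert_ascii(s, false)).collect()
    } else {
        values.iter().map(|s| s.to_lowercase()).collect()
    }
}

/// Same shape as `lower_all`, for upper.
fn upper_all(values: &[&str]) -> Vec<String> {
    if values.iter().all(|s| s.is_ascii()) {
        values.iter().map(|s| convert_ascii(s, true)).collect()
    } else {
        values.iter().map(|s| s.to_uppercase()).collect()
    }
}
```

The sketch calls the safe `String::from_utf8`, which re-validates each output. Because ASCII case mapping never produces a byte at or above 0x80, a hot path could skip that validation with `unsafe { String::from_utf8_unchecked(..) }`.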

Benchmarks (ARM64, before → after):

upper

  • upper_all_values_are_ascii: 5.4 → 4.1 µs (−24.1%)

lower — all-ASCII (the optimized paths)

  • lower_all_values_are_ascii: 1024: 5.4 → 4.0 µs (−25.9%)
  • lower_all_values_are_ascii: 4096: 22.6 → 15.6 µs (−31.0%)
  • lower_all_values_are_ascii: 8192: 41.9 → 30.8 µs (−26.5%)
  • string_views size:4096 str_len:10 null:0 mixed:false: 151.0 → 75.3 µs (−50.1%)
  • string_views size:4096 str_len:10 null:0 mixed:true: 175.9 → 134.6 µs (−23.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:false: 143.5 → 76.8 µs (−46.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:true: 166.6 → 125.0 µs (−25.0%)
  • string_views size:4096 str_len:64 null:0 mixed:false: 150.1 → 92.7 µs (−38.2%)
  • string_views size:4096 str_len:64 null:0 mixed:true: 185.2 → 140.1 µs (−24.4%)
  • string_views size:4096 str_len:64 null:0.1 mixed:false: 136.7 → 97.0 µs (−29.0%)
  • string_views size:4096 str_len:64 null:0.1 mixed:true: 173.7 → 131.2 µs (−24.5%)
  • string_views size:4096 str_len:128 null:0 mixed:false: 190.3 → 141.7 µs (−25.5%)
  • string_views size:4096 str_len:128 null:0 mixed:true: 197.0 → 153.7 µs (−22.0%)
  • string_views size:4096 str_len:128 null:0.1 mixed:false: 173.3 → 141.7 µs (−18.2%)
  • string_views size:4096 str_len:128 null:0.1 mixed:true: 184.0 → 142.8 µs (−22.4%)
  • string_views size:8192 str_len:10 null:0 mixed:false: 302.9 → 150.2 µs (−50.4%)
  • string_views size:8192 str_len:10 null:0 mixed:true: 352.9 → 279.0 µs (−20.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:false: 285.0 → 154.3 µs (−45.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:true: 334.2 → 266.4 µs (−20.3%)
  • string_views size:8192 str_len:64 null:0 mixed:false: 295.6 → 184.4 µs (−37.6%)
  • string_views size:8192 str_len:64 null:0 mixed:true: 371.4 → 290.7 µs (−21.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:false: 273.7 → 195.1 µs (−28.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:true: 347.0 → 279.6 µs (−19.4%)
  • string_views size:8192 str_len:128 null:0 mixed:false: 379.6 → 285.6 µs (−24.8%)
  • string_views size:8192 str_len:128 null:0 mixed:true: 397.1 → 317.4 µs (−20.1%)
  • string_views size:8192 str_len:128 null:0.1 mixed:false: 364.1 → 285.1 µs (−21.7%)
  • string_views size:8192 str_len:128 null:0.1 mixed:true: 379.3 → 302.3 µs (−20.3%)
  • lower_sliced_ascii parent=65536 slice=128 str_len=32: 980.2 → 797.9 ns (−18.6%)

lower — some non-ASCII string_views (mostly noise)

  • size:4096 str_len:10 null:0 mixed:false: 374.5 → 362.2 µs (−3.3%)
  • size:4096 str_len:10 null:0 mixed:true: 374.6 → 380.5 µs (+1.6%)
  • size:4096 str_len:10 null:0.1 mixed:false: 340.8 → 356.5 µs (+4.6%)
  • size:4096 str_len:10 null:0.1 mixed:true: 344.0 → 352.5 µs (+2.5%)
  • size:4096 str_len:64 null:0 mixed:false: 377.5 → 373.5 µs (−1.1%)
  • size:4096 str_len:64 null:0 mixed:true: 380.6 → 375.0 µs (−1.5%)
  • size:4096 str_len:64 null:0.1 mixed:false: 330.7 → 341.8 µs (+3.4%)
  • size:4096 str_len:64 null:0.1 mixed:true: 341.8 → 354.2 µs (+3.6%)
  • size:4096 str_len:128 null:0 mixed:false: 371.8 → 356.2 µs (−4.2%)
  • size:4096 str_len:128 null:0 mixed:true: 378.9 → 386.0 µs (+1.9%)
  • size:4096 str_len:128 null:0.1 mixed:false: 350.5 → 350.3 µs (−0.1%)
  • size:4096 str_len:128 null:0.1 mixed:true: 351.0 → 337.9 µs (−3.7%)
  • size:8192 str_len:10 null:0 mixed:false: 740.0 → 757.2 µs (+2.3%)
  • size:8192 str_len:10 null:0 mixed:true: 781.3 → 750.2 µs (−4.0%)
  • size:8192 str_len:10 null:0.1 mixed:false: 693.7 → 693.7 µs (0.0%)
  • size:8192 str_len:10 null:0.1 mixed:true: 681.5 → 705.2 µs (+3.5%)
  • size:8192 str_len:64 null:0 mixed:false: 755.5 → 768.6 µs (+1.7%)
  • size:8192 str_len:64 null:0 mixed:true: 759.6 → 754.3 µs (−0.7%)
  • size:8192 str_len:64 null:0.1 mixed:false: 711.5 → 667.8 µs (−6.1%)
  • size:8192 str_len:64 null:0.1 mixed:true: 682.1 → 688.2 µs (+0.9%)
  • size:8192 str_len:128 null:0 mixed:false: 771.5 → 765.9 µs (−0.7%)
  • size:8192 str_len:128 null:0 mixed:true: 747.7 → 792.6 µs (+6.0%)
  • size:8192 str_len:128 null:0.1 mixed:false: 687.1 → 701.3 µs (+2.1%)
  • size:8192 str_len:128 null:0.1 mixed:true: 679.2 → 696.8 µs (+2.6%)

lower — first/middle non-ASCII (flat)

  • lower_the_first_value_is_nonascii: 1024: 42.1 → 42.4 µs (+0.7%)
  • lower_the_first_value_is_nonascii: 4096: 173.9 → 173.3 µs (−0.3%)
  • lower_the_first_value_is_nonascii: 8192: 350.8 → 349.3 µs (−0.4%)
  • lower_the_middle_value_is_nonascii: 1024: 42.9 → 42.8 µs (−0.2%)
  • lower_the_middle_value_is_nonascii: 4096: 175.1 → 176.3 µs (+0.7%)
  • lower_the_middle_value_is_nonascii: 8192: 353.6 → 354.6 µs (+0.3%)

What changes are included in this PR?

  • Implement the ASCII fast paths for lower and upper (Utf8/LargeUtf8 and Utf8View)
  • Share StringViewArray buffer size constants with the bulk-NULL builders

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

github-actions bot added the functions (Changes to functions implementation) label May 1, 2026
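Review comment on this excerpt of the diff:

```rust
// ... (excerpt from the diff)
continue;
}
// `view` is a 128-bit Arrow string view; its low 4 bytes hold the string length.
let mut bytes = view.to_le_bytes();
// Strings of at most 12 bytes are stored inline in the view's remaining 12 bytes.
if len <= 12 {
```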

Let's add some comments. I assume `len <= 12` is the fast path for German strings?
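For context on this question: in the Arrow StringView ("German string") layout, each view is a 128-bit value whose low 4 bytes hold the length, and strings of at most 12 bytes are stored inline in the remaining 12 bytes. An all-ASCII inline string can therefore be case-converted inside the view itself, with no data-buffer access. A minimal sketch, assuming views are handled as raw `u128` values (as the `view.to_le_bytes()` call in the excerpt suggests); `read_u32`, `make_inline_view`, and `lower_inline_view` are hypothetical helpers, not the PR's code.

```rust
/// Hypothetical helper: read a little-endian u32 at byte `at` of a 16-byte view.
fn read_u32(bytes: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Build an inline view: bytes 0..4 = length, bytes 4..4+len = data, rest zero.
fn make_inline_view(s: &[u8]) -> u128 {
    assert!(s.len() <= 12, "only strings of at most 12 bytes are inlined");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    bytes[4..4 + s.len()].copy_from_slice(s);
    u128::from_le_bytes(bytes)
}

/// Lowercase an all-ASCII inline view without touching any data buffer.
fn lower_inline_view(view: u128) -> u128 {
    let mut bytes = view.to_le_bytes();
    let len = read_u32(&bytes, 0) as usize;
    assert!(len <= 12, "longer views point into a data buffer instead");
    // ASCII lowercasing is byte-for-byte, so the length field stays unchanged.
    bytes[4..4 + len].make_ascii_lowercase();
    u128::from_le_bytes(bytes)
}
```

Because the zero padding after the string is unaffected by ASCII case mapping, this inline path reduces to a fixed-size byte transform on the view.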


@comphead left a comment


Thanks @neilconway!
I believe the optimization is tightly coupled to the specifics of German strings, so it would be nice to comment the byte-level work.
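As an illustration of that byte-level work for the non-inline case: in the Arrow StringView layout, a view longer than 12 bytes holds a 4-byte prefix (bytes 4..8), a buffer index (bytes 8..12), and an offset (bytes 12..16) into a data buffer. The sketch below is hypothetical and assumes the lowercased data is appended to a new output buffer; the names `make_long_view` and `lower_long_view` are illustrative, and the PR's actual handling of long views may differ. The key detail is that the prefix duplicates the first four bytes of the string, so it must be lowercased as well.

```rust
/// Hypothetical helper: read a little-endian u32 at byte `at` of a 16-byte view.
fn read_u32(bytes: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Build a non-inline view for `data` stored at `offset` in buffer `buffer_index`.
fn make_long_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(data.len() > 12, "strings of at most 12 bytes must be inlined");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..8].copy_from_slice(&data[0..4]); // 4-byte prefix
    bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}

/// Lowercase an all-ASCII long view: append its lowercased bytes to `out`
/// (assumed to become buffer `out_index`) and return the rewritten view.
fn lower_long_view(view: u128, buffers: &[Vec<u8>], out: &mut Vec<u8>, out_index: u32) -> u128 {
    let mut bytes = view.to_le_bytes();
    let len = read_u32(&bytes, 0) as usize;
    let buffer_index = read_u32(&bytes, 8) as usize;
    let offset = read_u32(&bytes, 12) as usize;
    assert!(len > 12, "inline views need no buffer access");

    let new_offset = out.len() as u32;
    let data = &buffers[buffer_index][offset..offset + len];
    out.extend(data.iter().map(u8::to_ascii_lowercase));

    // The prefix mirrors the first 4 string bytes, so it must be lowercased too.
    bytes[4..8].make_ascii_lowercase();
    bytes[8..12].copy_from_slice(&out_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&new_offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}
```

The length field is left alone because ASCII case mapping never changes the byte length; only the prefix, buffer index, and offset are rewritten.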


Successfully merging this pull request may close this issue: lower, upper could be further optimized for ASCII-only inputs
