@tomcur tomcur commented Jan 6, 2026

By using a `const` generic to dispatch statically between bilinear and bicubic sampling, we get a roughly 7% reduction in run time on x86 for bilinear sampling (medium quality) in the f32 pipeline. Bicubic sampling (high quality) shows no measurable change.
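For readers unfamiliar with the pattern, here is a minimal sketch of static dispatch through a `const` generic. It is not the code from this PR: the names `sample`, `sample_dyn`, `texel`, and `Quality` are hypothetical, the sampler is one-dimensional (linear versus cubic stands in for bilinear versus bicubic), and the Catmull-Rom weights are an assumption chosen for illustration rather than the kernel vello_cpu necessarily uses.

```rust
// Hypothetical sketch; none of these names come from vello_cpu.

/// Fetch a texel with clamp-to-edge addressing. `data` must be non-empty.
fn texel(data: &[f32], i: isize) -> f32 {
    let last = data.len() as isize - 1;
    data[i.clamp(0, last) as usize]
}

/// Sample `data` at fractional position `x`.
///
/// `CUBIC` is a compile-time constant, so each instantiation of this
/// function contains only one of the two branches below and the hot loop
/// that calls it does not need to test the quality setting per sample.
fn sample<const CUBIC: bool>(data: &[f32], x: f32) -> f32 {
    let base = x.floor();
    let t = x - base;
    let i = base as isize;
    if CUBIC {
        // Catmull-Rom weights over four neighbouring texels
        // (an assumed kernel, used here only for illustration).
        let t2 = t * t;
        let t3 = t2 * t;
        let w = [
            0.5 * (-t3 + 2.0 * t2 - t),
            0.5 * (3.0 * t3 - 5.0 * t2 + 2.0),
            0.5 * (-3.0 * t3 + 4.0 * t2 + t),
            0.5 * (t3 - t2),
        ];
        (0..4usize)
            .map(|k| w[k] * texel(data, i - 1 + k as isize))
            .sum()
    } else {
        // Linear interpolation between the two nearest texels.
        let a = texel(data, i);
        let b = texel(data, i + 1);
        a + (b - a) * t
    }
}

/// Runtime quality setting, resolved once at the call boundary.
#[derive(Clone, Copy)]
enum Quality {
    Medium,
    High,
}

/// Map the runtime value to a monomorphized instantiation.
fn sample_dyn(quality: Quality, data: &[f32], x: f32) -> f32 {
    match quality {
        Quality::Medium => sample::<false>(data, x),
        Quality::High => sample::<true>(data, x),
    }
}
```

In a real pipeline the `match` would sit outside the per-pixel loop, so the runtime choice is made once and the loop body is specialized for one sampling mode.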

The benchmark was performed by temporarily adding medium and high quality variants of `transform::rotate` in `sparse_strips/vello_bench/src/fine/image.rs`.

bench

fine/image/transform/rotate_medium_f32_scalar
                        time:   [10.137 µs 10.144 µs 10.151 µs]
                        change: [-7.5548% -7.4067% -7.2867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

fine/image/transform/rotate_high_f32_scalar
                        time:   [31.163 µs 31.293 µs 31.556 µs]
                        change: [-0.1547% +0.4795% +1.2307%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe

By using a `const` generic to dispatch statically between bilinear and
bicubic sampling, we get a 7% timing decrease for bilinear sampling
(medium quality) in the f32 pipeline. It appears not to impact bicubic
sampling (high quality).

The benchmark was performed by temporarily adding medium and high
quality variants of `transform::rotate` in
`sparse_strips/vello_bench/src/fine/image.rs`.

```
fine/image/transform/rotate_medium_f32_scalar
                        time:   [10.142 µs 10.149 µs 10.157 µs]
                        change: [-7.5294% -7.3741% -7.2353%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 26 outliers among 100 measurements (26.00%)
  4 (4.00%) low severe
  9 (9.00%) low mild
  6 (6.00%) high mild
  7 (7.00%) high severe

fine/image/transform/rotate_high_f32_scalar
                        time:   [31.144 µs 31.175 µs 31.215 µs]
                        change: [-0.3454% +0.1490% +0.4994%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe
```
@nicoburns added the C-cpu label (Applies to the vello_cpu crate) on Jan 7, 2026