Skip to content

Improved nearest interpolation#13

Merged
awxkee merged 1 commit into
awxkee:masterfrom
RunDevelopment:faster-nearest
Feb 10, 2026
Merged

Improved nearest interpolation#13
awxkee merged 1 commit into
awxkee:masterfrom
RunDevelopment:faster-nearest

Conversation

@RunDevelopment

@RunDevelopment RunDevelopment commented Feb 10, 2026

Copy link
Copy Markdown
Contributor

I changed the implementation of NN interpolation to improve it in two ways: fixed point and better rounding.

Better rounding

I fixed a minor rounding issue in the way coordinates were mapped to the source image. It used to do the following (using the X coord here, Y has the same issue):

let x_scale = src_width as f32 / dst_width as f32;
let clip_width = src_width as f32 - 1f32;
let src_x = ((dst_x as f32 + 0.5f32) * x_scale - 0.5f32).min(clip_width).max(0f32) as usize;

which is mathematically equivalent to:

$$ src_x = \lfloor \max(0, \min(src_w-1, (dst_x + 0.5) * src_w / dst_w - 0.5)) \rfloor $$

The problem is that as usize performs truncation (or here flooring) to get an integer. This is incorrect, because we want the nearest source pixel. In other words, it should have been .round() as usize.

Of course, rounding to nearest for numbers $r>-0.5$ is just $\lfloor r +0.5 \rfloor$, so we actually just remove the - 0.5 to get rounding to the nearest integer. So:

let x_scale = src_width as f32 / dst_width as f32;
let clip_width = src_width as f32 - 1f32;
let src_x = ((dst_x as f32 + 0.5f32) * x_scale).min(clip_width) as usize;

This would be correct.

Fixed point

Firstly, I switched to a fixed-point arithmetic approach to avoid floating point. This means that FP rounding error is a non-issue and that costly FP operations can be avoided. This alone should make nearest interpolation a fair bit faster now.

The code for coords works similar to the FP version. Instead of a scale variable, it's k and k_half.

let k_y: u64 = ((src_height as u64) << SHIFT) / dst_height as u64;
let k_y_half: u64 = k_y >> 1;
let src_y = ((dst_y as u64 * k_y + k_y_half) >> SHIFT) as usize;

Ignoring the SHIFT for now, this is correct because:

$$ src_y = round((dst_y + 0.5) * k_y - 0.5) = \lfloor (dst_y + 0.5) * k_y \rfloor = \lfloor dst_y * k_y + \frac{k_y}{2} \rfloor $$

Back to SHIFT: I used 32 bits for the fractional part. This should be accurate enough so that all results are exact for all images with dimensions <2^31. For dimensions >=2^31 and <2^32, I expect minor rounding issues. For dimensions >=2^32, overflow will occur and results will be wildly incorrect. I think this is acceptable, since the f32 version suffered from rounding issues for dimensions >2^24 (and I'm not sure if dimensions >u32::MAX are worth supporting).

Also, the computation for src_x is a bit more obfuscated, because I avoided a multiplication by just adding k_x on every loop iteration to make things faster.


As for tests and benchmarks... how do they work in this repo?

@awxkee

awxkee commented Feb 10, 2026

Copy link
Copy Markdown
Owner

This is not related to rounding, at least not directly. Nearest neighbor sampling works by moving pixel center and after scale it's required to map coordinate back in grid. This behavior is not achieved by simple rounding.

You can verify that the results are simply wrong if the center is not shifted back:

fn main() {
    let src_width = 900usize;
    let dst_width = 315usize;
    let x_scale = src_width as f32 / dst_width as f32;

    let clip_width = src_width as f32 - 1f32;

    let mut points = Vec::new();
    let mut rpoints = Vec::new();

    for x in 0..dst_width {
        let src_x = ((x as f32 + 0.5f32) * x_scale - 0.5f32)
            .min(clip_width)
            .max(0f32) as usize;
        points.push(src_x);

        let r_src_x = ((x as f32 + 0.5f32) * x_scale).min(clip_width).max(0f32) as usize;
        rpoints.push(r_src_x);
    }

    println!("{:?} end {:?}", &points[0..10], &points[points.len() - 10..]);
    println!("rpoints {:?} end {:?}", &rpoints[0..10], &rpoints[rpoints.len() - 10..]);
}

Where results are

[0, 3, 6, 9, 12, 15, 18, 20, 23, 26] end [872, 875, 878, 880, 883, 886, 889, 892, 895, 898]
rpoints [1, 4, 7, 10, 12, 15, 18, 21, 24, 27] end [872, 875, 878, 881, 884, 887, 890, 892, 895, 898]

So the image will have offset by half sample because the computation used pixel center instead of pixel itself. Therefore applying floor or subtracting the center bias is a required part of the algorithm not rounding to the nearest value.
You can validate this logic is assumed other libraries for example in OpenCV.

@awxkee

awxkee commented Feb 10, 2026

Copy link
Copy Markdown
Owner

As for tests and benchmarks... how do they work in this repo?

Benchmarks embedded in separate project in workspace.

cargo bench --bench resize_rgb --manifest-path ./app/Cargo.toml

As for tests, there are currently no tests here.

@RunDevelopment

RunDevelopment commented Feb 10, 2026

Copy link
Copy Markdown
Contributor Author

Okay, so I'm confused about what you want to say here. The OpenCV code you linked:

size_t src_y = static_cast<size_t>(floorf((dst_y + 0.5f) * hr));

literally does exactly what I said correct rounding should do. To repeat my suggested code (ignore the unnecessary .min):

let src_x = ((dst_x as f32 + 0.5f32) * x_scale).min(clip_width) as usize;

f32 to int casts in Rust are defined as rounding toward zero. In other words, they truncate. In other other words, they floor for all numbers >= -0.5. The OpenCV code and my suggested Rust code do the same thing.

So did you link the OpenCV code as an example for how NN is done correctly or not?

Nearest neighbor sampling works by moving pixel center and after scale it's required to map coordinate back in grid. This behavior is not achieved by simple rounding.

I'm well aware. The formula for a mapped coord is $(dst_x + 0.5) * src_w / dst_w - 0.5$. But you're forgetting that the mapped coord is rarely an integer, so rounding function has to be applied to get an index into the pixel grid.

What you implemented in PSS is this:

$$ src_x = \lfloor (dst_x + 0.5) * src_w / dst_w - 0.5 \rfloor $$

But it should have been this:

$$ src_x = round( (dst_x + 0.5) * src_w / dst_w - 0.5 ) = \lfloor (dst_x + 0.5) * src_w / dst_w \rfloor $$

That's what I meant with better/correct rounding.

@awxkee

awxkee commented Feb 10, 2026

Copy link
Copy Markdown
Owner

Ah yeah, I think its better align this with OpenCV, and I checked ITU-R they do this is the same.

Thanks for the PR!

@awxkee awxkee merged commit 612b194 into awxkee:master Feb 10, 2026
1 check passed
@RunDevelopment RunDevelopment deleted the faster-nearest branch February 10, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants