Running this script: https://github.com/zarr-developers/cast-value.py/blob/main/examples/benchmarks/bench_numpy_vs_rust.py

compares a Python/NumPy implementation of `cast_value` against the Rust implementation defined in this repo. On my AMD machine, I see numbers like this:
```
$ uv run examples/benchmarks/bench_numpy_vs_rust.py
Array size: 1,000,000 elements

Configuration                                 Impl    Throughput   Memory
-----------------------------------------------------------------------------
float64 -> float32 (simple narrowing)         numpy   6.4G/s       3.8 MB
float64 -> float32 (simple narrowing)         rust    972.0M/s     3.8 MB
float64 -> int32 (round nearest-even)         numpy   354.3M/s     16.2 MB
float64 -> int32 (round nearest-even)         rust    141.1M/s     3.8 MB
float64 -> float32 (round towards-zero)       numpy   18.8M/s      72.5 MB
float64 -> float32 (round towards-zero)       rust    119.7M/s     3.8 MB
float64 -> uint8 (clamp, SIMD path)           numpy   318.0M/s     17.2 MB
float64 -> uint8 (clamp, SIMD path)           rust    5.3G/s       976.9 KB
float64 -> int32 (scalar_map: NaN/Inf/-Inf)   numpy   228.9M/s     16.2 MB
float64 -> int32 (scalar_map: NaN/Inf/-Inf)   rust    172.5M/s     3.8 MB
```
In every case where the Rust implementation is slower than NumPy (simple narrowing, round nearest-even, and the scalar_map path), we should look for optimization opportunities. Memory usage looks good across the board, though.
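For reference, the NumPy side of these benchmarked cast configurations can be reproduced with stock NumPy operations. This is a hedged sketch, not the actual benchmark script: the variable names and the exact rounding/clamping steps here are assumptions about what each configuration label means, but they give a baseline to compare against when profiling the Rust paths.

```python
import numpy as np

# Sample input matching the benchmark's array size (assumed uniform data;
# the real script's distribution may differ).
data = np.random.default_rng(0).uniform(-1e3, 1e3, 1_000_000)

# float64 -> float32 (simple narrowing): a plain dtype conversion.
narrowed = data.astype(np.float32)

# float64 -> int32 (round nearest-even): np.rint rounds halves to even,
# matching IEEE 754 round-to-nearest-even.
nearest_even = np.rint(data).astype(np.int32)

# float64 -> float32 (round towards-zero): truncate the fractional part,
# then narrow.
towards_zero = np.trunc(data).astype(np.float32)

# float64 -> uint8 (clamp): clip into the uint8 range before converting,
# so out-of-range values saturate instead of wrapping.
clamped = np.clip(data, 0, 255).astype(np.uint8)
```

Each of these is a multi-pass pipeline in NumPy (a rounding/clipping pass plus an `astype` pass), which is consistent with the higher peak memory NumPy shows in the non-trivial configurations; the Rust implementation can fuse those passes, which is presumably why its memory stays flat at 3.8 MB.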