Running this script: https://github.com/zarr-developers/cast-value.py/blob/main/examples/benchmarks/bench_numpy_vs_rust.py

compares a Python/NumPy implementation of `cast_value` against the Rust implementation defined in this repo. On my AMD machine, I see numbers like this:
```
$ uv run examples/benchmarks/bench_numpy_vs_rust.py
Array size: 1,000,000 elements

Configuration                                 Impl    Throughput   Memory
-----------------------------------------------------------------------------
float64 -> float32 (simple narrowing)         numpy   6.4G/s       3.8 MB
float64 -> float32 (simple narrowing)         rust    972.0M/s     3.8 MB
float64 -> int32 (round nearest-even)         numpy   354.3M/s     16.2 MB
float64 -> int32 (round nearest-even)         rust    141.1M/s     3.8 MB
float64 -> float32 (round towards-zero)       numpy   18.8M/s      72.5 MB
float64 -> float32 (round towards-zero)       rust    119.7M/s     3.8 MB
float64 -> uint8 (clamp, SIMD path)           numpy   318.0M/s     17.2 MB
float64 -> uint8 (clamp, SIMD path)           rust    5.3G/s       976.9 KB
float64 -> int32 (scalar_map: NaN/Inf/-Inf)   numpy   228.9M/s     16.2 MB
float64 -> int32 (scalar_map: NaN/Inf/-Inf)   rust    172.5M/s     3.8 MB
```
In every case where the Rust implementation is slower than NumPy (simple narrowing, round nearest-even, and the scalar_map path), we should look for optimization opportunities. Memory usage looks good across the board, though.
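For reference, the NumPy side of these benchmarked cast configurations can be reproduced with stock NumPy operations. This is a hedged sketch, not the actual benchmark script: the variable names and the exact rounding/clamping steps here are assumptions about what each configuration label means, but they give a baseline to compare against when profiling the Rust paths.

```python
import numpy as np

# Sample input matching the benchmark's array size (assumed uniform data;
# the real script's distribution may differ).
data = np.random.default_rng(0).uniform(-1e3, 1e3, 1_000_000)

# float64 -> float32 (simple narrowing): a plain dtype conversion.
narrowed = data.astype(np.float32)

# float64 -> int32 (round nearest-even): np.rint rounds halves to even,
# matching IEEE 754 round-to-nearest-even.
nearest_even = np.rint(data).astype(np.int32)

# float64 -> float32 (round towards-zero): truncate the fractional part,
# then narrow.
towards_zero = np.trunc(data).astype(np.float32)

# float64 -> uint8 (clamp): clip into the uint8 range before converting,
# so out-of-range values saturate instead of wrapping.
clamped = np.clip(data, 0, 255).astype(np.uint8)
```

Each of these is a multi-pass pipeline in NumPy (a rounding/clipping pass plus an `astype` pass), which is consistent with the higher peak memory NumPy shows in the non-trivial configurations; the Rust implementation can fuse those passes, which is presumably why its memory stays flat at 3.8 MB.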