optimize bit unpack on arm64 using neon instructions#4

Merged
fpetkovski merged 1 commit into arm64-2 from neon
Oct 26, 2025
Conversation

@achille-roussel
Contributor

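Follow-up to #3: for bit widths 1, 2, 4, and 8, the code now uses vectorized algorithms built on NEON instructions, which yields much higher throughput (geomean 31.70Gi B/s, versus 19.08Gi B/s for the scalar run and 3.830Gi B/s for the purego baseline):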

goos: darwin
goarch: arm64
pkg: github.com/parquet-go/bitpack
cpu: Apple M2 Pro
                       │ /tmp/bench_purego.txt │        /tmp/bench_scalar.txt        │        /tmp/bench_vector.txt        │
                       │        sec/op         │   sec/op     vs base                │   sec/op     vs base                │
UnpackInt32/bitWidth=1            124.40n ± 6%   24.11n ± 4%  -80.62% (p=0.000 n=10)   12.16n ± 4%  -90.23% (p=0.000 n=10)
UnpackInt32/bitWidth=2            124.60n ± 2%   25.89n ± 1%  -79.23% (p=0.000 n=10)   11.77n ± 0%  -90.55% (p=0.000 n=10)
UnpackInt32/bitWidth=4            124.05n ± 0%   26.38n ± 2%  -78.73% (p=0.000 n=10)   11.11n ± 1%  -91.05% (p=0.000 n=10)
UnpackInt32/bitWidth=8            124.25n ± 1%   25.47n ± 2%  -79.50% (p=0.000 n=10)   12.40n ± 3%  -90.02% (p=0.000 n=10)
UnpackInt64/bitWidth=1            125.80n ± 2%   24.45n ± 1%  -80.56% (p=0.000 n=10)   17.88n ± 1%  -85.79% (p=0.000 n=10)
UnpackInt64/bitWidth=2            124.45n ± 5%   24.54n ± 3%  -80.28% (p=0.000 n=10)   20.90n ± 3%  -83.21% (p=0.000 n=10)
UnpackInt64/bitWidth=4            123.85n ± 0%   24.54n ± 2%  -80.19% (p=0.000 n=10)   20.38n ± 0%  -83.54% (p=0.000 n=10)
UnpackInt64/bitWidth=8            124.75n ± 2%   24.61n ± 1%  -80.28% (p=0.000 n=10)   17.47n ± 2%  -86.00% (p=0.000 n=10)
geomean                            124.5n        24.99n       -79.93%                  15.04n       -87.92%

                       │ /tmp/bench_purego.txt │         /tmp/bench_scalar.txt          │          /tmp/bench_vector.txt          │
                       │          B/s          │      B/s       vs base                 │      B/s       vs base                  │
UnpackInt32/bitWidth=1            3.834Gi ± 6%   19.777Gi ± 4%  +415.88% (p=0.000 n=10)   39.230Gi ± 4%   +923.29% (p=0.000 n=10)
UnpackInt32/bitWidth=2            3.828Gi ± 2%   18.420Gi ± 1%  +381.23% (p=0.000 n=10)   40.513Gi ± 0%   +958.42% (p=0.000 n=10)
UnpackInt32/bitWidth=4            3.843Gi ± 0%   18.070Gi ± 2%  +370.22% (p=0.000 n=10)   42.948Gi ± 1%  +1017.59% (p=0.000 n=10)
UnpackInt32/bitWidth=8            3.837Gi ± 1%   18.722Gi ± 2%  +387.94% (p=0.000 n=10)   38.437Gi ± 3%   +901.76% (p=0.000 n=10)
UnpackInt64/bitWidth=1            3.790Gi ± 2%   19.502Gi ± 1%  +414.53% (p=0.000 n=10)   26.679Gi ± 1%   +603.89% (p=0.000 n=10)
UnpackInt64/bitWidth=2            3.833Gi ± 5%   19.431Gi ± 3%  +407.00% (p=0.000 n=10)   22.816Gi ± 3%   +495.32% (p=0.000 n=10)
UnpackInt64/bitWidth=4            3.852Gi ± 0%   19.432Gi ± 2%  +404.53% (p=0.000 n=10)   23.391Gi ± 0%   +507.33% (p=0.000 n=10)
UnpackInt64/bitWidth=8            3.823Gi ± 2%   19.382Gi ± 1%  +407.01% (p=0.000 n=10)   27.297Gi ± 2%   +614.07% (p=0.000 n=10)
geomean                           3.830Gi         19.08Gi       +398.29%                   31.70Gi        +727.73%

Signed-off-by: Achille Roussel <achille.roussel@gmail.com>
@achille-roussel achille-roussel self-assigned this Oct 25, 2025
@achille-roussel achille-roussel added the enhancement New feature or request label Oct 25, 2025
@fpetkovski
Collaborator

🚀

@fpetkovski fpetkovski merged commit fee6f0c into arm64-2 Oct 26, 2025
4 checks passed
@fpetkovski fpetkovski deleted the neon branch October 26, 2025 07:28