@pjbgf pjbgf commented Sep 5, 2025

The native non-SIMD implementation provides only a minor improvement over the Go version. Given that SHA-NI is widely available these days, the non-SIMD native version is being removed to reduce codebase complexity.

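The remaining native (SIMD) implementation is roughly 30% faster after this change, with sec/op dropping by 21% to 37% across the sha1cd_native benchmarks: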

goos: linux
goarch: amd64
pkg: github.com/pjbgf/sha1cd/test
cpu: AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
                                    │     /tmp/before      │                   /tmp/after                   │
                                    │        sec/op        │        sec/op         vs base                  │
CalculateDvMask/generic-16            0.0000002000n ± 100%
CalculateDvMask/native-16             0.0000001500n ± 167%
CalculateDvMask/cgo-16                0.0000005500n ± 100%   0.0000004500n ±  56%        ~ (p=0.586 n=10)
Hash8Bytes/sha1-16                           102.5n ±   1%          100.8n ±   3%        ~ (p=0.085 n=10)
Hash8Bytes/sha1cd_native-16                  172.7n ±   1%          122.3n ±   1%  -29.15% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16                 206.3n ±   1%          209.0n ±   2%   +1.28% (p=0.008 n=10)
Hash8Bytes/sha1cd_cgo-16                     709.0n ±   2%          712.4n ±   1%        ~ (p=0.971 n=10)
Hash320Bytes/sha1-16                         394.9n ±   1%          399.2n ±   1%   +1.08% (p=0.034 n=10)
Hash320Bytes/sha1cd_native-16                888.7n ±   2%          563.1n ±   3%  -36.63% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16               1.096µ ±   1%          1.065µ ±   3%   -2.83% (p=0.003 n=10)
Hash320Bytes/sha1cd_cgo-16                   1.261µ ±   2%          1.269µ ±   1%        ~ (p=0.353 n=10)
Hash1K/sha1-16                               792.9n ±   1%          798.9n ±   2%   +0.76% (p=0.007 n=10)
Hash1K/sha1cd_native-16                      2.245µ ±   3%          1.523µ ±   1%  -32.18% (p=0.000 n=10)
Hash1K/sha1cd_generic-16                     2.979µ ±   1%          2.946µ ±   2%        ~ (p=0.190 n=10)
Hash1K/sha1cd_cgo-16                         2.284µ ±   3%          2.357µ ±   2%   +3.24% (p=0.002 n=10)
Hash8K/sha1-16                               5.090µ ±   2%          5.080µ ±   1%        ~ (p=0.897 n=10)
Hash8K/sha1cd_native-16                      17.09µ ±   2%          11.43µ ±   2%  -33.09% (p=0.000 n=10)
Hash8K/sha1cd_generic-16                     22.10µ ±   1%          22.23µ ±   1%        ~ (p=0.105 n=10)
Hash8K/sha1cd_cgo-16                         12.21µ ±   1%          12.29µ ±   2%        ~ (p=0.051 n=10)
HashWithCollision/sha1cd_native-16           2.786µ ±   0%          2.194µ ±   1%  -21.25% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16          3.084µ ±   1%          3.128µ ±   3%   +1.44% (p=0.017 n=10)
HashWithCollision/sha1cd_cgo-16              2.538µ ±   4%          2.506µ ±   2%   -1.24% (p=0.016 n=10)
CalculateDvMask/go-16                                        0.0000001500n ± 100%
geomean                                      73.89n                 171.9n          -9.52%                ¹
¹ benchmark set differs from baseline; geomeans may not be comparable


                                    │ /tmp/before  │              /tmp/after              │
                                    │     B/s      │     B/s       vs base                │
Hash8Bytes/sha1-16                    74.45Mi ± 1%   75.64Mi ± 3%        ~ (p=0.089 n=10)
Hash8Bytes/sha1cd_native-16           44.18Mi ± 1%   62.35Mi ± 1%  +41.13% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16          36.99Mi ± 1%   36.51Mi ± 2%   -1.29% (p=0.008 n=10)
Hash8Bytes/sha1cd_cgo-16              10.76Mi ± 2%   10.71Mi ± 1%        ~ (p=0.926 n=10)
Hash320Bytes/sha1-16                  772.7Mi ± 1%   764.4Mi ± 1%   -1.07% (p=0.043 n=10)
Hash320Bytes/sha1cd_native-16         343.4Mi ± 2%   542.0Mi ± 3%  +57.81% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16        278.5Mi ± 1%   286.6Mi ± 3%   +2.90% (p=0.003 n=10)
Hash320Bytes/sha1cd_cgo-16            242.0Mi ± 2%   240.4Mi ± 2%        ~ (p=0.353 n=10)
Hash1K/sha1-16                        1.203Gi ± 1%   1.194Gi ± 1%   -0.76% (p=0.007 n=10)
Hash1K/sha1cd_native-16               435.0Mi ± 2%   641.5Mi ± 1%  +47.48% (p=0.000 n=10)
Hash1K/sha1cd_generic-16              327.8Mi ± 1%   331.5Mi ± 2%        ~ (p=0.190 n=10)
Hash1K/sha1cd_cgo-16                  427.7Mi ± 3%   414.3Mi ± 2%   -3.14% (p=0.002 n=10)
Hash8K/sha1-16                        1.499Gi ± 2%   1.502Gi ± 1%        ~ (p=0.912 n=10)
Hash8K/sha1cd_native-16               457.2Mi ± 2%   683.3Mi ± 2%  +49.45% (p=0.000 n=10)
Hash8K/sha1cd_generic-16              353.6Mi ± 1%   351.4Mi ± 1%        ~ (p=0.105 n=10)
Hash8K/sha1cd_cgo-16                  639.7Mi ± 1%   635.6Mi ± 2%        ~ (p=0.052 n=10)
HashWithCollision/sha1cd_native-16    219.1Mi ± 0%   278.3Mi ± 1%  +26.99% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16   198.0Mi ± 1%   195.1Mi ± 2%   -1.44% (p=0.023 n=10)
HashWithCollision/sha1cd_cgo-16       240.5Mi ± 4%   243.5Mi ± 2%   +1.25% (p=0.015 n=10)
geomean                               247.1Mi        271.6Mi        +9.93%
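
The two tables describe the same runs from different angles: a drop in sec/op of p% does not translate into a p% gain in B/s, which is why "roughly 30% faster" in time shows up as gains of about 27% to 58% in throughput. The sketch below illustrates that arithmetic for the sha1cd_native rows. The function name `throughputChange` is hypothetical and not part of sha1cd, and the results only approximate the benchstat figures, since benchstat computes from the raw measurements rather than from the rounded percentages.

```go
package main

import "fmt"

// throughputChange converts a relative change in time per operation
// (in percent, negative means faster) into the equivalent relative
// change in throughput (in percent), assuming the bytes processed
// per operation stay the same.
func throughputChange(timeChangePct float64) float64 {
	ratio := 1 + timeChangePct/100 // new time divided by old time
	return (1/ratio - 1) * 100
}

func main() {
	// sec/op deltas for the sha1cd_native rows, copied from the first table.
	rows := []struct {
		name  string
		delta float64
	}{
		{"Hash8Bytes", -29.15},
		{"Hash320Bytes", -36.63},
		{"Hash1K", -32.18},
		{"Hash8K", -33.09},
		{"HashWithCollision", -21.25},
	}
	for _, r := range rows {
		fmt.Printf("%-18s %+.2f%% sec/op -> %+.2f%% B/s\n",
			r.name, r.delta, throughputChange(r.delta))
	}
}
```

For example, the -29.15% sec/op change for Hash8Bytes works out to roughly +41.1% B/s, in line with the +41.13% reported in the second table.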

⚠️ Note that the UBC native implementations have been removed as part of this PR, and ubc.CalculateDvMaskGeneric is no longer available.

The native non-SIMD implementation provides only a minor improvement
over the Go version. Given that SHA-NI is widely available these days,
the non-SIMD native version is being removed to reduce codebase complexity.

Signed-off-by: Paulo Gomes <[email protected]>
The initial motivation for introducing a native implementation of the UBC
checks was to act as a stepping stone towards a fully native implementation
of sha1cd.

That is no longer being pursued, for two main reasons:
- The non-trivial complexity that comes with a full-on goasm version with
collision detection, which would make it harder to maintain/debug the project.
- The recently added SIMD implementations made the project reasonably fast
for the core supported architectures (arm64 and amd64).

It is also important to note that the performance gains from the native UBC
versions were minor, and certainly not worth the complexity they added to the project.

Signed-off-by: Paulo Gomes <[email protected]>
@pjbgf pjbgf added the breaking label Sep 5, 2025
@pjbgf pjbgf merged commit 85c7a3d into main Sep 5, 2025
13 checks passed
@pjbgf pjbgf deleted the amd64-simd branch September 5, 2025 16:37