@pjbgf pjbgf commented Jul 11, 2025

The field that previously held the block function caused two allocations per operation. It existed only for testing
purposes, so its performance cost did not justify keeping it.
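
For readers unfamiliar with the pattern, here is a minimal sketch of how a function stored in a struct field can cost allocations. It is not the sha1cd code: `blockFn`, `mixBlock`, `digestIndirect` and `digestDirect` are hypothetical names made up for illustration. The assumption is that Go's escape analysis treats a call through a function value conservatively, so locals passed by pointer or slice may move to the heap, whereas a direct call lets them stay on the stack.

```go
package main

import "fmt"

// blockFn is a hypothetical signature for a block-processing routine.
// It is not taken from the sha1cd source.
type blockFn func(state *[5]uint32, p []byte)

// mixBlock is a stand-in for a real compression function.
func mixBlock(state *[5]uint32, p []byte) {
	for i, b := range p {
		state[i%5] += uint32(b)
	}
}

// digestIndirect keeps the routine in a field so tests can swap it.
type digestIndirect struct {
	block blockFn
}

// sum calls through the field. The callee is unknown at compile time,
// so state and buf may be heap-allocated on every call.
func (d *digestIndirect) sum(data []byte) [5]uint32 {
	var state [5]uint32
	var buf [64]byte
	n := copy(buf[:], data)
	d.block(&state, buf[:n])
	return state
}

// digestDirect drops the field and calls the routine statically.
type digestDirect struct{}

// sum uses a direct call, which the compiler can analyse, so state and
// buf are candidates for stack allocation.
func (digestDirect) sum(data []byte) [5]uint32 {
	var state [5]uint32
	var buf [64]byte
	n := copy(buf[:], data)
	mixBlock(&state, buf[:n])
	return state
}

func main() {
	data := []byte("abcdefgh")
	indirect := &digestIndirect{block: mixBlock}
	direct := digestDirect{}
	// Both variants compute the same result; only the call style differs.
	fmt.Println(indirect.sum(data) == direct.sum(data))
}
```

Whether a given variable actually escapes depends on the compiler version; building with `go build -gcflags=-m` prints the escape analysis decisions for each variant.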

As a result, processing time dropped by up to 24.9% on AMD64, most noticeably for smaller payloads. Likewise, throughput
increased by up to 33% for the same input sizes.
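
The time and throughput figures in the benchstat output are two views of the same measurement, which is why a time reduction of about 25% corresponds to a throughput gain of about 33%. The sketch below works through that arithmetic for the `Hash8Bytes/sha1cd_native` row; the helper names `throughputMiB` and `throughputGain` are made up for illustration, and the results only approximately match the table because the table values are rounded.

```go
package main

import "fmt"

const mib = 1024 * 1024

// throughputMiB converts a payload size in bytes and a per-operation
// time in nanoseconds into MiB/s.
func throughputMiB(payloadBytes, nsPerOp float64) float64 {
	return payloadBytes / (nsPerOp * 1e-9) / mib
}

// throughputGain returns the relative throughput increase implied by a
// relative drop in time per operation (0.25 means 25% less time).
func throughputGain(timeDrop float64) float64 {
	return 1/(1-timeDrop) - 1
}

func main() {
	// sec/op values for an 8-byte payload, taken from the table.
	fmt.Printf("before: %.2f MiB/s\n", throughputMiB(8, 247.3))
	fmt.Printf("after:  %.2f MiB/s\n", throughputMiB(8, 185.7))
	// A 24.89% drop in time implies roughly a 33% gain in throughput.
	fmt.Printf("gain:   %+.2f%%\n", 100*throughputGain(0.2489))
}
```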

The benchstat results for AMD64 can be seen below:

goos: linux
goarch: amd64
pkg: github.com/pjbgf/sha1cd/test
cpu: AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
                                    │     /tmp/before      │                  /tmp/after                  │
                                    │        sec/op        │        sec/op         vs base                │
CalculateDvMask/generic-16            0.0000001500n ± 167%   0.0000001500n ±  33%        ~ (p=0.807 n=10)
CalculateDvMask/native-16             0.0000002500n ± 100%   0.0000001500n ± 100%        ~ (p=0.107 n=10)
CalculateDvMask/cgo-16                0.0000004000n ± 250%   0.0000005500n ±  45%        ~ (p=0.205 n=10)
Hash8Bytes/sha1-16                           102.1n ±   2%          103.4n ±   1%        ~ (p=0.255 n=10)
Hash8Bytes/sha1cd_native-16                  247.3n ±   2%          185.7n ±   2%  -24.89% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16                 273.1n ±   1%          214.8n ±   1%  -21.36% (p=0.000 n=10)
Hash8Bytes/sha1cd_cgo-16                     723.2n ±   2%          710.4n ±   1%   -1.76% (p=0.006 n=10)
Hash320Bytes/sha1-16                         391.2n ±   1%          388.9n ±   0%   -0.58% (p=0.004 n=10)
Hash320Bytes/sha1cd_native-16                975.9n ±   1%          904.2n ±   1%   -7.35% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16               1.143µ ±   1%          1.084µ ±   0%   -5.16% (p=0.000 n=10)
Hash320Bytes/sha1cd_cgo-16                   1.272µ ±   1%          1.273µ ±   1%        ~ (p=0.754 n=10)
Hash1K/sha1-16                               803.6n ±   3%          781.7n ±   0%   -2.73% (p=0.014 n=10)
Hash1K/sha1cd_native-16                      2.797µ ±   4%          2.507µ ±   1%  -10.37% (p=0.000 n=10)
Hash1K/sha1cd_generic-16                     3.286µ ±   6%          2.971µ ±   0%   -9.57% (p=0.000 n=10)
Hash1K/sha1cd_cgo-16                         2.381µ ±   3%          2.287µ ±   2%   -3.95% (p=0.000 n=10)
Hash8K/sha1-16                               5.164µ ±   2%          5.014µ ±   1%   -2.91% (p=0.000 n=10)
Hash8K/sha1cd_native-16                      19.89µ ±   1%          18.71µ ±   0%   -5.95% (p=0.000 n=10)
Hash8K/sha1cd_generic-16                     23.80µ ±   1%          22.33µ ±   2%   -6.17% (p=0.000 n=10)
Hash8K/sha1cd_cgo-16                         13.23µ ±   3%          12.43µ ±   1%   -6.00% (p=0.000 n=10)
HashWithCollision/sha1cd_native-16           2.994µ ±   2%          2.818µ ±   1%   -5.88% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16          3.242µ ±   2%          3.108µ ±   0%   -4.15% (p=0.000 n=10)
HashWithCollision/sha1cd_cgo-16              2.519µ ±   1%          2.493µ ±   2%        ~ (p=0.138 n=10)
geomean                                      79.11n                 74.01n          -6.45%

                                    │  /tmp/before   │                /tmp/after                 │
                                    │      B/op      │     B/op      vs base                     │
CalculateDvMask/generic-16              0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
CalculateDvMask/native-16               0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
CalculateDvMask/cgo-16                  0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8Bytes/sha1-16                      0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8Bytes/sha1cd_native-16             208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16            208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash8Bytes/sha1cd_cgo-16              2.625Ki ± 0%     2.625Ki ± 0%         ~ (p=1.000 n=10) ¹
Hash320Bytes/sha1-16                    0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash320Bytes/sha1cd_native-16           208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16          208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_cgo-16            2.625Ki ± 0%     2.625Ki ± 0%         ~ (p=1.000 n=10) ¹
Hash1K/sha1-16                          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash1K/sha1cd_native-16                 208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_generic-16                208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_cgo-16                  2.625Ki ± 0%     2.625Ki ± 0%         ~ (p=1.000 n=10) ¹
Hash8K/sha1-16                          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8K/sha1cd_native-16                 208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_generic-16                208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_cgo-16                  2.625Ki ± 0%     2.625Ki ± 0%         ~ (p=1.000 n=10) ¹
HashWithCollision/sha1cd_native-16      208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16     208.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_cgo-16       2.625Ki ± 0%     2.625Ki ± 0%         ~ (p=1.000 n=10) ¹
geomean                                            ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                    │ /tmp/before  │               /tmp/after                │
                                    │  allocs/op   │ allocs/op   vs base                     │
CalculateDvMask/generic-16            0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
CalculateDvMask/native-16             0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
CalculateDvMask/cgo-16                0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8Bytes/sha1-16                    0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8Bytes/sha1cd_native-16           2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16          2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8Bytes/sha1cd_cgo-16              1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash320Bytes/sha1-16                  0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash320Bytes/sha1cd_native-16         2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16        2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_cgo-16            1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash1K/sha1-16                        0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash1K/sha1cd_native-16               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_generic-16              2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_cgo-16                  1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8K/sha1-16                        0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Hash8K/sha1cd_native-16               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_generic-16              2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_cgo-16                  1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
HashWithCollision/sha1cd_native-16    2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_cgo-16       1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                                          ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                    │ /tmp/before  │              /tmp/after              │
                                    │     B/s      │     B/s       vs base                │
Hash8Bytes/sha1-16                    74.71Mi ± 2%   73.79Mi ± 1%        ~ (p=0.271 n=10)
Hash8Bytes/sha1cd_native-16           30.86Mi ± 2%   41.07Mi ± 1%  +33.12% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16          27.93Mi ± 1%   35.52Mi ± 1%  +27.16% (p=0.000 n=10)
Hash8Bytes/sha1cd_cgo-16              10.55Mi ± 2%   10.74Mi ± 1%   +1.76% (p=0.007 n=10)
Hash320Bytes/sha1-16                  780.1Mi ± 1%   784.6Mi ± 0%   +0.57% (p=0.005 n=10)
Hash320Bytes/sha1cd_native-16         312.7Mi ± 1%   337.5Mi ± 1%   +7.93% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16        267.0Mi ± 1%   281.6Mi ± 0%   +5.45% (p=0.000 n=10)
Hash320Bytes/sha1cd_cgo-16            240.0Mi ± 1%   239.8Mi ± 1%        ~ (p=0.796 n=10)
Hash1K/sha1-16                        1.187Gi ± 3%   1.220Gi ± 0%   +2.81% (p=0.015 n=10)
Hash1K/sha1cd_native-16               349.2Mi ± 4%   389.5Mi ± 1%  +11.57% (p=0.000 n=10)
Hash1K/sha1cd_generic-16              297.3Mi ± 6%   328.7Mi ± 0%  +10.57% (p=0.000 n=10)
Hash1K/sha1cd_cgo-16                  410.2Mi ± 3%   427.1Mi ± 1%   +4.11% (p=0.000 n=10)
Hash8K/sha1-16                        1.478Gi ± 2%   1.522Gi ± 1%   +3.00% (p=0.000 n=10)
Hash8K/sha1cd_native-16               392.7Mi ± 1%   417.6Mi ± 0%   +6.33% (p=0.000 n=10)
Hash8K/sha1cd_generic-16              328.3Mi ± 1%   349.9Mi ± 2%   +6.58% (p=0.000 n=10)
Hash8K/sha1cd_cgo-16                  590.7Mi ± 3%   628.4Mi ± 1%   +6.38% (p=0.000 n=10)
HashWithCollision/sha1cd_native-16    203.9Mi ± 2%   216.7Mi ± 1%   +6.27% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16   188.2Mi ± 2%   196.4Mi ± 0%   +4.34% (p=0.000 n=10)
HashWithCollision/sha1cd_cgo-16       242.3Mi ± 1%   244.9Mi ± 2%        ~ (p=0.143 n=10)
geomean                               227.2Mi        242.9Mi        +6.93%

@pjbgf pjbgf added the breaking label Jul 11, 2025
The field that previously held the block function caused two
allocations per operation. It existed only for testing purposes,
so its performance cost did not justify keeping it.

As a result, processing time dropped by up to 24.9% on AMD64,
most noticeably for smaller payloads. Likewise, throughput
increased by up to 33% for the same input sizes.

A short version of the benchstat can be seen below:

                                    │     /tmp/before      │                  /tmp/after                  │
                                    │        sec/op        │        sec/op         vs base                │
Hash8Bytes/sha1cd_native-16                  247.3n ±   2%          185.7n ±   2%  -24.89% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16                 273.1n ±   1%          214.8n ±   1%  -21.36% (p=0.000 n=10)
Hash320Bytes/sha1cd_native-16                975.9n ±   1%          904.2n ±   1%   -7.35% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16               1.143µ ±   1%          1.084µ ±   0%   -5.16% (p=0.000 n=10)
Hash1K/sha1cd_native-16                      2.797µ ±   4%          2.507µ ±   1%  -10.37% (p=0.000 n=10)
Hash1K/sha1cd_generic-16                     3.286µ ±   6%          2.971µ ±   0%   -9.57% (p=0.000 n=10)
Hash8K/sha1cd_native-16                      19.89µ ±   1%          18.71µ ±   0%   -5.95% (p=0.000 n=10)
Hash8K/sha1cd_generic-16                     23.80µ ±   1%          22.33µ ±   2%   -6.17% (p=0.000 n=10)
HashWithCollision/sha1cd_native-16           2.994µ ±   2%          2.818µ ±   1%   -5.88% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16          3.242µ ±   2%          3.108µ ±   0%   -4.15% (p=0.000 n=10)

                                    │ /tmp/before  │               /tmp/after                │
                                    │  allocs/op   │ allocs/op   vs base                     │
Hash8Bytes/sha1cd_native-16           2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16          2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_native-16         2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16        2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_native-16               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash1K/sha1cd_generic-16              2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_native-16               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Hash8K/sha1cd_generic-16              2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_native-16    2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)

                                    │ /tmp/before  │              /tmp/after              │
                                    │     B/s      │     B/s       vs base                │
Hash8Bytes/sha1cd_native-16           30.86Mi ± 2%   41.07Mi ± 1%  +33.12% (p=0.000 n=10)
Hash8Bytes/sha1cd_generic-16          27.93Mi ± 1%   35.52Mi ± 1%  +27.16% (p=0.000 n=10)
Hash320Bytes/sha1cd_native-16         312.7Mi ± 1%   337.5Mi ± 1%   +7.93% (p=0.000 n=10)
Hash320Bytes/sha1cd_generic-16        267.0Mi ± 1%   281.6Mi ± 0%   +5.45% (p=0.000 n=10)
Hash1K/sha1cd_native-16               349.2Mi ± 4%   389.5Mi ± 1%  +11.57% (p=0.000 n=10)
Hash1K/sha1cd_generic-16              297.3Mi ± 6%   328.7Mi ± 0%  +10.57% (p=0.000 n=10)
Hash8K/sha1cd_native-16               392.7Mi ± 1%   417.6Mi ± 0%   +6.33% (p=0.000 n=10)
Hash8K/sha1cd_generic-16              328.3Mi ± 1%   349.9Mi ± 2%   +6.58% (p=0.000 n=10)
HashWithCollision/sha1cd_native-16    203.9Mi ± 2%   216.7Mi ± 1%   +6.27% (p=0.000 n=10)
HashWithCollision/sha1cd_generic-16   188.2Mi ± 2%   196.4Mi ± 0%   +4.34% (p=0.000 n=10)

Signed-off-by: Paulo Gomes <[email protected]>
@pjbgf pjbgf merged commit 59a70f1 into main Jul 11, 2025
13 checks passed
@pjbgf pjbgf deleted the refactor branch July 11, 2025 08:01