Handle chunk loop in assembly while aligning both `arm64` and `amd64` implementations #196

pjbgf · 2025-07-27T13:00:17Z

The changes are a stepping stone for future SIMD optimisations. New comments have been added throughout the assembly code to make it easier to review and maintain the code in the future, with special attention to the stack layout.

The block implementations for both architectures previously handled the chunk loop in Go and also called `checkCollision` from Go. Both have now been moved to assembly.
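For reviewers unfamiliar with the previous shape, here is a minimal sketch of a Go-side chunk loop of the kind this PR moves into assembly. It is not the code from this repository: `processChunks`, the `compress` callback, the `checkCollision` callback signature, and the 64-byte chunk size (the SHA-1 block size) are all assumptions made for illustration.

```go
package main

import "fmt"

// chunkSize is an assumed block size of 64 bytes (as in SHA-1).
const chunkSize = 64

// processChunks is a hypothetical stand-in for a Go-side chunk loop.
// It calls compress once per full chunk, then checkCollision for that
// chunk, and reports how many bytes were consumed and whether any
// chunk was flagged. A trailing partial chunk is left untouched.
func processChunks(p []byte, compress func([]byte), checkCollision func([]byte) bool) (consumed int, collision bool) {
	for len(p) >= chunkSize {
		chunk := p[:chunkSize]
		compress(chunk)
		if checkCollision(chunk) {
			collision = true
		}
		p = p[chunkSize:]
		consumed += chunkSize
	}
	return consumed, collision
}

func main() {
	data := make([]byte, 150) // two full chunks plus a 22-byte tail
	calls := 0
	consumed, collision := processChunks(
		data,
		func([]byte) { calls++ },
		func([]byte) bool { return false },
	)
	fmt.Println(calls, consumed, collision) // prints "2 128 false"
}
```

With the loop in Go, each chunk costs at least one call across the Go/assembly boundary; moving the loop into assembly reduces that to one call per `Write`-sized input, which is presumably part of the motivation here.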

Signed-off-by: Paulo Gomes <[email protected]>

The native implementation of the DV mask calculation was missing the `noescape` directives. For further optimisation, the wrapping funcs are now marked with `nosplit`.

Signed-off-by: Paulo Gomes <[email protected]>
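As a rough illustration of where the two directives go, the sketch below marks a thin wrapper with `//go:nosplit`. The function names and the `*[80]uint32` signature are hypothetical, not taken from this repository. Because `//go:noescape` is only valid on a body-less declaration implemented outside Go (for example in a `.s` file), it is shown in a comment and a pure-Go stand-in is used so the example compiles on its own.

```go
package main

import "fmt"

// In assembly-backed code the declaration below would have no body and
// would carry the noescape directive, roughly:
//
//	//go:noescape
//	func dvMaskImpl(w *[80]uint32) uint32
//
// noescape tells the compiler that pointer arguments do not escape to
// the heap through this call. The Go body here is only a stand-in.
func dvMaskImpl(w *[80]uint32) uint32 {
	return w[0] ^ w[79]
}

// dvMask is a hypothetical thin wrapper. nosplit asks the compiler to
// omit the usual stack overflow check, which is only appropriate for
// small functions with a small stack frame.
//
//go:nosplit
func dvMask(w *[80]uint32) uint32 {
	return dvMaskImpl(w)
}

func main() {
	var w [80]uint32
	w[0], w[79] = 0xF0, 0x0F
	fmt.Printf("%#x\n", dvMask(&w)) // prints "0xff"
}
```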

The recent changes appear to have introduced a bug that surfaces when `-race` is enabled. Additional tests are under way to understand where the problem lies. Once the SIMD implementation is in place, the temporary removal of `-race` needs to be reverted.

Signed-off-by: Paulo Gomes <[email protected]>

pjbgf · 2025-08-19T12:47:38Z

Superseded by #198 due to the additional complexity this was introducing.

pjbgf added 6 commits July 27, 2025 13:52

cgo: Align digest creation with purego implementation

aae428e

Signed-off-by: Paulo Gomes <[email protected]>

ubc: Add noescape and nosplit directives

c095718

The native implementation of the DV mask calculation was missing the `noescape` directives. For further optimisation, the wrapping funcs are now marked with `nosplit`.

Signed-off-by: Paulo Gomes <[email protected]>

Run golden tests on generic and native implementations

631bc42

Signed-off-by: Paulo Gomes <[email protected]>
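A minimal sketch of the idea behind this commit: run the same golden vectors against every implementation so the generic and native paths cannot drift apart. The `implementations` map and `checkGolden` helper are hypothetical, and `crypto/sha1` is used purely as a stand-in for both entries; the real tests would plug in this package's generic and native code paths and its own golden files.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// sumFunc is the shape shared by every implementation under test.
type sumFunc func([]byte) [sha1.Size]byte

// implementations maps a label to the function under test.
// Both entries are stand-ins (assumption), not this package's code.
var implementations = map[string]sumFunc{
	"generic": sha1.Sum,
	"native":  sha1.Sum,
}

// golden holds well-known SHA-1 test vectors.
var golden = []struct{ in, want string }{
	{"", "da39a3ee5e6b4b0d3255bfef95601890afd80709"},
	{"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"},
}

// checkGolden runs every vector against every implementation and
// returns a description of each mismatch.
func checkGolden() []string {
	var failures []string
	for name, sum := range implementations {
		for _, g := range golden {
			got := sum([]byte(g.in))
			if h := hex.EncodeToString(got[:]); h != g.want {
				failures = append(failures,
					fmt.Sprintf("%s(%q) = %s, want %s", name, g.in, h, g.want))
			}
		}
	}
	return failures
}

func main() {
	fmt.Println(len(checkGolden())) // prints "0"
}
```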

amd64: Move chunk loop to assembly code

5db538a

Signed-off-by: Paulo Gomes <[email protected]>

arm64: Align implementation with amd64

5defa48

Signed-off-by: Paulo Gomes <[email protected]>

build: Temporary removal of -race

06c8507

The recent changes appear to have introduced a bug that surfaces when `-race` is enabled. Additional tests are under way to understand where the problem lies. Once the SIMD implementation is in place, this needs to be reverted.

Signed-off-by: Paulo Gomes <[email protected]>

pjbgf closed this Aug 19, 2025
