perf: optimize `PartialEq` implementation by klkvr · Pull Request #54 · alloy-rs/nybbles

klkvr · 2026-02-10T19:35:09Z

Because the most significant limbs are filled first, the current comparison is suboptimal for smaller values.

codspeed-hq · 2026-02-10T19:37:00Z

Merging this PR will improve performance by 29.41%

⚡ 4 improved benchmarks
✅ 102 untouched benchmarks
⏩ 111 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`eq/8`	538.9 µs	416.4 µs	+29.41%
⚡	`eq/16`	538.9 µs	416.4 µs	+29.41%
⚡	`eq/32`	511.1 µs	416.4 µs	+22.73%
⚡	`eq/64`	511.1 µs	416.4 µs	+22.73%

Comparing `klkvr/partial-eq-perf` (8df38fb) with `main` (ff87f1c)

¹ 111 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

DaniPopes · 2026-02-10T19:40:04Z

@klkvr can you show asm diff

klkvr · 2026-02-10T19:50:44Z

@DaniPopes amp got me this

--- a/a.s
+++ b/b.s
@@ -4,13 +4,32 @@
 	.type	nibbles_eq,@function
 nibbles_eq:
 	.cfi_startproc
-	vmovdqu	(%rdi), %ymm0
-	vmovq	32(%rdi), %xmm1
-	vmovq	32(%rsi), %xmm2
-	vpxor	%ymm2, %ymm1, %ymm1
-	vpxor	(%rsi), %ymm0, %ymm0
-	vpor	%ymm1, %ymm0, %ymm0
-	vptest	%ymm0, %ymm0
-	sete	%al
-	vzeroupper
+	mov	rcx, qword ptr [rdi]
+	cmp	rcx, qword ptr [rsi]
+	jne	.LBB0_4
+	mov	rax, qword ptr [rdi + 32]
+	cmp	rax, qword ptr [rsi + 32]
+	jne	.LBB0_4
+	mov	al, 1
+	cmp	rcx, 17
+	jae	.LBB0_6
+.LBB0_3:
 	ret
+.LBB0_6:
+	mov	rdx, qword ptr [rdi + 24]
+	cmp	rdx, qword ptr [rsi + 24]
+	jne	.LBB0_4
+	cmp	rcx, 33
+	jb	.LBB0_3
+	mov	rdx, qword ptr [rdi + 16]
+	cmp	rdx, qword ptr [rsi + 16]
+	jne	.LBB0_4
+	cmp	rcx, 49
+	jb	.LBB0_3
+	mov	rax, qword ptr [rdi + 8]
+	cmp	rax, qword ptr [rsi + 8]
+	sete	al
+	ret
+.LBB0_4:
+	xor	eax, eax
+	ret
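
For readers following the new assembly: the thresholds 17, 33 and 49 are consistent with 16 nibbles per 64-bit limb, with the top limb always compared and each lower limb compared only once the length spills into it. The sketch below restates that arithmetic in Rust. `NIBBLES_PER_LIMB` and `limbs_to_compare` are hypothetical names for illustration, and the layout is inferred from the diff rather than taken from the crate source.

```rust
/// Nibbles held by one 64-bit limb (4 bits each).
/// Assumed from the thresholds in the assembly, not from the crate.
const NIBBLES_PER_LIMB: usize = 16;

/// Number of limbs an early-exit comparison would inspect for a value of
/// `len` nibbles. At least one limb is always inspected, mirroring the
/// unconditional compare of the most significant limb in the diff.
pub fn limbs_to_compare(len: usize) -> usize {
    len.div_ceil(NIBBLES_PER_LIMB).max(1)
}
```

Under this assumption a second limb is needed from length 17, a third from 33, and the fourth from 49, matching the `cmp rcx, 17`, `cmp rcx, 33` and `cmp rcx, 49` checks.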

DaniPopes · 2026-02-10T20:11:45Z

Currently it's branchless, so it doesn't matter how long the value is. With this change it's probably slightly faster when the workload is only small values, and even then I doubt it.
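
The trade-off between the two strategies can be sketched in Rust as follows. This is a minimal illustration, not the crate's implementation: `PackedNibbles`, `from_nibbles`, `eq_branchless` and `eq_early_exit` are hypothetical names, and the layout (a length plus four 64-bit limbs, most significant limb filled first, unused bits zero) is an assumption inferred from the assembly diff.

```rust
/// Hypothetical stand-in for the packed representation: up to 64 nibbles in
/// four 64-bit limbs, where `limbs[3]` is the most significant limb and the
/// first one to be filled. Unused bits are assumed to stay zero.
#[derive(Clone, Copy, Debug)]
pub struct PackedNibbles {
    len: usize,
    limbs: [u64; 4],
}

impl PackedNibbles {
    /// Packs nibbles (values 0..=15) starting at the top of `limbs[3]`.
    pub fn from_nibbles(nibbles: &[u8]) -> Self {
        assert!(nibbles.len() <= 64, "at most 64 nibbles fit");
        let mut limbs = [0u64; 4];
        for (i, &n) in nibbles.iter().enumerate() {
            assert!(n < 16, "each nibble must fit in 4 bits");
            let limb = 3 - i / 16;
            let shift = 60 - 4 * (i % 16);
            limbs[limb] |= u64::from(n) << shift;
        }
        Self { len: nibbles.len(), limbs }
    }

    /// Compares every word regardless of length: no data-dependent branches,
    /// so the cost is the same for short and long values.
    pub fn eq_branchless(&self, other: &Self) -> bool {
        let mut diff = (self.len ^ other.len) as u64;
        for i in 0..4 {
            diff |= self.limbs[i] ^ other.limbs[i];
        }
        diff == 0
    }

    /// Compares the length and the top limb first, then only the lower
    /// limbs that the length says are in use.
    pub fn eq_early_exit(&self, other: &Self) -> bool {
        if self.len != other.len || self.limbs[3] != other.limbs[3] {
            return false;
        }
        // Limbs holding data, counting down from index 3; at least one.
        let used = self.len.div_ceil(16).max(1);
        ((4 - used)..3).rev().all(|i| self.limbs[i] == other.limbs[i])
    }
}
```

Both methods agree as long as unused bits are zero. The early-exit version touches fewer words for short values but adds branches whose cost depends on how predictable the lengths are, which is the concern raised in the comment above.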

perf: optimize PartialEq implementation

8df38fb

klkvr requested review from DaniPopes, mattsse, mediocregopher and shekhirin as code owners February 10, 2026 19:35

DaniPopes approved these changes Feb 10, 2026

klkvr closed this Feb 23, 2026

klkvr wants to merge 1 commit into `main` from `klkvr/partial-eq-perf`
