Question: Architecture-specific performance differences between blake3 and blake2 #516

### Summary  
I’ve been evaluating the Rust [`blake3`](https://crates.io/crates/blake3) crate for potential use and comparing its throughput against `blake2` on several architectures. Data, code, and benchmarks are available [here](https://github.com/device-mapper-utils/blk-archive/issues/45#issuecomment-3255376966).

### Results  
- **x86:** `blake3` is consistently faster than `blake2`.  
- **ppcle / s390x / aarch64:** `blake3` is generally slower than `blake2`.  
  - Enabling Rayon parallelism sometimes improves results.  
  - In some cases, `blake3` is still slower than `blake2` even with Rayon ([example](https://github.com/device-mapper-utils/blk-archive/issues/45#issuecomment-3255485598)).  
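
For reference, the kind of measurement behind these comparisons can be sketched as below. This is a minimal, std-only illustration and not the harness linked above: `throughput_mib_s` and `placeholder_hash` are hypothetical names, and the placeholder uses the standard library's `DefaultHasher` only so the sketch compiles without external crates.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Returns the best observed throughput of `hash` over `data`, in MiB/s.
/// One untimed warm-up pass is followed by `runs` timed passes (at least one).
/// Taking the fastest pass reduces noise from scheduling and frequency scaling.
fn throughput_mib_s<F, R>(data: &[u8], runs: u32, mut hash: F) -> f64
where
    F: FnMut(&[u8]) -> R,
{
    // Warm-up pass: populates caches and triggers any lazy initialization.
    black_box(hash(black_box(data)));

    let mut best = Duration::MAX;
    for _ in 0..runs.max(1) {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the hash call.
        black_box(hash(black_box(data)));
        best = best.min(start.elapsed());
    }

    // Guard against a zero-length measurement on very small inputs.
    let secs = best.as_secs_f64().max(1e-9);
    (data.len() as f64 / (1024.0 * 1024.0)) / secs
}

/// Hypothetical stand-in hash so the sketch has no external dependencies.
/// It is NOT a cryptographic hash; swap in the real functions when measuring.
fn placeholder_hash(data: &[u8]) -> u64 {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::Hasher;
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}
```

With the real crates as dependencies, the closure would be something like `|d| blake3::hash(d)` or `|d| blake2::Blake2b512::digest(d)` (the latter with `use blake2::Digest`), and the Rayon variant `|d| blake3::Hasher::new().update_rayon(d).finalize()` with the `rayon` feature enabled. Input size matters here: multithreading overhead can dominate on short inputs, so comparisons should cover the buffer sizes the application actually hashes, and the binary should be built with `--release`.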

### Questions  
- Is this expected behavior on non-x86 architectures (e.g., due to SIMD gaps or missing intrinsics)?  
- Or is my sample code or benchmarking harness flawed?  
- Are there recommended tuning options or build flags for ppcle, s390x, and aarch64?  
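
One way to narrow down the SIMD question is to print what the benchmark machine reports at run time. The sketch below uses only the standard library's feature-detection macros; `runtime_simd_features` and `describe_simd` are hypothetical helper names. It covers x86 and aarch64 only, because the equivalent detection macros for other architectures are not available on stable Rust as far as I know, so an empty list on ppcle or s390x means "not checked", not "no vector unit".

```rust
/// Lists SIMD features detected at run time on x86 / x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if std::is_x86_feature_detected!("sse2") {
        found.push("sse2");
    }
    if std::is_x86_feature_detected!("sse4.1") {
        found.push("sse4.1");
    }
    if std::is_x86_feature_detected!("avx2") {
        found.push("avx2");
    }
    if std::is_x86_feature_detected!("avx512f") {
        found.push("avx512f");
    }
    found
}

/// Lists SIMD features detected at run time on aarch64.
#[cfg(target_arch = "aarch64")]
fn runtime_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if std::arch::is_aarch64_feature_detected!("neon") {
        found.push("neon");
    }
    found
}

/// Fallback for architectures this sketch does not probe (e.g. s390x).
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn runtime_simd_features() -> Vec<&'static str> {
    Vec::new()
}

/// Formats the target architecture and detected features on one line.
fn describe_simd() -> String {
    let features = runtime_simd_features();
    let listed = if features.is_empty() {
        "none detected by this sketch".to_string()
    } else {
        features.join(", ")
    };
    format!("{}: {}", std::env::consts::ARCH, listed)
}
```

Printing `describe_simd()` next to each benchmark result makes it easier to tell whether a slow run used a vectorized code path at all. To my knowledge the `blake3` crate ships dedicated SIMD code for SSE2, SSE4.1, AVX2, and AVX-512 on x86 and for NEON on ARM, and falls back to portable code elsewhere; that is worth confirming against the crate's documentation for the version under test.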

