Generated: 2026-02-21 16:02
Platform: Darwin arm64
- Summary
- Matrix Multiplication
- Element-wise Operations
- Unary Operations
- Linear Algebra
- FFT Operations
- Fusion Patterns
- Test Environment
Comprehensive performance comparison across tensor operations.
Performance comparison for square matrix multiplication (GFLOPS, higher is better).
| Size | Axiom | Eigen3 | PyTorch | NumPy | Armadillo |
|---|---|---|---|---|---|
| 32×32 | 55.2 | 88.4 | 62.2 | 56.4 | 98.3 |
| 64×64 | 371 | 440 | 336 | 332 | 42.1 |
| 128×128 | 923 | 954 | 930 | 951 | 44.7 |
| 256×256 | 1,421 | 1,251 | 1,412 | 1,488 | 162 |
| 512×512 | 2,389 | 2,345 | 1,310 | 2,358 | 434 |
| 1024×1024 | 2,820 | 2,445 | 2,423 | 2,299 | 524 |
| 2048×2048 | 3,218 | 2,982 | 2,801 | 2,795 | 608 |
| 4096×4096 | 3,087 | 2,961 | 2,961 | 2,959 | 754 |
Binary element-wise operations (add, sub, mul, div) measured in GB/s throughput.
Results at 4096×4096 (GB/s)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| add | 92.4 | 121 | 94.2 | 40.0 |
| sub | 112 | 119 | 90.4 | 40.5 |
| mul | 117 | 119 | 96.1 | 42.9 |
| div | 99.1 | 120 | 95.4 | 41.2 |
Unary operations (exp, log, sqrt, sin, cos, tanh, abs, neg, relu, sigmoid) measured in GB/s.
Results at 4096×4096 (GB/s)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| exp | 23.0 | 16.4 | 50.7 | 5.69 |
| log | 17.0 | 13.1 | 33.7 | 4.93 |
| sqrt | 39.0 | 66.3 | 73.2 | 29.7 |
| sin | 26.0 | 11.0 | 39.1 | 6.73 |
| cos | 25.2 | 11.0 | 33.7 | 6.46 |
| tanh | 14.3 | 21.5 | 21.0 | 9.43 |
| abs | 104 | 118 | 75.1 | 27.3 |
| neg | 109 | 66.1 | 75.4 | 32.7 |
| relu | 121 | 122 | 74.2 | 18.5 |
| sigmoid | 14.3 | 15.9 | 47.5 | 3.71 |
Linear algebra operations (SVD, QR, solve, Cholesky, eigendecomposition, inverse, determinant). Measured in milliseconds (lower is better).
Results at 512×512 (time_ms)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| svd | 18.2 | 2,200 | 16.2 | 25.9 |
| qr | 4.72 | 1.50 | 4.06 | 7.89 |
| solve | 0.96 | 2.18 | 0.47 | 1.20 |
| cholesky | 0.69 | 0.24 | 0.22 | 1.43 |
| eig | 151 | 22.5 | 9.50 | 15.4 |
| inv | 1.55 | 2.34 | 1.06 | 3.83 |
| det | 1.01 | 1.46 | 0.57 | 1.69 |
Fast Fourier Transform operations (fft, ifft, rfft, fft2, ifft2, rfft2). Measured in milliseconds (lower is better).
Results at 2048×2048 (time_ms)
| Operation | Axiom | PyTorch | NumPy |
|---|---|---|---|
| fft | 0.00 | 0.01 | 0.01 |
| ifft | 0.00 | 0.01 | 0.01 |
| rfft | 0.00 | 0.01 | 0.01 |
| fft2 | 14.3 | 27.2 | 60.6 |
| ifft2 | 14.3 | 27.6 | 29.6 |
| rfft2 | 10.0 | 7.82 | 22.8 |
Lazy evaluation with operation fusion vs eager mode execution.
Run make benchmark-fusion to generate fusion data.
OS: Darwin 25.3.0
Architecture: arm64
Python: 3.12.7
Timestamp: 2026-02-21T15:56:58.080886
- All benchmarks run on CPU
- Axiom uses Accelerate framework (BLAS) on macOS
- Higher GFLOPS/GB/s = better for throughput metrics
- Lower ms = better for time metrics
- Results may vary based on system load and thermal conditions










