
Axiom Benchmark Results

Generated: 2026-02-21 16:02

Platform: Darwin arm64

Summary

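Performance comparison of Axiom against Eigen3, PyTorch, NumPy, and Armadillo across matrix multiplication, element-wise, unary, linear algebra, and FFT operations.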

[Figure: Comprehensive Summary]


Matrix Multiplication

Performance comparison for square matrix multiplication (GFLOPS, higher is better).

| Size | Axiom | Eigen3 | PyTorch | NumPy | Armadillo |
|---|---|---|---|---|---|
| 32×32 | 55.2 | 88.4 | 62.2 | 56.4 | 98.3 |
| 64×64 | 371 | 440 | 336 | 332 | 42.1 |
| 128×128 | 923 | 954 | 930 | 951 | 44.7 |
| 256×256 | 1,421 | 1,251 | 1,412 | 1,488 | 162 |
| 512×512 | 2,389 | 2,345 | 1,310 | 2,358 | 434 |
| 1024×1024 | 2,820 | 2,445 | 2,423 | 2,299 | 524 |
| 2048×2048 | 3,218 | 2,982 | 2,801 | 2,795 | 608 |
| 4096×4096 | 3,087 | 2,961 | 2,961 | 2,959 | 754 |
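
To help read the GFLOPS figures, here is a minimal sketch of how a timing converts to GFLOPS and back. It assumes the common convention of counting 2·n³ floating-point operations for an n×n by n×n multiply. The report does not state which FLOP count the benchmark uses, so treat that as an assumption; the function names are hypothetical.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """GFLOPS for one n x n by n x n multiply, ASSUMING the 2*n**3 FLOP convention."""
    flops = 2 * n**3  # about n**3 multiplies plus n**3 adds
    return flops / seconds / 1e9


def matmul_seconds(n: int, gflops: float) -> float:
    """Invert the formula: wall time implied by a reported GFLOPS figure."""
    return 2 * n**3 / (gflops * 1e9)


# Wall time implied by the Axiom 4096x4096 entry in the table (3,087 GFLOPS).
implied_ms = matmul_seconds(4096, 3087) * 1e3
print(f"{implied_ms:.1f} ms")
```

Under that assumption, the 4096×4096 Axiom entry corresponds to roughly 44.5 ms per multiply.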

Performance Comparison

[Figure: Matmul Comparison]

Scaling Analysis

[Figure: Matmul Scaling]


Element-wise Operations

Binary element-wise operations (add, sub, mul, div), measured as throughput in GB/s (higher is better).

Results at 4096×4096 (GB/s)

| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| add | 92.4 | 121 | 94.2 | 40.0 |
| sub | 112 | 119 | 90.4 | 40.5 |
| mul | 117 | 119 | 96.1 | 42.9 |
| div | 99.1 | 120 | 95.4 | 41.2 |
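
The GB/s figures depend on how memory traffic is counted, which the report does not spell out. The sketch below shows one plausible accounting. It assumes float32 elements (4 bytes), counts every array touched (two inputs plus one output for a binary op, one input plus one output for a unary op), and takes 1 GB as 10⁹ bytes. All three choices, and the function name, are assumptions for illustration.

```python
def elementwise_gbps(rows: int, cols: int, seconds: float,
                     arrays_touched: int = 3, itemsize: int = 4) -> float:
    """Throughput in GB/s for an element-wise op.

    ASSUMPTIONS (not stated in the report):
      - itemsize=4 means float32 data
      - arrays_touched=3 counts two reads and one write (binary op);
        pass 2 for a unary op (one read, one write)
      - 1 GB = 1e9 bytes
    """
    bytes_moved = rows * cols * itemsize * arrays_touched
    return bytes_moved / seconds / 1e9


# Hypothetical timing: a 4096x4096 binary op that takes 2 ms.
print(f"{elementwise_gbps(4096, 4096, 0.002):.1f} GB/s")
```

Changing any of these conventions rescales every entry by the same factor, so comparisons between libraries within a row are unaffected.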

Performance by Operation

[Figure: Elementwise Comparison]

Bar Chart Comparison

[Figure: Elementwise Bar]


Unary Operations

Unary operations (exp, log, sqrt, sin, cos, tanh, abs, neg, relu, sigmoid), measured as throughput in GB/s (higher is better).
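
Most of the operations listed above are standard math functions; relu and sigmoid are the two with no direct counterpart in Python's math module. For reference, here is a scalar sketch of their usual definitions. It illustrates what the operations compute, not how any of the benchmarked libraries implement them.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(-x) could overflow, so use the equivalent form.
    z = math.exp(x)
    return z / (1.0 + z)


print(relu(-2.0), relu(3.0), sigmoid(0.0))
```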

Results at 4096×4096 (GB/s)

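| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| exp | 23.0 | 16.4 | 50.7 | 5.69 |
| log | 17.0 | 13.1 | 33.7 | 4.93 |
| sqrt | 39.0 | 66.3 | 73.2 | 29.7 |
| sin | 26.0 | 11.0 | 39.1 | 6.73 |
| cos | 25.2 | 11.0 | 33.7 | 6.46 |
| tanh | 14.3 | 21.5 | 21.0 | 9.43 |
| abs | 104 | 118 | 75.1 | 27.3 |
| neg | 109 | 66.1 | 75.4 | 32.7 |
| relu | 121 | 122 | 74.2 | 18.5 |
| sigmoid | 14.3 | 15.9 | 47.5 | 3.71 |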

Performance by Operation

[Figure: Unary Comparison]

Bar Chart Comparison

[Figure: Unary Bar]


Linear Algebra

Linear algebra operations (SVD, QR, solve, Cholesky, eigendecomposition, inverse, determinant), measured in milliseconds (lower is better).

Results at 512×512 (ms)

| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| svd | 18.2 | 2,200 | 16.2 | 25.9 |
| qr | 4.72 | 1.50 | 4.06 | 7.89 |
| solve | 0.96 | 2.18 | 0.47 | 1.20 |
| cholesky | 0.69 | 0.24 | 0.22 | 1.43 |
| eig | 151 | 22.5 | 9.50 | 15.4 |
| inv | 1.55 | 2.34 | 1.06 | 3.83 |
| det | 1.01 | 1.46 | 0.57 | 1.69 |
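
The report does not describe how the millisecond timings were collected. As a rough illustration of one common approach, the sketch below runs a few untimed warmup calls and then reports the median of several timed calls. The function name, the warmup and repeat counts, and the use of the median are all assumptions, not the benchmark's documented method.

```python
import statistics
import time


def time_ms(fn, warmup: int = 3, repeats: int = 10) -> float:
    """Median wall time of fn() in milliseconds (hypothetical harness)."""
    for _ in range(warmup):
        fn()  # untimed calls so caches and lazy initialization settle
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Stand-in workload; a real run would call e.g. a 512x512 solve here.
elapsed = time_ms(lambda: sum(i * i for i in range(10_000)))
print(f"{elapsed:.3f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by background load.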

Performance by Operation

[Figure: Linalg Comparison]

Bar Chart Comparison

[Figure: Linalg Bar]


FFT Operations

Fast Fourier Transform operations (fft, ifft, rfft, fft2, ifft2, rfft2), measured in milliseconds (lower is better). Entries shown as 0.00 round to less than 0.005 ms.

Results at 2048×2048 (ms)

| Operation | Axiom | PyTorch | NumPy |
|---|---|---|---|
| fft | 0.00 | 0.01 | 0.01 |
| ifft | 0.00 | 0.01 | 0.01 |
| rfft | 0.00 | 0.01 | 0.01 |
| fft2 | 14.3 | 27.2 | 60.6 |
| ifft2 | 14.3 | 27.6 | 29.6 |
| rfft2 | 10.0 | 7.82 | 22.8 |
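
The real-input variants (rfft, rfft2) can do less work than the complex ones because the spectrum of a real signal is conjugate-symmetric, so only the first n//2 + 1 bins need to be stored. NumPy and PyTorch return exactly that many bins from rfft; whether Axiom follows the same layout is an assumption here. The sketch below checks the symmetry with a naive O(n²) DFT written only for illustration. It is not how any of the benchmarked libraries compute an FFT.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(n^2); for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]  # arbitrary real input
n = len(signal)
full = dft(signal)

# For real input, bin k is the complex conjugate of bin n - k.
for k in range(1, n):
    assert abs(full[k] - full[n - k].conjugate()) < 1e-9

# So the first n // 2 + 1 bins carry all the information.
half = full[: n // 2 + 1]
print(len(full), len(half))
```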

Performance by Operation

[Figure: FFT Comparison]

Bar Chart Comparison

[Figure: FFT Bar]


Fusion Patterns

Compares lazy evaluation with operation fusion against eager-mode execution.

Run make benchmark-fusion to generate fusion data.
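
To make the distinction concrete, here is a toy pure-Python sketch of the two execution styles. The eager version materializes an intermediate list after every operation, while the lazy version records the operations and applies them all in a single pass at evaluation time. The `Lazy` class and its methods are hypothetical and do not reflect Axiom's actual API or implementation.

```python
import math


def eager(xs):
    """Eager mode: each step allocates a full intermediate result."""
    a = [x * 2.0 for x in xs]
    b = [v + 1.0 for v in a]
    return [math.sqrt(v) for v in b]


class Lazy:
    """Hypothetical lazy wrapper: records ops, runs them fused on evaluate()."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def map(self, fn):
        # Nothing is computed here; the op is only appended to the chain.
        return Lazy(self.data, self.ops + (fn,))

    def evaluate(self):
        out = []
        for x in self.data:      # one pass over the input
            for fn in self.ops:  # all recorded ops applied per element
                x = fn(x)
            out.append(x)
        return out


xs = [0.0, 1.5, 4.0]
fused = Lazy(xs).map(lambda x: x * 2.0).map(lambda x: x + 1.0).map(math.sqrt).evaluate()
assert fused == eager(xs)
print(fused)
```

In a compiled tensor library the benefit of fusion comes from fewer passes over memory and fewer temporary buffers; this sketch shows only the structure, not the speedup.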


Test Environment

OS: Darwin 25.3.0
Architecture: arm64
Python: 3.12.7
Timestamp: 2026-02-21T15:56:58.080886
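
When rerunning the benchmarks on another machine, it helps to record the same fields. The sketch below gathers them with Python's standard library. The function name is hypothetical, and this may not be how the report's own tooling collects the values.

```python
import datetime
import platform


def collect_environment() -> dict:
    """Gather the fields listed under Test Environment (hypothetical helper)."""
    return {
        "OS": f"{platform.system()} {platform.release()}",
        "Architecture": platform.machine(),
        "Python": platform.python_version(),
        "Timestamp": datetime.datetime.now().isoformat(),
    }


env = collect_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```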

Notes

  • All benchmarks run on CPU
  • Axiom uses Accelerate framework (BLAS) on macOS
  • Throughput metrics (GFLOPS, GB/s): higher is better
  • Time metrics (ms): lower is better
  • Results may vary based on system load and thermal conditions