Basic information
- Board URL (official): https://www.raspberrypi.com/products/compute-module-zero/
- Board purchased from: Provided by EDAtec
- Board purchase date: December 9, 2025
- Board specs (as tested): 16GB eMMC, Wireless
- Board price (as tested): $50
Linux/system information
# output of `screenfetch`
.',;:cc;,'. .,;::c:,,. jgeerling@cm0
,ooolcloooo: 'oooooccloo: OS: Raspbian 13 trixie
.looooc;;:ol :oc;;:ooooo' Kernel: armv7l Linux 6.12.47+rpt-rpi-v7
;oooooo: ,ooooooc. Uptime: 6m
.,:;'. .;:;'. Packages: 1633
.dQ. .d0Q0Q0. '0Q. Shell: bash 5.2.37
.0Q0' 'Q0Q0Q' 'Q0Q. Disk: 4.7G / 15G (35%)
'' .odo. .odo. '' CPU: ARMv7 rev 4 (v7l) @ 4x 1GHz
. .0Q0Q0Q' .0Q0Q0Q. . GPU:
,0Q .0Q0Q0Q0Q 'Q0Q0Q0b. 0Q. RAM: 197MiB / 425MiB
:Q0 Q0Q0Q0Q 'Q0Q0Q0 Q0'
'0 '0Q0' .0Q0. '0' 'Q'
.oo. .0Q0Q0. .oo.
'Q0Q0. '0Q0Q0Q0. .Q0Q0b
'Q0Q0. '0Q0Q0' .d0Q0Q'
'Q0Q' .. '0Q.'
.0Q0Q0Q.
'0Q0Q'
# output of `uname -a`
Linux cm0 6.12.47+rpt-rpi-v7 #1 SMP Raspbian 1:6.12.47-1+rpt1 (2025-09-16) armv7l GNU/Linux
System topology
Benchmark results
CPU
(Could not get either of these tools to complete a run without locking up the system and causing a hard reset...)
- Geekbench 6: (TODO single / TODO multi - PASTE_URL)
- TODO Gflops (geerlingguy/top500-benchmark HPL result)
Power
- Idle power draw (at wall): 1.2 W
  - With WiFi: 1.3 W
  - With WiFi + Ethernet: 1.5 W
  - With WiFi + Ethernet + HDMI: 1.7 W
- Maximum simulated power draw (`stress-ng --matrix 0`): 3.6 W
- During Geekbench multicore benchmark: N/A
- During `top500` HPL benchmark: 3.7 W
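
The per-interface cost implied by the idle readings above can be worked out with simple subtraction. The following is a minimal sketch, assuming the readings are cumulative in the order listed (each line adds one interface to the previous configuration); `power_delta` is a hypothetical helper name, not part of any benchmark tool.

```shell
#!/bin/sh
# Hypothetical helper: difference between two wall-power readings, in watts.
power_delta() {
  awk -v base="$1" -v reading="$2" 'BEGIN { printf "%.1f\n", reading - base }'
}

# Readings copied from the idle measurements above (watts at the wall).
# Assumption: each reading adds one interface to the previous configuration.
echo "WiFi:     +$(power_delta 1.2 1.3) W"
echo "Ethernet: +$(power_delta 1.3 1.5) W"
echo "HDMI:     +$(power_delta 1.5 1.7) W"
```

At this scale the deltas are close to the resolution of a typical wall meter, so they are best read as rough indications rather than precise figures.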
Disk
Built-in eMMC (16GB) - Rayson RS70B16G4
| Benchmark | Result |
|---|---|
| iozone 4K random read | 9.20 MB/s |
| iozone 4K random write | 10.37 MB/s |
| iozone 1M random read | 22.24 MB/s |
| iozone 1M random write | 19.35 MB/s |
| iozone 1M sequential read | 22.36 MB/s |
| iozone 1M sequential write | 19.38 MB/s |
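
For comparison with storage datasheets, the 4K random results in the table can be expressed as approximate operations per second. This is a rough sketch only: it assumes 1 MB = 1024 KiB and a 4 KiB record size, neither of which is stated in the report, and `mbps_to_iops` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert throughput in MB/s to operations per second.
# Assumption: 1 MB = 1024 KiB; the report does not say which convention applies.
mbps_to_iops() {
  awk -v mbps="$1" -v record_kib="$2" \
    'BEGIN { printf "%.0f\n", mbps * 1024 / record_kib }'
}

# Values copied from the iozone table above.
echo "4K random read:  ~$(mbps_to_iops 9.20 4) IOPS"
echo "4K random write: ~$(mbps_to_iops 10.37 4) IOPS"
```

If the figures are decimal megabytes instead, the results come out about 5% lower.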
Network
iperf3 results:
Built-in Ethernet (USB 0fe6:9900 ICS Advent USB 10/100 LAN)
- `iperf3 -c $SERVER_IP`: 94.5 Mbps
- `iperf3 -c $SERVER_IP --reverse`: 92.2 Mbps
- `iperf3 -c $SERVER_IP --bidir`: 74.2 Mbps up, 91.9 Mbps down
(Be sure to test all interfaces, noting any that are non-functional.)
GPU
glmark2
N/A
vkmark
N/A
GravityMark
N/A
AI / LLM Inference
N/A, not enough memory.
Memory
tinymembench results:
tinymembench v0.4.10 (simple benchmark for memory throughput and latency)
==========================================================================
== Memory bandwidth tests ==
== ==
== Note 1: 1MB = 1000000 bytes ==
== Note 2: Results for 'copy' tests show how many bytes can be ==
== copied per second (adding together read and writen ==
== bytes would have provided twice higher numbers) ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
== to first fetch data into it, and only then write it to the ==
== destination (source -> L1 cache, L1 cache -> destination) ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in ==
== brackets ==
==========================================================================
C copy backwards : 469.7 MB/s (4.4%)
C copy backwards (32 byte blocks) : 482.6 MB/s (2.4%)
C copy backwards (64 byte blocks) : 486.0 MB/s (1.6%)
C copy : 471.5 MB/s (2.1%)
C copy prefetched (32 bytes step) : 464.9 MB/s (1.6%)
C copy prefetched (64 bytes step) : 462.5 MB/s (0.7%)
C 2-pass copy : 403.7 MB/s (1.3%)
C 2-pass copy prefetched (32 bytes step) : 448.7 MB/s (5.9%)
C 2-pass copy prefetched (64 bytes step) : 446.9 MB/s (0.8%)
C fill : 833.2 MB/s (6.6%)
C fill (shuffle within 16 byte blocks) : 1127.5 MB/s (13.5%)
C fill (shuffle within 32 byte blocks) : 828.0 MB/s (0.8%)
C fill (shuffle within 64 byte blocks) : 820.7 MB/s (1.1%)
---
standard memcpy : 590.5 MB/s (5.4%)
standard memset : 924.6 MB/s (8.6%)
---
NEON read : 676.3 MB/s (1.0%)
NEON read prefetched (32 bytes step) : 892.9 MB/s (1.9%)
NEON read prefetched (64 bytes step) : 878.4 MB/s (1.3%)
NEON read 2 data streams : 701.4 MB/s (1.7%)
NEON read 2 data streams prefetched (32 bytes step) : 757.5 MB/s (0.9%)
NEON read 2 data streams prefetched (64 bytes step) : 811.8 MB/s (3.2%)
NEON copy : 474.8 MB/s (1.0%)
NEON copy prefetched (32 bytes step) : 469.8 MB/s (0.2%)
NEON copy prefetched (64 bytes step) : 476.5 MB/s (0.8%)
NEON unrolled copy : 469.5 MB/s (1.2%)
NEON unrolled copy prefetched (32 bytes step) : 469.6 MB/s (0.2%)
NEON unrolled copy prefetched (64 bytes step) : 508.4 MB/s (4.1%)
NEON copy backwards : 478.7 MB/s (0.8%)
NEON copy backwards prefetched (32 bytes step) : 506.2 MB/s (3.7%)
NEON copy backwards prefetched (64 bytes step) : 469.1 MB/s (1.0%)
NEON 2-pass copy : 444.0 MB/s (3.7%)
NEON 2-pass copy prefetched (32 bytes step) : 397.4 MB/s (1.1%)
NEON 2-pass copy prefetched (64 bytes step) : 456.7 MB/s (7.4%)
NEON unrolled 2-pass copy : 426.2 MB/s (1.1%)
NEON unrolled 2-pass copy prefetched (32 bytes step) : 872.0 MB/s (19.4%)
NEON unrolled 2-pass copy prefetched (64 bytes step) : 880.7 MB/s (0.9%)
NEON fill : 1389.8 MB/s (0.2%)
NEON fill backwards : 1388.6 MB/s (0.2%)
VFP copy : 1074.9 MB/s (0.5%)
VFP 2-pass copy : 848.9 MB/s (0.5%)
ARM fill (STRD) : 1391.2 MB/s (0.2%)
ARM fill (STM with 8 registers) : 1394.8 MB/s (0.2%)
ARM fill (STM with 4 registers) : 1392.2 MB/s (0.2%)
ARM copy prefetched (incr pld) : 1069.7 MB/s (0.2%)
ARM copy prefetched (wrap pld) : 1073.1 MB/s (0.5%)
ARM 2-pass copy prefetched (incr pld) : 879.5 MB/s (0.9%)
ARM 2-pass copy prefetched (wrap pld) : 868.3 MB/s (0.8%)
==========================================================================
== Framebuffer read tests. ==
== ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled. ==
== Writes to such framebuffers are quite fast, but reads are much ==
== slower and very sensitive to the alignment and the selection of ==
== CPU instructions which are used for accessing memory. ==
== ==
== Many x86 systems allocate the framebuffer in the GPU memory, ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover, ==
== PCI-E is asymmetric and handles reads a lot worse than writes. ==
== ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall ==
== performance improvement. For example, the xf86-video-fbturbo DDX ==
== uses this trick. ==
==========================================================================
NEON read (from framebuffer) : 60.0 MB/s
NEON copy (from framebuffer) : 57.8 MB/s (0.5%)
NEON 2-pass copy (from framebuffer) : 57.9 MB/s
NEON unrolled copy (from framebuffer) : 59.1 MB/s
NEON 2-pass unrolled copy (from framebuffer) : 57.6 MB/s (0.3%)
VFP copy (from framebuffer) : 379.0 MB/s (1.9%)
VFP 2-pass copy (from framebuffer) : 325.8 MB/s (0.3%)
ARM copy (from framebuffer) : 184.8 MB/s (0.4%)
ARM 2-pass copy (from framebuffer) : 192.2 MB/s (0.2%)
==========================================================================
== Memory latency test ==
== ==
== Average time is measured for random memory accesses in the buffers ==
== of different sizes. The larger is the buffer, the more significant ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM ==
== accesses. For extremely large buffer sizes we are expecting to see ==
== page table walk with several requests to SDRAM for almost every ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest). ==
== ==
== Note 1: All the numbers are representing extra time, which needs to ==
== be added to L1 cache latency. The cycle timings for L1 cache ==
== latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
== two independent memory accesses at a time. In the case if ==
== the memory subsystem can't handle multiple outstanding ==
== requests, dual random read has the same timings as two ==
== single reads performed one after another. ==
==========================================================================
block size : single random read / dual random read
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.2 ns / 0.2 ns
65536 : 6.5 ns / 11.1 ns
131072 : 10.0 ns / 15.8 ns
262144 : 11.7 ns / 17.9 ns
524288 : 13.7 ns / 21.1 ns
1048576 : 94.0 ns / 145.3 ns
2097152 : 135.3 ns / 186.8 ns
4194304 : 162.2 ns / 209.6 ns
8388608 : 176.3 ns / 220.3 ns
16777216 : 185.7 ns / 227.2 ns
33554432 : 191.8 ns / 232.6 ns
67108864 : 196.9 ns / 236.5 ns
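
The latency table above shows two sharp steps, which can be picked out mechanically. Below is a minimal sketch, assuming rows in the `size : X ns / Y ns` layout printed by tinymembench; `latency_jumps` and its default factor of 2 are illustrative choices, not part of tinymembench.

```shell
#!/bin/sh
# Hypothetical helper: read "size : X ns / Y ns" rows on stdin and print each
# step where single-read latency grows by at least the given factor.
latency_jumps() {
  awk -v factor="${1:-2}" '
    $2 == ":" && $4 == "ns" {
      # Skip comparisons against a 0.0 ns baseline to avoid dividing by zero.
      if (prev_lat > 0 && $3 / prev_lat >= factor)
        printf "%d -> %d bytes: %.1f ns -> %.1f ns\n", prev_size, $1, prev_lat, $3
      prev_size = $1 + 0
      prev_lat = $3 + 0
    }'
}

# A subset of the rows above, single/dual random read latency.
latency_jumps 2 <<'EOF'
     32768 :    0.2 ns          /     0.2 ns
     65536 :    6.5 ns          /    11.1 ns
    524288 :   13.7 ns          /    21.1 ns
   1048576 :   94.0 ns          /   145.3 ns
   2097152 :  135.3 ns          /   186.8 ns
EOF
```

On this data the steps fall between 32 KiB and 64 KiB and between 512 KiB and 1 MiB, which would be consistent with a 32 KiB L1 data cache and a 512 KiB L2 cache, though the report itself does not state the cache sizes.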
Core to Core Memory Latency
sbc-bench results
Run sbc-bench and paste a link to the results here:
wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
sudo /bin/bash ./sbc-bench.sh -r
Phoronix Test Suite
Results from pi-general-benchmark.sh:
- pts/encode-mp3: TODO sec
- pts/x264 1080p: TODO fps
- pts/x264 4K: TODO fps
- pts/phpbench: TODO
- pts/build-linux-kernel (defconfig): TODO sec
