Linux perf is a tool for measuring CPU performance that helps developers optimize software.
The Linux kernel's perf support for RISC-V relies on the SBI PMU extension and the Sscofpmf extension, both of which are now supported by Nuclei RISC-V processors.
The SBI PMU extension is an interface that lets supervisor mode configure and use the RISC-V hardware performance counters with assistance from machine mode (or hypervisor mode).
The Sscofpmf RISC-V extension adds a counter overflow interrupt and privilege-mode counter filtering.
Note
Please manually enable the kernel perf feature by setting CONFIG_PERF_EVENTS=y in conf/evalsoc/linux_rv64*_defconfig
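As a minimal sketch of that step, the helper below sets CONFIG_PERF_EVENTS=y in a defconfig file if it is not already set. The function name `enable_perf_events` is hypothetical and not part of the SDK, and the sketch assumes the defconfig is a plain Kconfig fragment with one option per line.

```shell
# Hypothetical helper (not part of the SDK): make sure a defconfig
# contains CONFIG_PERF_EVENTS=y exactly once.
enable_perf_events() {
    cfg="$1"
    cfg_new="${cfg}.new"
    if grep -q '^CONFIG_PERF_EVENTS=y$' "$cfg"; then
        echo "already enabled: $cfg"
        return 0
    fi
    # Drop any stale setting such as "# CONFIG_PERF_EVENTS is not set".
    grep -v -e '^# CONFIG_PERF_EVENTS is not set$' \
            -e '^CONFIG_PERF_EVENTS=' "$cfg" > "$cfg_new" || true
    echo 'CONFIG_PERF_EVENTS=y' >> "$cfg_new"
    mv "$cfg_new" "$cfg"
    echo "enabled: $cfg"
}

# Usage (from the SDK root; the glob is the path pattern given above):
#   for f in conf/evalsoc/linux_rv64*_defconfig; do enable_perf_events "$f"; done
```

The kernel still has to be rebuilt afterwards for the change to take effect.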
On the dev_nuclei_6.1_v2/dev_nuclei_6.6_v2/dev_nuclei_6.9_v2 branches, CONFIG_PERF_EVENTS is currently not enabled by default, so you need to set CONFIG_PERF_EVENTS=y manually and rebuild the Linux kernel to enable PMU support. See commit fd3dc31 for the changes we made to support PMU v1.
PMU v2 is now supported in dev_nuclei_6.6_v3; see commit 4ca5808 for the changes.
We tested Linux kernel 6.9 with PMU. If your CPU is configured with a PMU, you will see the following output during boot:
# In OpenSBI
Boot HART ISA Extensions : sscofpmf,time,sstc
... ...
Boot HART MHPM Count : 4
# In Linux Kernel
[ 17.066711] riscv-pmu-sbi: SBI PMU extension is available
[ 17.071960] riscv-pmu-sbi: 16 firmware and 6 hardware counters
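To check for this message on a running system, the kernel log can be filtered for the driver's announcement. The sketch below is one way to do it; the function name `pmu_available` is hypothetical, and it assumes the message text matches the boot log shown above.

```shell
# Hypothetical helper: succeed if the log on stdin contains the
# riscv-pmu-sbi availability message shown in the boot output.
pmu_available() {
    grep -q 'riscv-pmu-sbi: SBI PMU extension is available'
}

# Usage on the target (assumes dmesg is readable by the current user):
#   dmesg | pmu_available && echo "SBI PMU driver is active"
```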
However, when CONFIG_PERF_EVENTS=y is enabled, user mode can no longer read the cycle/instret/time CSRs by default. Execute echo 2 > /proc/sys/kernel/perf_user_access to allow user-mode access to these counters. In addition, the cycle/instret counters no longer count by default: they are now controlled by the PMU, so you need to use the Linux perf tool to enable them.
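A small check of the current setting can help before running code that reads these CSRs directly. This is a sketch only: the function name `user_counter_access` and its output labels are hypothetical, and it only distinguishes the value 2 used above from every other value.

```shell
# Hypothetical helper: report whether perf_user_access is set to 2,
# the value used above to allow user-mode reads of cycle/instret/time.
user_counter_access() {
    sysctl_file="${1:-/proc/sys/kernel/perf_user_access}"
    if [ ! -r "$sysctl_file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$sysctl_file")" in
        2) echo "direct" ;;
        *) echo "restricted" ;;
    esac
}

# Usage on the target:
#   [ "$(user_counter_access)" = "direct" ] || echo 2 > /proc/sys/kernel/perf_user_access
```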
Here is a sample run of our CoreMark port under the perf tool.
# echo 2 > /proc/sys/kernel/perf_user_access
# perf stat -e cycles -e instructions -e cache-misses -e branches -e branch-misses coremark_1core
Start to run CoreMark
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 14788
Total time (secs): 14.788000
Iterations/Sec : 270.489586
Iterations : 4000
Compiler version : GCC10.2.0
Compiler flags : -Ofast -mbranch-cost=1 -mstrict-align -funroll-all-loops -finline-limit=1000 -ftree-dominator-opts -fselective-scheduling -funroll-loops -finline-functions -falign-functions=4 -falign-jumps=4 -falign-loops=4 -fipa-pta -fno-code-hoisting -fno-common -fno-if-conversion -fno-if-conversion2 -fno-tree-loop-distribute-patterns -fno-tree-vectorize -fno-tree-loop-ivcanon -fno-tree-vrp -fgcse-las --param=max-loop-header-insns=4 --param loop-max-datarefs-for-datadeps=0 --param=unroll-jam-min-percent=0 --param=max-goto-duplication-insns=0 -DMULTITHREAD=1 -DUSE_PTHREAD -lrt -lpthread -march=rv64imafdc -mabi=lp64d
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x65c5
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 270.489586 / GCC10.2.0 -Ofast -mbranch-cost=1 -mstrict-align -funroll-all-loops -finline-limit=1000 -ftree-dominator-opts -fselective-scheduling -funroll-loops -finline-functions -falign-functions=4 -falign-jumps=4 -falign-loops=4 -fipa-pta -fno-code-hoisting -fno-common -fno-if-conversion -fno-if-conversion2 -fno-tree-loop-distribute-patterns -fno-tree-vectorize -fno-tree-loop-ivcanon -fno-tree-vrp -fgcse-las --param=max-loop-header-insns=4 --param loop-max-datarefs-for-datadeps=0 --param=unroll-jam-min-percent=0 --param=max-goto-duplication-insns=0 -DMULTITHREAD=1 -DUSE_PTHREAD -lrt -lpthread -march=rv64imafdc -mabi=lp64d / Heap
Begin_Cycle 9223372037062820250, End_Cycle 9223372037805837188, User_Cycle 743016938 cycles
Begin_Instret 9223372037119765118, End_Instret 9223372038073768277, User_Instret 954003159 Instrets
CoreMark/MHz calc via cycle: 5.383457 CoreMark/MHz
IPC: 1.284
Performance counter stats for 'coremark_1core':
951534830 cycles
1219293100 instructions # 1.28 insn per cycle
13957 cache-misses
264065366 branches
6903336 branch-misses # 2.61% of all branches
18.942352295 seconds time elapsed
18.884026000 seconds user
0.039923000 seconds sys
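The derived figures in the perf output above (instructions per cycle and branch-miss rate) can be recomputed from the raw counts. The sketch below does this with awk; the function name `perf_ratios` is hypothetical, and it assumes the counts are printed without thousands separators, as in the output above.

```shell
# Hypothetical helper: read `perf stat` text on stdin and print
# IPC and branch-miss rate computed from the raw event counts.
perf_ratios() {
    awk '
        $2 == "cycles"        { cycles = $1 }
        $2 == "instructions"  { insns = $1 }
        $2 == "branches"      { branches = $1 }
        $2 == "branch-misses" { misses = $1 }
        END {
            if (cycles > 0)   printf "IPC %.2f\n", insns / cycles
            if (branches > 0) printf "branch-miss %.2f%%\n", 100 * misses / branches
        }'
}

# Usage on the target (perf stat prints its summary to stderr):
#   perf stat -e cycles -e instructions -e branches -e branch-misses coremark_1core 2>&1 | perf_ratios
```

For the run above this gives 1219293100 / 951534830 ≈ 1.28 instructions per cycle and 6903336 / 264065366 ≈ 2.61% branch misses, matching the annotations perf prints.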
The Linux perf tool must be cross-built from the Linux kernel source tree. Before building perf itself, you need to cross-build at least zlib, elfutils and libtraceevent.
If you are using the 2025.10 glibc toolchain, or have shellcheck installed on your Linux PC, please take a look at this note: #26 (comment)
# Clone the dev_nuclei_6.9.y-perf Linux kernel branch; --depth 1 fetches less history
git clone -b dev_nuclei_6.9.y-perf --depth 1 https://github.com/Nuclei-Software/linux
# Make sure the directories above, including the source code, are set up
# Make sure the toolchain PATH is already set up
$ which riscv64-unknown-linux-gnu-gcc
/path/to/your/gcc/bin/riscv64-unknown-linux-gnu-gcc
$ cd linux/tools/perf
$ ls *_perf.sh
do_perf.sh pre_perf.sh
# Please don't place the install directory in the current directory; here we place it in ../install
# Build the third-party libraries required by perf
# If you want to build for rv32imafdc, please change
# rv64imafdc lp64d -> rv32imafdc ilp32d
./pre_perf.sh riscv64-unknown-linux-gnu rv64imafdc lp64d ../install
# Build perf tools
./do_perf.sh riscv64-unknown-linux-gnu rv64imafdc lp64d ../install
# Then you will be able to see perf tools in `../install` folder
$ ls ../install/
bin/ env.sh etc/ include/ lib/ libexec/ share/
# WARNING: The steps below run in a RISC-V Linux environment, not an x86 Linux environment
# Copy the install folder to your RISC-V Linux environment and execute the following commands;
# you will then be able to use perf on your RISC-V Linux
source /path/to/install/env.sh
perf --version
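The two build scripts above take the same toolchain prefix, -march and -mabi arguments, and only the last two change between rv64 and rv32. As a sketch, a small wrapper can keep the pairs consistent; the function name `perf_build_args` is hypothetical, and the rv32 line simply applies the substitution described in the comments above with the toolchain prefix left unchanged.

```shell
# Hypothetical helper: print the argument triple passed to
# pre_perf.sh and do_perf.sh for a given target width.
perf_build_args() {
    case "$1" in
        rv64) echo "riscv64-unknown-linux-gnu rv64imafdc lp64d" ;;
        rv32) echo "riscv64-unknown-linux-gnu rv32imafdc ilp32d" ;;
        *)    echo "usage: perf_build_args rv64|rv32" >&2; return 1 ;;
    esac
}

# Usage on the build host, from linux/tools/perf
# ($args is intentionally unquoted so it splits into three words):
#   args=$(perf_build_args rv32) || exit 1
#   ./pre_perf.sh $args ../install && ./do_perf.sh $args ../install
```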
Cross-building the perf tool requires at least the following libraries:
- zlib 1.3.1: https://github.com/madler/zlib/tree/v1.3.1
- elfutils 0.191: https://sourceware.org/git/elfutils.git
- libtraceevent 1.8.3: https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git
We provide sample scripts to cross-build the perf tool for Linux kernel 6.9 for rv64imafdc and rv32imafdc:
Tools directories must be like this:
Download the prebuilt Nuclei RISC-V Toolchain (Linux/Glibc, gcc14) from https://nucleisys.com/download.php