`bmpu_verify` is the fastest correctness-oriented app in the repository for validating BMPU mixed-precision behavior. It checks that:
- generated test vectors match the declared benchmark cases
- BMPU execution completes successfully
- packed output can be unpacked back into column-major form
- unpacked output matches the reference result tensor
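
The last check compares two flat column-major buffers, so element `(row, col)` of an `M x N` result sits at index `col * M + row`. The sketch below shows one way to report the first mismatch in those terms. It is an illustration only: `first_mismatch` and the sample values are hypothetical and are not taken from `main.c`.

```python
def first_mismatch(unpacked, reference, M, N):
    """Return (row, col, got, want) for the first differing element, or None.

    Both buffers are assumed to be flat, column-major, and of length M * N.
    """
    if len(unpacked) != M * N or len(reference) != M * N:
        raise ValueError("both buffers must hold exactly M * N elements")
    for idx, (got, want) in enumerate(zip(unpacked, reference)):
        if got != want:
            # Column-major: idx = col * M + row
            col, row = divmod(idx, M)
            return row, col, got, want
    return None


# Hypothetical 2 x 3 example; columns are [1, 2], [3, 4], [5, 6].
reference = [1, 2, 3, 4, 5, 6]
unpacked = [1, 2, 3, 9, 5, 6]
print(first_mismatch(unpacked, reference, M=2, N=3))   # (1, 1, 9, 4)
print(first_mismatch(reference, reference, M=2, N=3))  # None
```
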
- 18 directed cases
- precisions: binary, INT2, INT4
- group shapes: `gm x gn` = `2x2`, `4x1`, `1x2`, `1x4`, `8x1`, `1x8`
- tail coverage for non-full tiles and larger `ntile` edge cases
The gm=8 / gn=8 cases are legal even when the last tile group is smaller than 8.
Execution clips the tail group to the remaining tile count, so these cases explicitly
exercise the clipped-group path.
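
The clipping rule can be illustrated with a small sketch. It assumes the tiles are simply walked in steps of the group size, with the final group shrunk to whatever remains. `clipped_groups` is a hypothetical helper for illustration, not a function from the app or the RTL.

```python
def clipped_groups(num_tiles, group):
    """Split num_tiles into consecutive groups of size `group`.

    The last group is clipped to the number of tiles that remain.
    """
    if num_tiles <= 0 or group <= 0:
        raise ValueError("num_tiles and group must be positive")
    sizes = []
    start = 0
    while start < num_tiles:
        sizes.append(min(group, num_tiles - start))
        start += group
    return sizes


print(clipped_groups(3, 8))   # [3]      fewer tiles than one full group
print(clipped_groups(10, 8))  # [8, 2]   one full group plus a clipped tail
print(clipped_groups(16, 8))  # [8, 8]   no clipping needed
```

Under this assumption, a group size of 8 with only 3 tiles yields a single group of 3, which is the clipped-group path described above.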
Additional current coverage includes small and tail-oriented binary cases such as:
- `mtile=8, ntile=128` minimal and tail cases
- `mtile=16, ntile=64` minimal and tail cases
- `result_torch` is stored as column-major reference data.
- `result_hp` is not a plain `M*N` matrix in this app. It is a packed BMPU store buffer laid out tile-by-tile and block-by-block.
- For non-full edge tiles, `result_hp` must be sized by packed tile capacity: `ceil(M/mtile) * ceil(N/ntile) * ceil(mtile/8) * ceil(ntile/16) * 8 * 16` int16 elements.
If result_hp is only allocated as M*N, tail-tile cases can overwrite following symbols in
the generated dataset and cause false mismatches or hangs.
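
To see how far the two sizes can diverge, the capacity formula can be evaluated directly. The sketch below is a plain transcription of that formula. `packed_capacity` is a hypothetical helper name, and the `M=12, N=100` shape is a made-up example, not one of the directed cases.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def packed_capacity(M, N, mtile, ntile, block_m=8, block_n=16):
    """Number of int16 elements needed for the packed result buffer.

    Every tile is rounded up to whole block_m x block_n blocks, and every
    edge tile still occupies a full tile's worth of blocks.
    """
    tiles = ceil_div(M, mtile) * ceil_div(N, ntile)
    blocks_per_tile = ceil_div(mtile, block_m) * ceil_div(ntile, block_n)
    return tiles * blocks_per_tile * block_m * block_n


# Hypothetical tail shape: M and N are not multiples of the tile sizes.
print(packed_capacity(12, 100, mtile=8, ntile=128))  # 2048
print(12 * 100)                                      # 1200

# Full-tile shape: packed capacity equals M * N.
print(packed_capacity(16, 64, mtile=16, ntile=64))   # 1024
```

In the tail example the packed buffer needs 2048 elements while `M*N` is only 1200, which is exactly the gap that an `M*N` allocation would overrun.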
Use this app before large benchmark campaigns when you have changed any of:
- BMPU RTL
- low-precision packing logic
- benchmark data generation
- shared mixed-matmul helpers
This app should remain the quickest full-flow signal that a maintained BMPU-related change is still functionally safe.
- `main.c`: executes the validation loop and compares results
- `kernel/`: generated tensors and benchmark case metadata
- `script/gen_data.py`: generator for the validation dataset
- app-side compare/debug prints are controlled by `BMPU_VERIFY_DEBUG`
- hardware-side debug prints are controlled from `src/hardware/rtl/extended/include/bitfly_debug.svh`
- keep both disabled by default for full regressions
source /data2/zzx/data/miniconda3/etc/profile.d/conda.sh
conda activate bitfly
./scripts/sync_src_to_ara.sh --no-patch
make -C ara/apps bin/bmpu_verify -j8
make -C ara/hardware simv app=bmpu_verify

A healthy run ends with `ALL CASES PASSED`.
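
For scripted regressions, the pass marker can be checked automatically. This is a minimal sketch that assumes the simulator output has been captured as text. `run_passed` is a hypothetical helper, and the sample strings are placeholders rather than real simulator output.

```python
PASS_MARKER = "ALL CASES PASSED"


def run_passed(log_text):
    """Return True if the captured simulation output contains the pass marker."""
    return PASS_MARKER in log_text


# Placeholder text standing in for captured output.
print(run_passed("<simulator output>\nALL CASES PASSED\n"))  # True
print(run_passed("<simulator output>\n"))                    # False
```

A wrapper script could read the captured log, call `run_passed`, and exit non-zero when the marker is missing.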