oneDNN verbose output from the crashing run (ONEDNN_MAX_CPU_ISA=SSE41, ONEDNN_VERBOSE=1):
OpenVINO Core initialized.
Model loaded: yolov8_bnn.xml
onednn_verbose,v1,info,oneDNN v3.10.2 (commit 87f65fdd1927b1d0cbdf0ea37728146abfbffb52)
onednn_verbose,v1,info,cpu,runtime:threadpool,nthr:10
onednn_verbose,v1,info,cpu,isa:Intel SSE4.1
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:Acdb8a::f0,,,32x1x3x3,0.0100098
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,64x32x3x3,0.166016
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,64x64x3x3,0.092041
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,128x64x3x3,0.164062
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,128x128x3x3,0.181885
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,256x128x3x3,0.468018
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,256x256x3x3,0.679932
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,512x256x3x3,1.04199
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,512x512x3x3,2.18701
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,256x512x1x1,0.314941
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:f32:p:blocked:ABcd8b8a::f0,,,70x256x1x1,0.291016
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:ABcd8b8a::f0,,,256x256x1x1,0.436035
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:bin::blocked:abcd::f0 dst:bin::blocked:ABcd8a32b::f0,,,256x512x3x3,1.073
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:f32:p:blocked:ABcd8b8a::f0,,,70x256x1x1,0.286133
Model compiled for CPU.
onednn_verbose,v1,primitive,exec,cpu,convolution,jit:sse41,forward_inference,src:f32::blocked:abcd::f0 wei:f32:a:blocked:Acdb8a::f0 bia:undef::undef::: dst:f32::blocked:aBcd8b::f0,attr-scratchpad:user,alg:convolution_direct,mb1_ic1oc32_ih256oh128kh3sh2dh0ph1_iw480ow240kw3sw2dw0pw1,1.37598
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:aBcd8b::f0 dst:f32::blocked:acdb::f0,,,1x32x128x240,1.24316
Segmentation fault (core dumped)
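To make a log like the one above easier to inspect, a small helper can pull out the last exec record. This is a minimal sketch (C++17) that assumes the field order given by the template line in the log; the struct and function names are illustrative and not part of oneDNN. Each exec line carries an exec_time, so it is presumably written once the primitive has returned. If that assumption holds, the last record names the last primitive that completed, and the faulting code could be whatever runs next. A debugger backtrace would confirm which it is.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One "exec" record from ONEDNN_VERBOSE=1 output. Field order follows the
// template line printed in the log:
// operation,engine,primitive,implementation,prop_kind,memory_descriptors,
// attributes,auxiliary,problem_desc,exec_time
struct ExecRecord {
    std::string primitive;           // e.g. "reorder", "convolution"
    std::string implementation;      // e.g. "jit:uni", "jit:sse41"
    std::string memory_descriptors;  // e.g. "src:... dst:..."
    std::string problem_desc;        // e.g. "1x32x128x240"
    double exec_time_ms = 0.0;
};

// Split a line on commas, keeping empty fields in the middle of the line.
// Assumption: no field contains an embedded comma (true for the log above).
std::vector<std::string> split_commas(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Parse a single line; returns nullopt for anything that is not a
// 13-field "onednn_verbose,v1,primitive,exec,..." record.
std::optional<ExecRecord> parse_exec_line(const std::string& line) {
    const std::vector<std::string> f = split_commas(line);
    if (f.size() != 13) return std::nullopt;
    if (f[0] != "onednn_verbose" || f[2] != "primitive" || f[3] != "exec")
        return std::nullopt;
    ExecRecord rec;
    rec.primitive = f[5];
    rec.implementation = f[6];
    rec.memory_descriptors = f[8];
    rec.problem_desc = f[11];
    try {
        rec.exec_time_ms = std::stod(f[12]);
    } catch (...) {
        return std::nullopt;  // malformed time field
    }
    return rec;
}

// Scan a whole log and return the last exec record, if any.
std::optional<ExecRecord> last_exec_record(const std::string& log) {
    std::optional<ExecRecord> last;
    std::stringstream ss(log);
    std::string line;
    while (std::getline(ss, line)) {
        if (auto rec = parse_exec_line(line)) last = rec;
    }
    return last;
}
```

Fed the log above, last_exec_record returns the jit:uni reorder with problem_desc 1x32x128x240.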
Environment
Device used for inference
CPU
Framework
PyTorch
Model used
Custom YOLOv8
Issue description
I am trying to use OpenVINO with binary neural network (XNOR) layers on embedded boards that do not support AVX (no AVX2, AVX512F, BMI2). To test the behaviour, I built OpenVINO from source on my development machine (which does support AVX) with all AVX options disabled via CMake. The build succeeded and a simple C++ inference program worked fine on my machine.
However, when I forced oneDNN to use SSE4.1 only (by setting ONEDNN_MAX_CPU_ISA=SSE41 and ONEDNN_VERBOSE=1), the same program crashes with a segmentation fault during execution. The oneDNN logs show that the ISA is correctly set to SSE4.1 and the crash occurs inside a reorder operation.
I suspect a bug in oneDNN's SSE4.1 fallback path, likely in jit_uni_reorder.cpp or related memory alignment/instruction generation.
Step-by-step reproduction
1- Build OpenVINO with AVX fully disabled using the following CMake command (inside build directory):
2- Build OpenVINO as usual.
3- Write a minimal C++ inference program (e.g., run_model.cpp):
4- Compile the program against the built OpenVINO.
5- Run the program with a model that contains binary convolution layers (e.g., a YOLOv8 BNN model). On a machine with AVX support it works:
6- Force oneDNN to SSE4.1 and enable verbose logging:
Observed Behaviour
The program crashes with Segmentation fault (core dumped). In the oneDNN verbose output shown above, the last line logged before the crash is a jit:uni reorder of a 1x32x128x240 tensor.
The crash occurs during the reorder from aBcd8b to acdb.
Additional Information
[...] ONEDNN_MAX_CPU_ISA).
[...] Acdb16a) to SSE‑optimized (aBcd8b) when AVX is unavailable.
[...] src/plugins/intel_cpu/thirdparty/onednn/src/cpu/x64/jit_uni_reorder.cpp) has a bug, possibly an unaligned memory access or incorrect instruction generation for the aBcd8b → acdb transformation.
Request
I need guidance on how to successfully run OpenVINO with binary neural network models on embedded CPUs that lack any AVX support (only SSE4.1). Is there a workaround (e.g., additional build flags, different memory layout, disabling certain JIT kernels) or a known fix for this reorder crash? Any help would be greatly appreciated.
Possible Root Cause (as per my analysis)
On CPUs without AVX2, oneDNN falls back to SSE4.1 and uses 128‑bit registers. The reorder from aBcd8b (a layout blocked by 8 channels) to acdb (a non-blocked, channels-last layout) is handled by a JIT kernel. This specific kernel may contain a bug, either an unaligned memory access or incorrect register allocation, leading to the segfault.
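For reference, the index mapping behind this reorder can be written out in plain C++. The sketch below assumes the usual reading of the two oneDNN format tags: a, b, c, d stand for N, C, H, W; aBcd8b pads channels to a multiple of 8 and keeps an 8-channel block innermost; acdb keeps channels innermost without blocking. The function names are illustrative and are not oneDNN APIs. Comparing a scalar version like this against the JIT result might help separate a layout problem from a code-generation problem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (n, c, h, w) in aBcd8b: channels are split into blocks
// of 8, and the position inside the block is the innermost dimension.
inline std::size_t offset_aBcd8b(std::size_t n, std::size_t c, std::size_t h,
                                 std::size_t w, std::size_t C, std::size_t H,
                                 std::size_t W) {
    const std::size_t blocks = (C + 7) / 8;  // channel blocks, padded up
    return (((n * blocks + c / 8) * H + h) * W + w) * 8 + c % 8;
}

// Offset of element (n, c, h, w) in acdb: dimension order N, H, W, C.
inline std::size_t offset_acdb(std::size_t n, std::size_t c, std::size_t h,
                               std::size_t w, std::size_t C, std::size_t H,
                               std::size_t W) {
    return ((n * H + h) * W + w) * C + c;
}

// Scalar reference reorder. src must hold the padded blocked tensor
// (N * ceil(C/8) * 8 * H * W floats); the result holds N * C * H * W floats.
std::vector<float> reorder_aBcd8b_to_acdb(const std::vector<float>& src,
                                          std::size_t N, std::size_t C,
                                          std::size_t H, std::size_t W) {
    const std::size_t blocks = (C + 7) / 8;
    assert(src.size() == N * blocks * 8 * H * W);
    std::vector<float> dst(N * C * H * W);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t h = 0; h < H; ++h)
            for (std::size_t w = 0; w < W; ++w)
                for (std::size_t c = 0; c < C; ++c)
                    dst[offset_acdb(n, c, h, w, C, H, W)] =
                        src[offset_aBcd8b(n, c, h, w, C, H, W)];
    return dst;
}
```

For the tensor in the log (1x32x128x240), element (n=0, c=9, h=0, w=0) sits at source offset 245761 and destination offset 9.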
Please let me know if any additional logs or debug information would help. Thank you!