
[SYCL] - Run hangs with "Undefined temporary symbol" errors on Windows 11 #596

Open
@aahouzi

Description

Type of issue

  • Started by installing oneAPI 2025.0.1 and activating the environment with:
 "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
  • Verified SYCL devices with sycl-ls:
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.31896]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 9 285K OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) B580 Graphics OpenCL 3.0 NEO  [32.0.101.6559]
  • Used the following CMake commands to build the project with the SYCL backend for my B580 dGPU:
cmake -B build -G "Ninja" -DSD_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx 
cmake --build build --config Release -j
  • The build completes successfully. When I run the binary with the SD3-medium model, it prints the following errors and then hangs indefinitely:
C:\Users\Intel\Desktop\aahouzi\stable-diffusion.cpp>build\bin\sd.exe -m sd3_medium_incl_clips_t5xxlfp16.safetensors --cfg-scale 5 --steps 30 --sampling-method euler  -H 1024 -W 1024 --seed 42 -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc B580 Graphics|   20.1|    160|    1024|   32| 12450M|            1.6.31896|
ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
[INFO ] stable-diffusion.cpp:195  - loading model from 'sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] model.cpp:888  - load sd3_medium_incl_clips_t5xxlfp16.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:242  - Version: SD3.x
[INFO ] stable-diffusion.cpp:275  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:276  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:277  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278  - VAE weight type:             f16
[INFO ] stable-diffusion.cpp:319  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:322  - CLIP: Using CPU backend
[INFO ] mmdit.hpp:706  - MMDiT layers: 24 (including 0 MMDiT-x layers)
  |==================================================| 1665/1668 - 0.00it/s
[INFO ] model.cpp:1868 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
  |==================================================| 1668/1668 - 5.68it/s
[INFO ] stable-diffusion.cpp:516  - total params memory size = 14857.47MB (VRAM 4209.34MB, RAM 10648.12MB): clip 10648.12MB(RAM), unet 4114.77MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:520  - loading model from 'sd3_medium_incl_clips_t5xxlfp16.safetensors' completed, taking 24.17s
[INFO ] stable-diffusion.cpp:538  - running in FLOW mode
[INFO ] stable-diffusion.cpp:688  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1241 - apply_loras completed, taking 0.00s
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 11.94 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 11.94 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 11.94 MiB
[INFO ] stable-diffusion.cpp:1374 - get_learned_condition completed, taking 44453 ms
[INFO ] stable-diffusion.cpp:1397 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1434 - generating image: 1/1 - seed 42
ggml_gallocr_reserve_n: reallocating SYCL0 buffer from size 0.00 MiB to 1921.36 MiB
<unknown>:0: error: Undefined temporary symbol .L_ZL13cpy_1_i16_i16PKcPc
<unknown>:0: error: Undefined temporary symbol .L_ZL13cpy_1_f16_f16PKcPc
<unknown>:0: error: Undefined temporary symbol .L_ZL13cpy_1_i32_i32PKcPc
<unknown>:0: error: Undefined temporary symbol .L_ZL13cpy_1_f32_f32PKcPc
  • I also tried the Intel Core Ultra 7 155H (MTL) iGPU and hit the same issue. @zhentaoyu, could you provide any guidance on this issue?

Hardware

  • Intel Core Ultra 7 155H (MTL) iGPU
  • Intel Arc B580 dGPU

GPU Driver version

32.0.101.6559

OS

Windows 11
