
On mobile devices, submitting excessively large tasks to the GPU in the Vulkan backend can lead to timeouts. #1420

@mgxhhg

I added logging of the Vulkan kernel sizes (element counts and dispatch grid). Output up to the crash:
[VK_KERNEL] Name: pad_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: pad_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [1, 1, 1] | Dispatch Grid: [1, 1, 1]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 18] | Dispatch Grid: [1, 512, 18]
[VK_KERNEL] Name: scale_f32 | Elements: [1, 1, 1] | Dispatch Grid: [1, 1, 1]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 2] | Dispatch Grid: [1, 512, 2]
[VK_KERNEL] Name: concat_f32 | Elements: [512, 512, 19] | Dispatch Grid: [1, 512, 19]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 19] | Dispatch Grid: [1, 512, 19]
[VK_KERNEL] Name: flash_attn_f32_f16_aligned_f32accf16 | Elements: [2112, 30, 1] | Dispatch Grid: [264, 30, 1]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [3840, 2112, 1] | Dispatch Grid: [30, 17, 1]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: tanh_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [10240, 2112, 1] | Dispatch Grid: [80, 17, 1]
[VK_KERNEL] Name: silu_f32 | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [10240, 2112, 1] | Dispatch Grid: [80, 17, 1]
[VK_KERNEL] Name: mul_f32_f32_f32_norepeat | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: matmul_q8_0_f32_aligned_l | Elements: [3840, 2112, 1] | Dispatch Grid: [30, 17, 1]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: tanh_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: mul_mat_vec_q8_0_f32_f32 | Elements: [15360, 1, 1] | Dispatch Grid: [15360, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [11520, 2112, 1] | Dispatch Grid: [90, 17, 1]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [30, 2112, 1] | Dispatch Grid: [30, 2112, 1]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 3] | Dispatch Grid: [1, 512, 3]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [30, 2112, 1] | Dispatch Grid: [30, 2112, 1]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 3] | Dispatch Grid: [1, 512, 3]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: pad_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: pad_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 34] | Dispatch Grid: [1, 512, 34]
[VK_KERNEL] Name: scale_f32 | Elements: [1, 1, 1] | Dispatch Grid: [1, 1, 1]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 18] | Dispatch Grid: [1, 512, 18]
[VK_KERNEL] Name: scale_f32 | Elements: [1, 1, 1] | Dispatch Grid: [1, 1, 1]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 2] | Dispatch Grid: [1, 512, 2]
[VK_KERNEL] Name: concat_f32 | Elements: [512, 512, 19] | Dispatch Grid: [1, 512, 19]
[VK_KERNEL] Name: contig_cpy_f32_f16 | Elements: [512, 512, 19] | Dispatch Grid: [1, 512, 19]
[VK_KERNEL] Name: flash_attn_f32_f16_aligned_f32accf16 | Elements: [2112, 30, 1] | Dispatch Grid: [264, 30, 1]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [3840, 2112, 1] | Dispatch Grid: [30, 17, 1]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: tanh_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [10240, 2112, 1] | Dispatch Grid: [80, 17, 1]
[VK_KERNEL] Name: silu_f32 | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [10240, 2112, 1] | Dispatch Grid: [80, 17, 1]
[VK_KERNEL] Name: mul_f32_f32_f32_norepeat | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83]
[VK_KERNEL] Name: matmul_q8_0_f32_aligned_l | Elements: [3840, 2112, 1] | Dispatch Grid: [30, 17, 1]
[VK_KERNEL] Name: scale_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: tanh_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [2112, 1, 1] | Dispatch Grid: [2112, 1, 1]
[VK_KERNEL] Name: mul_mat_vec_q8_0_f32_f32 | Elements: [15360, 1, 1] | Dispatch Grid: [15360, 1, 1]
[VK_KERNEL] Name: contig_cpy_f32_f32 | Elements: [512, 8, 1] | Dispatch Grid: [1, 8, 1]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: matmul_q8_0_f32_f16acc_aligned_l | Elements: [11520, 2112, 1] | Dispatch Grid: [90, 17, 1]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [30, 2112, 1] | Dispatch Grid: [30, 2112, 1]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 3] | Dispatch Grid: [1, 512, 3]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: mul_f32_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: add_f32_f32_f32_norepeat | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [30, 2112, 1] | Dispatch Grid: [30, 2112, 1]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: repeat_f32 | Elements: [512, 512, 31] | Dispatch Grid: [1, 512, 31]
[VK_KERNEL] Name: cpy_f32_f32 | Elements: [512, 512, 3] | Dispatch Grid: [1, 512, 3]
MESA: warning: ../src/freedreno/vulkan/tu_knl_kgsl.cc:1592: submit failed: Resource deadlock avoided (VK_ERROR_DEVICE_LOST)
MESA: debug: ../src/vulkan/runtime/vk_device.c:402: Timeline mode is EMULATED.
terminate called after throwing an instance of 'vk::DeviceLostError'
what(): vk::Queue::submit: ErrorDeviceLost
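
To see which dispatches in the log above are the largest, the `[VK_KERNEL]` entries can be parsed and ranked by workgroup count. The following is a minimal sketch, not part of stable-diffusion.cpp or ggml. The function names `parse_kernels` and `largest_dispatches` are hypothetical, and it assumes the workgroup count of a dispatch is the product of the three `Dispatch Grid` values. It only ranks dispatches; it does not show which one actually triggered the timeout.

```python
import re
from math import prod

# One "[VK_KERNEL] Name: ... | Elements: [...] | Dispatch Grid: [...]" entry.
# The leading "[" is optional because the first entry in the paste lacks it.
KERNEL_RE = re.compile(
    r"\[?VK_KERNEL\] Name: (\S+) \| Elements: \[(\d+), (\d+), (\d+)\] "
    r"\| Dispatch Grid: \[(\d+), (\d+), (\d+)\]"
)

def parse_kernels(log_text):
    """Return one dict per [VK_KERNEL] entry found in log_text."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    kernels = []
    for m in KERNEL_RE.finditer(flat):
        nums = [int(g) for g in m.groups()[1:]]
        kernels.append({
            "name": m.group(1),
            "elements": tuple(nums[:3]),
            "grid": tuple(nums[3:]),
            # Assumption: workgroups = product of the dispatch grid values.
            "workgroups": prod(nums[3:]),
        })
    return kernels

def largest_dispatches(kernels, n=5):
    """Return the n entries with the most workgroups, largest first."""
    return sorted(kernels, key=lambda k: k["workgroups"], reverse=True)[:n]

# Two entries copied from the log above.
sample = (
    "[VK_KERNEL] Name: silu_f32 | Elements: [512, 512, 83] | Dispatch Grid: [1, 512, 83] "
    "[VK_KERNEL] Name: rms_norm_mul_f32 | Elements: [30, 2112, 1] | Dispatch Grid: [30, 2112, 1]"
)
for k in largest_dispatches(parse_kernels(sample)):
    print(k["name"], k["grid"], k["workgroups"])
```

Saving the full log to a file and passing its contents to `parse_kernels` gives the same ranking for the whole run.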

[53458.382975] [ T1315] [drm:sde_fence_dump:846] [sde error]fence drv name:kgsl-timeline timeline name:kgsl-3d0_26-com.termux(22348)-R seqno:0x8d timeline:140 queued:142 retired:140 signaled:0x0 status:0 flags:0x4
[53460.432721] [ C6] kgsl kgsl-3d0: kgsl: possible gpu syncpoint deadlock for context 3 timestamp 0
[53460.432731] [ C6] kgsl kgsl-3d0: context[3]: queue=4218015360, submit=4218002880, start=4218002880, retire=4218002880
[53460.432733] [ C6] kgsl kgsl-3d0: possible deadlock. Context 3 might be blocked for itself
[53460.432734] [ C6] kgsl kgsl-3d0: context[3]: submit times:
[53460.432735] [ C6] kgsl kgsl-3d0: pending events:
[53460.432737] [ C6] kgsl kgsl-3d0: [0] FENCE kgsl-timeline kgsl-3d0_26-com.termux(22348)-R: 141
[53460.432739] [ C6] kgsl kgsl-3d0: --gpu syncpoint deadlock print end--
[53461.454945] [ T1315] [drm:sde_fence_dump:846] [sde error]fence drv name:kgsl-timeline timeline name:kgsl-3d0_26-com.termux(22348)-R seqno:0x8d timeline:140 queued:142 retired:140 signaled:0x0 status:0 flags:0x4
[53462.912002] [ T1281] kgsl kgsl-3d0: Fault id:2 and GX is ON
[53462.912006] [ T1281] kgsl kgsl-3d0: sd-cli[17997]: ctx 49 ctx_type ANY ts 1018 policy 2
[53462.912009] [ T1281] kgsl kgsl-3d0: sd-cli[17997]: cmdline: /home/sed/AI/stable-diffusion.cpp/build/bin/sd-cli --clip_l /home/sed/AI/stable-diffusion/clip_l.safetensors --clip_g /home/sed/AI/stable-diffusion/clip_g-Q8_0.gguf --vae /home/sed/AI/stable-diffusion/diffusion_pytorch_model.safetensors --diffusion-model /home/sed/AI/stable-diffusion/beyondREALITY_zTURBOREBUILDV30.q8_0.gguf --llm /home/sed/AI/stable-diffusion/Qwen3-4B-Thinking-2507-abliterated.Q4_K_M.gguf -p cat -H 1024 -W 512 -n (worst quality, low quality:2), watermark, ((text)), bad anatomy, ((bad hand)), extra hands, extra fingers, fused fingers, bad arm, extra arms, fused arms, extra legs, missing leg, extra nipples, liquid hand, inverted hand, disembodied limb, oversized head, extra body, completely nude, extra navel, (hair between eyes), duplicate, huge eyes, logo, worst face, (
[53462.912035] [ T1281] kgsl kgsl-3d0: status 00FE1107 gfx_status 00FC1106 gfx_br_status 00FC1146 gfx_bv_status 00880004
[53462.912037] [ T1281] kgsl kgsl-3d0: BR: rb 16e6/1de7 ib1 00000042C79F1000/053b ib2 00000042C78AA400/0000 ib3 000000400219D268/0000
[53462.912039] [ T1281] kgsl kgsl-3d0: BV: rb 16ea/1de7 ib1 00000042C79F1000/0000 ib2 0001FF8002A35190/0000 ib3 00000040E178F000/0000
[53464.500773] [ T1281] kgsl kgsl-3d0: GPU snapshot froze 816Kb of GPU buffers
[53464.500783] [ T1281] kgsl kgsl-3d0: GPU snapshot created at pa ee000000++0x4b0bf0
[53464.500788] [ T1281] kgsl kgsl-3d0: falut=TIMEOUTFAULT, pid=17997, processname=sd-cli
[53464.524435] [ T2915] kgsl kgsl-3d0: snapshot: objects released
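
The kernel log timestamps show how long the GPU was stalled before the driver reported the timeout fault. The following is a rough sketch for splitting such a dmesg paste into entries and measuring the time from the first entry to the first one containing a given marker. `split_dmesg` and `seconds_until` are hypothetical helper names. The regular expression assumes the `[seconds.micros] [ Tnnn]` / `[ Cn]` prefix seen in this particular log and may not fit other kernels' output.

```python
import re

# Entry prefix as it appears in this log: "[53458.382975] [ T1315] " or "[...] [ C6] ".
ENTRY_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[\s*([TC]\d+)\]\s+")

def split_dmesg(text):
    """Split a dmesg paste into (timestamp_seconds, source, message) tuples."""
    # Collapse whitespace so entries wrapped or run together still split cleanly.
    flat = " ".join(text.split())
    marks = list(ENTRY_RE.finditer(flat))
    entries = []
    for i, m in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(flat)
        entries.append((float(m.group(1)), m.group(2), flat[m.end():end].strip()))
    return entries

def seconds_until(entries, needle):
    """Seconds from the first entry to the first entry containing needle, else None."""
    if not entries:
        return None
    for ts, _source, message in entries:
        if needle in message:
            return ts - entries[0][0]
    return None

# First and fault entries from the log above, shortened with "..." for brevity.
sample = (
    "[53458.382975] [ T1315] [drm:sde_fence_dump:846] [sde error]fence ... "
    "[53464.500788] [ T1281] kgsl kgsl-3d0: falut=TIMEOUTFAULT, pid=17997, processname=sd-cli"
)
print(seconds_until(split_dmesg(sample), "TIMEOUTFAULT"))
```

For the timestamps in this issue, the gap between the first fence dump and the `TIMEOUTFAULT` line is about 6.1 seconds.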
