vulkan: perf_logger improvements #17672

Merged
0cc4m merged 2 commits into ggml-org:master from jeffbolznv:perf_logger on Dec 6, 2025

@jeffbolznv
Contributor

  • Move perf_logger from device to ctx.
  • Add an environment variable that controls how often the stats are dumped. With a very large value, the stats are dumped only when the ctx is destroyed.
  • Add a fusion info string to the tracking, and log only one item per fused op.
  • Fix the MUL_MAT_ID flops calculation.

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner December 2, 2025 01:50
@github-actions github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 2, 2025
@0cc4m
Contributor

0cc4m commented Dec 2, 2025

Please resolve the conflict.

@jeffbolznv
Contributor Author

Rebased.

@0cc4m
Contributor

0cc4m commented Dec 6, 2025

There's another conflict, but more importantly I'm getting a segfault with Qwen3-Next-80B-A3B-Instruct-Q4_0:

Core was generated by `build_vk_debug/bin/llama-bench -m models/Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -fa 1 --mmap 0'.
Program terminated with signal SIGABRT, Aborted.

#0  0x00007f8f3c49890c in ?? () from /usr/lib/libc.so.6
#1  0x00007f8f3c43e3a0 in raise () from /usr/lib/libc.so.6
#2  0x00007f8f3c42557a in abort () from /usr/lib/libc.so.6
#3  0x00007f8f3c89a41f in std::__glibcxx_assert_fail (file=<optimized out>, line=<optimized out>, function=<optimized out>, condition=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#4  0x00007f8f4075dc96 in std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >::operator[] (this=0x55fdfca3d2d0, __n=3) at /usr/include/c++/15.2.1/bits/stl_vector.h:1263
#5  0x00007f8f3ce523cc in ggml_backend_vk_graph_compute (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-vulkan/ggml-vulkan.cpp:13173
#6  0x00007f8f4036bc25 in ggml_backend_graph_compute_async (backend=0x55fdfca30890, cgraph=0x55fdfca378a8) at ggml/src/ggml-backend.cpp:359
#7  0x00007f8f40370aa4 in ggml_backend_sched_compute_splits (sched=0x55fdfc98b840) at ggml/src/ggml-backend.cpp:1575
#8  0x00007f8f403718c3 in ggml_backend_sched_graph_compute_async (sched=0x55fdfc98b840, graph=0x7f8f28d70030) at ggml/src/ggml-backend.cpp:1784
#9  0x00007f8f4079ba93 in llama_context::graph_compute (this=0x55fdff05c900, gf=0x7f8f28d70030, batched=true) at src/llama-context.cpp:1488
#10 0x00007f8f407986e9 in llama_context::process_ubatch (this=0x55fdff05c900, ubatch=..., gtype=LLM_GRAPH_TYPE_DECODER, mctx=0x55fdfca2d350, ret=@0x7ffc0c26da58: GGML_STATUS_SUCCESS) at src/llama-context.cpp:809
#11 0x00007f8f40799bef in llama_context::decode (this=0x55fdff05c900, batch_inp=...) at src/llama-context.cpp:1113
#12 0x00007f8f407a091d in llama_decode (ctx=0x55fdff05c900, batch=...) at src/llama-context.cpp:2780
#13 0x000055fde84babcb in test_prompt (ctx=0x55fdff05c900, n_prompt=512, n_batch=2048, n_threads=16) at tools/llama-bench/llama-bench.cpp:1945
#14 0x000055fde84bb882 in main (argc=7, argv=0x7ffc0c26ea38) at tools/llama-bench/llama-bench.cpp:2125

@jeffbolznv
Contributor Author

I'm not able to reproduce the crash locally, and I don't see the merge conflict in the UI. The crash sounds a lot like what I fixed in e3f771b, but I can't correlate the line number in your call stack to anything. Are you sure you were on the latest commit?

@0cc4m
Contributor

0cc4m commented Dec 6, 2025

You are right, I didn't notice that updating the PR branch had failed, so I tested only the first commit. My bad, sorry.

@0cc4m 0cc4m merged commit db97837 into ggml-org:master Dec 6, 2025
77 of 78 checks passed
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* vulkan: perf_logger improvements

- Move perf_logger from device to ctx.
- Add an env var to control the frequency we dump the stats. If you set a very
large value, it just dumps when the ctx is destroyed.
- Add a fusion info string to the tracking, only log one item per fused op.
- Fix MUL_MAT_ID flops calculation.

* fix vector sizes