
Win build improvements #940

Merged
aittalam merged 9 commits into main from win_build_improvements on Apr 17, 2026


aittalam (Member) commented Apr 9, 2026

A WIP PR with several improvements to Windows support for GPU acceleration:

  • an easier-to-use .bat build script that auto-detects the Visual Studio tooling
  • fixes for building with CUDA 13
  • fixes to ggml-vulkan (shaders are now compiled single-threaded to avoid breakage with cosmocc on Windows)
  • low-level logging is hidden unless --verbose is passed, and logging calls are more consistent
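The description does not show how the logging change is implemented. Purely as an illustration, hiding low-level output behind --verbose usually reduces to one predicate plus one shared entry point that every call site goes through. Every name below (LogLevel, g_verbose, parse_verbose_flag, should_log, log_msg) is hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Hypothetical severity levels; illustrative only, not the PR's API.
enum class LogLevel { Debug, Info, Warn, Error };

// Set once at startup when "--verbose" is present on the command line.
static bool g_verbose = false;

static void parse_verbose_flag(int argc, char** argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--verbose") == 0) {
            g_verbose = true;
        }
    }
}

// Low-level (Debug) messages are suppressed unless verbose mode is on;
// every other level is always printed.
static bool should_log(LogLevel level) {
    return g_verbose || level != LogLevel::Debug;
}

// One shared entry point, so every call site tags and terminates
// its messages the same way.
static void log_msg(LogLevel level, const char* fmt, ...) {
    assert(fmt != nullptr);
    if (!should_log(level)) {
        return;
    }
    static const char* const kTags[] = {"debug", "info", "warn", "error"};
    std::fprintf(stderr, "[%s] ", kTags[static_cast<int>(level)]);
    va_list args;
    va_start(args, fmt);
    std::vfprintf(stderr, fmt, args);
    va_end(args);
    std::fputc('\n', stderr);
}
```

Keeping the verbosity check inside a single function means call sites never test the flag themselves, which is one way to keep logging calls consistent.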

aittalam (Member, Author) commented Apr 9, 2026

@wingenlit I think I was able to reproduce your issue (the Vulkan build succeeded, the GPU was found and the model loaded, but the process dumped core as soon as inference was invoked). I left some notes here. I investigated the issue with Claude, so while the methodology was reasonably sound (drilling progressively deeper into the error with debugging code), there may be mistakes in how the notes explain things.

Vulkan now seems to work (tested with llama-benchy), although pp2048 throughput is well below CUDA's (630.55 vs 1127.93 t/s, roughly 56%).

uvx llama-benchy --base-url http://localhost:8080 --model Qwen3.5-9B_Q5_K_S --tokenizer Qwen/Qwen3.5-9B --runs 10

| model | backend | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|---|
| Qwen3.5-9B_Q5_K_S | Vulkan | pp2048 | 630.55 ± 42.61 | | 3265.10 ± 206.56 | 3263.33 ± 206.56 | 3265.15 ± 206.56 |
| Qwen3.5-9B_Q5_K_S | Vulkan | tg32 | 48.27 ± 2.19 | 50.12 ± 2.70 | | | |
| Qwen3.5-9B_Q5_K_S | CUDA | pp2048 | 1127.93 ± 101.46 | | 1832.72 ± 162.31 | 1831.07 ± 162.31 | 1832.77 ± 162.30 |
| Qwen3.5-9B_Q5_K_S | CUDA | tg32 | 35.80 ± 0.61 | 37.01 ± 0.61 | | | |

aittalam force-pushed the win_build_improvements branch from d25caed to ace7f51 on April 10, 2026 12:39
aittalam marked this pull request as ready for review on April 17, 2026 08:07
aittalam (Member, Author)

Code review

Found 1 issue:

  1. The new synchronous shader-compile path in ggml_vk_load_shaders drops the compile_count++ that used to precede the call to ggml_vk_create_pipeline_func, but the callee still runs assert(compile_count > 0); compile_count--; inside its lock. In debug builds the assert fires on the very first call; in release builds (assert compiled out) the uint32_t decrement wraps from 0 to UINT32_MAX, poisoning the counter for any future use. Either keep the compile_count++ in the caller or, in line with the deleted comment (// TODO: We're no longer benefitting from the async compiles ... this complexity can be removed.), also patch ggml_vk_create_pipeline_func to drop the counter/mutex/condition-variable bookkeeping.

@@ -3391,20 +3391,9 @@ static void ggml_vk_load_shaders(vk_device& device) {
if (!pipeline->needed || pipeline->compiled) {
continue;
}
- // TODO: We're no longer benefitting from the async compiles (shaders are
- // compiled individually, as needed) and this complexity can be removed.
- {
- // wait until fewer than N compiles are in progress
- uint32_t N = std::max(1u, std::thread::hardware_concurrency());
- std::unique_lock<std::mutex> guard(compile_count_mutex);
- while (compile_count >= N) {
- compile_count_cond.wait(guard);
- }
- compile_count++;
- }
-
- compiles.push_back(std::async(ggml_vk_create_pipeline_func, std::ref(device), std::ref(pipeline), spv_size, spv_data, entrypoint,
- parameter_count, wg_denoms, specialization_constants, disable_robustness, require_full_subgroups, required_subgroup_size));
+ // Compile synchronously to avoid threading issues in cross-module DLL loading
+ ggml_vk_create_pipeline_func(device, pipeline, spv_size, spv_data, entrypoint,
+ parameter_count, wg_denoms, specialization_constants, disable_robustness, require_full_subgroups, required_subgroup_size);
}
};

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.
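To make the failure mode in the review concrete, here is a minimal standalone model of the counter protocol. It assumes, as the review states, that compile_count is a uint32_t guarded by compile_count_mutex and compile_count_cond; the helpers acquire_compile_slot and release_compile_slot are hypothetical names that only mirror the deleted caller block and the callee's epilogue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Stand-ins for the file-scope counter, mutex and condition variable
// the review refers to (assumed: a uint32_t counter).
static std::mutex compile_count_mutex;
static std::condition_variable compile_count_cond;
static uint32_t compile_count = 0;

// Caller side, modelled on the block the diff deletes: wait until fewer
// than max_in_flight compiles are running, then claim a slot.
static void acquire_compile_slot(uint32_t max_in_flight) {
    std::unique_lock<std::mutex> guard(compile_count_mutex);
    compile_count_cond.wait(guard, [&] { return compile_count < max_in_flight; });
    compile_count++;
    assert(compile_count <= max_in_flight);
}

// Callee side, modelled on the tail of ggml_vk_create_pipeline_func:
// give the slot back. The real code asserts compile_count > 0 here; this
// sketch returns that check instead so the release-build path is observable.
static bool release_compile_slot() {
    bool had_slot;
    {
        std::lock_guard<std::mutex> guard(compile_count_mutex);
        had_slot = compile_count > 0;
        compile_count--;  // unsigned arithmetic: 0 - 1 wraps to UINT32_MAX
    }
    compile_count_cond.notify_all();
    return had_slot;
}
```

Calling release_compile_slot() without a matching acquire_compile_slot(), which is what the new synchronous path effectively does, returns false (the point where the debug-build assert would fire) and leaves compile_count at UINT32_MAX. Any later acquire_compile_slot(N) would then block, because compile_count < N is false.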

aittalam (Member, Author) commented Apr 17, 2026

Re: the code review above (Found 1 issue): addressed in 86e8378.
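How 86e8378 resolves the issue is not shown in the thread. If the throttle were kept (the first option in the review), one way to stop the increment and decrement from drifting apart again is to tie both to a single scoped object. This is a sketch of that RAII pattern, not necessarily what the fix does; CompileThrottle, Slot and compile_with_slot are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical throttle that owns the in-flight counter. Holding a Slot
// means one compile is in progress: the constructor waits for room and
// the destructor always gives the slot back.
class CompileThrottle {
public:
    explicit CompileThrottle(uint32_t limit) : limit_(limit > 0 ? limit : 1) {}

    class Slot {
    public:
        explicit Slot(CompileThrottle& throttle) : t_(throttle) {
            std::unique_lock<std::mutex> lock(t_.mutex_);
            t_.cond_.wait(lock, [this] { return t_.in_flight_ < t_.limit_; });
            ++t_.in_flight_;
        }
        ~Slot() {
            {
                std::lock_guard<std::mutex> lock(t_.mutex_);
                assert(t_.in_flight_ > 0);  // same invariant as the original assert
                --t_.in_flight_;
            }
            t_.cond_.notify_all();
        }
        // Non-copyable: a copy would release the same slot twice.
        Slot(const Slot&) = delete;
        Slot& operator=(const Slot&) = delete;

    private:
        CompileThrottle& t_;
    };

    uint32_t in_flight() {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_flight_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    uint32_t limit_;
    uint32_t in_flight_ = 0;
};

// Runs one compile step while holding a slot; behaves the same whether it
// is called directly or from a std::async task.
template <typename F>
void compile_with_slot(CompileThrottle& throttle, F&& compile) {
    CompileThrottle::Slot slot(throttle);
    compile();
}
```

Because the slot is released in the destructor, switching a call site between std::async and a direct call cannot unbalance the counter. If compiles stay synchronous, the review's second option (removing the bookkeeping entirely) is simpler still.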

aittalam merged commit f643524 into main on Apr 17, 2026
3 checks passed
aittalam deleted the win_build_improvements branch on April 17, 2026 12:49

1 participant