hec-ovi / vllm-awq4-qwen

vLLM serving Qwen 3.6-27B (AWQ-INT4) with DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream; vision, tool calling, 256K context; OpenAI-compatible API; Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA required.

Topics: docker, rocm, openai-api, awq, vllm, llm-inference, speculative-decoding, multimodal-llm, qwen3, gfx1151, ryzen-ai-max, dflash, amd-strix-halo, rdna35, 27b

Python · 20 stars · Updated Apr 27, 2026
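Because the server advertises an OpenAI-compatible API, any standard OpenAI-style client can talk to it over HTTP. A minimal sketch of the request shape follows; the base URL, port, and served-model name are assumptions for illustration, not taken from the repo's docs.

```python
import json

# Assumed default vLLM endpoint; the repo may use a different host/port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "Qwen3.6-27B-AWQ") -> str:
    """Return a JSON body for POST {BASE_URL}/chat/completions.

    The model name here is a placeholder; use whatever name the
    server reports at GET /v1/models.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "stream": False,
    }
    return json.dumps(body)


print(build_chat_request("Describe this image."))
```

This is the same request body the official `openai` Python client would send when pointed at `BASE_URL` with a dummy API key, which is the usual way to consume a local vLLM server.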