unknown model architecture: 'qwen35' #182

@eonun

Description

🐛 Bug Description

Unable to load the qwen3.5 model

./shimmy-linux-x86_64 -V
shimmy 1.9.0
./shimmy-linux-x86_64 --model-dirs /home/games/models/qwen3_5_27B/ bench qwen3.5-27b-ud-q2-k-xl
llama_model_loader: loaded meta data with 49 key-value pairs and 851 tensors from /home/games/models/qwen3_5_27B/Qwen3.5-27B-UD-Q2_K_XL.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen35
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 20
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 0.600000
llama_model_loader: - kv   5:                               general.name str              = Qwen3.5-27B
llama_model_loader: - kv   6:                           general.basename str              = Qwen3.5-27B
llama_model_loader: - kv   7:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   8:                         general.size_label str              = 27B
llama_model_loader: - kv   9:                            general.license str              = apache-2.0
llama_model_loader: - kv  10:                       general.license.link str              = https://huggingface.co/Qwen/Qwen3.5-2...
llama_model_loader: - kv  11:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Qwen3.5 27B
llama_model_loader: - kv  14:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  15:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen3.5-27B
llama_model_loader: - kv  16:                               general.tags arr[str,3]       = ["qwen3_5_moe", "unsloth", "image-tex...
llama_model_loader: - kv  17:                         qwen35.block_count u32              = 64
llama_model_loader: - kv  18:                      qwen35.context_length u32              = 262144
llama_model_loader: - kv  19:                    qwen35.embedding_length u32              = 5120
llama_model_loader: - kv  20:                 qwen35.feed_forward_length u32              = 17408
llama_model_loader: - kv  21:                qwen35.attention.head_count u32              = 24
llama_model_loader: - kv  22:             qwen35.attention.head_count_kv u32              = 4
llama_model_loader: - kv  23:             qwen35.rope.dimension_sections arr[i32,4]       = [11, 11, 10, 0]
llama_model_loader: - kv  24:                      qwen35.rope.freq_base f32              = 10000000.000000
llama_model_loader: - kv  25:    qwen35.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  26:                qwen35.attention.key_length u32              = 256
llama_model_loader: - kv  27:              qwen35.attention.value_length u32              = 256
llama_model_loader: - kv  28:                     qwen35.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  29:                      qwen35.ssm.state_size u32              = 128
llama_model_loader: - kv  30:                     qwen35.ssm.group_count u32              = 16
llama_model_loader: - kv  31:                  qwen35.ssm.time_step_rank u32              = 48
llama_model_loader: - kv  32:                      qwen35.ssm.inner_size u32              = 6144
llama_model_loader: - kv  33:             qwen35.full_attention_interval u32              = 4
llama_model_loader: - kv  34:                qwen35.rope.dimension_count u32              = 64
llama_model_loader: - kv  35:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  36:                         tokenizer.ggml.pre str              = qwen35
llama_model_loader: - kv  37:                      tokenizer.ggml.tokens arr[str,248320]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  38:                  tokenizer.ggml.token_type arr[i32,248320]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  39:                      tokenizer.ggml.merges arr[str,247587]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  40:                tokenizer.ggml.eos_token_id u32              = 248046
llama_model_loader: - kv  41:            tokenizer.ggml.padding_token_id u32              = 248055
llama_model_loader: - kv  42:                    tokenizer.chat_template str              = {%- set image_count = namespace(value...
llama_model_loader: - kv  43:               general.quantization_version u32              = 2
llama_model_loader: - kv  44:                          general.file_type u32              = 10
llama_model_loader: - kv  45:                      quantize.imatrix.file str              = Qwen3.5-27B-GGUF/imatrix_unsloth.gguf
llama_model_loader: - kv  46:                   quantize.imatrix.dataset str              = unsloth_calibration_Qwen3.5-27B.txt
llama_model_loader: - kv  47:             quantize.imatrix.entries_count u32              = 496
llama_model_loader: - kv  48:              quantize.imatrix.chunks_count u32              = 80
llama_model_loader: - type  f32:  353 tensors
llama_model_loader: - type q8_0:   96 tensors
llama_model_loader: - type q2_K:  174 tensors
llama_model_loader: - type q3_K:   81 tensors
llama_model_loader: - type q4_K:    4 tensors
llama_model_loader: - type q5_K:   48 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type iq2_xs:    4 tensors
llama_model_loader: - type iq3_xxs:   54 tensors
llama_model_loader: - type iq3_s:   18 tensors
llama_model_loader: - type iq2_s:   12 tensors
llama_model_loader: - type iq4_xs:    6 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q2_K - Medium
print_info: file size   = 10.43 GiB (3.33 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
Error: null result from llama cpp

🔄 Steps to Reproduce

./shimmy-linux-x86_64 --model-dirs /home/games/models/qwen3_5_27B/ bench qwen3.5-27b-ud-q2-k-xl

✅ Expected Behavior

The qwen3.5 model loads successfully.

❌ Actual Behavior

Unable to load the qwen3.5 model:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
Error: null result from llama cpp
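For anyone triaging: the rejected string comes from the `general.architecture` key in the GGUF header (kv 0 in the dump above), so the bundled llama.cpp in shimmy 1.9.0 most likely predates support for the `qwen35` architecture. As a minimal pure-stdlib sketch (assuming the GGUF v3 header layout and that `general.architecture` is stored as a leading string KV, as in this file), the key can be read without llama.cpp to confirm what a model file declares:

```python
import struct

GGUF_TYPE_STRING = 8  # string value-type code per the GGUF spec

def read_architecture(data: bytes) -> str:
    """Scan a GGUF header for the general.architecture string KV."""
    magic, version = struct.unpack_from("<4sI", data, 0)
    assert magic == b"GGUF", "not a GGUF file"
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    off = 24
    for _ in range(kv_count):
        # key: uint64 length-prefixed UTF-8 string
        (klen,) = struct.unpack_from("<Q", data, off); off += 8
        key = data[off:off + klen].decode(); off += klen
        (vtype,) = struct.unpack_from("<I", data, off); off += 4
        if vtype != GGUF_TYPE_STRING:
            break  # this sketch only walks leading string KVs
        (vlen,) = struct.unpack_from("<Q", data, off); off += 8
        val = data[off:off + vlen].decode(); off += vlen
        if key == "general.architecture":
            return val
    raise ValueError("general.architecture not found")

# Demonstration: build a minimal in-memory GGUF v3 header with one string KV
def kv_string(key: str, val: str) -> bytes:
    k, v = key.encode(), val.encode()
    return (struct.pack("<Q", len(k)) + k
            + struct.pack("<I", GGUF_TYPE_STRING)
            + struct.pack("<Q", len(v)) + v)

header = b"GGUF" + struct.pack("<IQQ", 3, 0, 1) \
       + kv_string("general.architecture", "qwen35")
print(read_architecture(header))  # -> qwen35
```

To check a real file, pass the first few kilobytes of the `.gguf` to `read_architecture` (the hypothetical demo `header` above only stands in for that). A model whose architecture is not in the loader's registry will always fail with exactly this error, regardless of quantization.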

📦 Shimmy Version

Latest (main branch)

💻 Operating System

Linux (other)

📥 Installation Method

Pre-built binary from releases

🌍 Environment Details

neofetch (me@swiftsfx, ASCII logo omitted):
OS: EndeavourOS Linux x86_64
Host: Swift SFX14-41G V1.10
Kernel: 6.18.16-1-lts
Uptime: 21 mins
Packages: 2087 (pacman)
Shell: zsh 5.9
Resolution: 1920x1080
DE: Cinnamon 6.6.7
WM: Mutter (Muffin)
WM Theme: cinnamon (Adwaita)
Theme: Flat-Remix-GTK-Blue-Dark-Solid [GTK2/3]
Icons: Tela-circle-black [GTK2/3]
Terminal: alacritty
CPU: AMD Ryzen 7 5800U with Radeon Graphics (16) @ 4.508GHz
GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
Memory: 5939MiB / 15331MiB


./shimmy-linux-x86_64 -V
shimmy 1.9.0


./shimmy-linux-x86_64 --gpu-backend auto gpu-info
🖥️  GPU Backend Information

🔧 llama.cpp Backend: CPU
📋 Available GPU Features:
  ❌ CUDA support disabled
  ❌ Vulkan support disabled
  ❌ OpenCL support disabled


📋 Logs/Error Messages

No response

📝 Additional Context

No response

Labels: bug (Something isn't working)