
gguf_init_from_file: failed to open GGUF file './models/phi3-mini.gguf' #200

@tiansiyuan

Description

πŸ› Bug Description

```
./shimmy-macos-intel serve &
[1] 14955
➜  Downloads 🎯 Shimmy v1.9.0
🔧 Backend: CPU (no GPU acceleration)
📦 Models: 0 available
🚀 Starting server on 127.0.0.1:11435
📦 Models: 1 available
✅ Ready to serve requests
   • POST /api/generate (streaming + non-streaming)
   • GET  /health (health check + metrics)
   • GET  /v1/models (OpenAI-compatible)

./shimmy-macos-intel list
📋 Registered Models:
  phi3-lora => "./models/phi3-mini.gguf"

✅ Total available models: 1
```

```
curl -s http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model":"REPLACE_WITH_MODEL_FROM_list",
        "messages":[{"role":"user","content":"Say hi in 5 words."}],
        "max_tokens":32
      }' | jq -r '.choices[0].message.content'
null
➜  Downloads curl -s http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model":"phi3-lora",
        "messages":[{"role":"user","content":"Say hi in 5 words."}],
        "max_tokens":32
      }' | jq -r '.choices[0].message.content'
```
```
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 16.154 sec
ggml_metal_device_init: GPU name:   Intel(R) Iris(TM) Graphics 6100
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon2 (3002)
ggml_metal_device_init: simdgroup reduction   = false
ggml_metal_device_init: simdgroup matrix mul. = false
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  =  1610.61 MB
llama_model_load_from_file_impl: using device Metal (Intel(R) Iris(TM) Graphics 6100) (unknown id) - 1536 MiB free
gguf_init_from_file: failed to open GGUF file './models/phi3-mini.gguf'
llama_model_load: error loading model: llama_model_loader: failed to load model from ./models/phi3-mini.gguf
llama_model_load_from_file_impl: failed to load model
2026-05-06T02:20:40.093140Z ERROR shimmy::openai_compat: Failed to load model 'phi3-lora': null result from llama cpp
```
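As an aside, the first request above returned `null` because the placeholder model name `REPLACE_WITH_MODEL_FROM_list` was sent verbatim; the exact id can be read from the `/v1/models` endpoint advertised in the startup banner. A minimal sketch of extracting the ids with `jq` (the JSON payload below is a hypothetical example of the OpenAI-compatible list shape, not captured shimmy output):

```shell
# Hypothetical /v1/models response in the OpenAI-compatible list shape;
# pull out the model ids with jq, the same tool the curl commands use.
printf '%s' '{"object":"list","data":[{"id":"phi3-lora","object":"model"}]}' \
  | jq -r '.data[].id'
# → phi3-lora
```

Against a running server this would be `curl -s http://127.0.0.1:11435/v1/models | jq -r '.data[].id'`.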

πŸ”„ Steps to Reproduce

1. Start the server: `./shimmy-macos-intel serve &`.
2. Confirm the model is registered: `./shimmy-macos-intel list` shows `phi3-lora => "./models/phi3-mini.gguf"`.
3. Send a chat completion request to `/v1/chat/completions` with `"model":"phi3-lora"` (full commands and output above).

βœ… Expected Behavior

The model loads and returns a short completion for the prompt ("Say hi in 5 words.").

❌ Actual Behavior

The server cannot open the model file: `gguf_init_from_file` fails on `./models/phi3-mini.gguf` even though `shimmy list` reports that same path as registered, and the request fails with `null result from llama cpp`.
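A possible cause, sketched under the assumption that shimmy hands the registered path to llama.cpp unchanged: `./models/phi3-mini.gguf` is a relative path, so it is resolved against the server process's current working directory and only opens if the server was started from the directory that contains `models/`. The demo below uses hypothetical paths under `/tmp`:

```shell
# Demo (hypothetical paths): a relative path resolves against the caller's
# current working directory, not against where the binary or registry lives.
mkdir -p /tmp/shimmy-demo/models
touch /tmp/shimmy-demo/models/phi3-mini.gguf

cd /tmp/shimmy-demo
ls ./models/phi3-mini.gguf >/dev/null 2>&1 && echo "opened from /tmp/shimmy-demo"

cd /tmp
ls ./models/phi3-mini.gguf >/dev/null 2>&1 || echo "failed from /tmp"
```

If this is the cause, registering the model with an absolute path (e.g. the output of `realpath ./models/phi3-mini.gguf`) or starting the server from the `Downloads` directory that contains `models/` should work around it.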

πŸ“¦ Shimmy Version

Latest (main branch)

πŸ’» Operating System

macOS

πŸ“₯ Installation Method

Pre-built binary from releases

🌍 Environment Details

No response

πŸ“‹ Logs/Error Messages


πŸ“ Additional Context

No response

Labels: bug (Something isn't working)