Small utility repo to turn a Hugging Face safetensors checkpoint into a quantized GGUF model and register it with Ollama.
The script uses:

- `huggingface_hub` to download only the safetensors weights and model metadata from HF
- `llama.cpp` to convert HF -> GGUF and quantize
- `ollama create` with a generated Modelfile to make the model runnable through Ollama
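A minimal sketch of the filtered download, assuming `huggingface_hub.snapshot_download` and its `allow_patterns` parameter. The pattern list, the choice of `<model>` as the repo name after the slash, and the helper names `is_allowed` and `download_snapshot` are illustrative assumptions, not the script's actual code.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical allow-list: weights plus JSON configs and common tokenizer files.
ALLOW_PATTERNS = [
    "*.safetensors",
    "*.json",           # config.json, tokenizer.json, safetensors index files
    "tokenizer.model",  # SentencePiece tokenizers
    "*.txt",            # e.g. merges.txt for BPE tokenizers
]


def is_allowed(filename: str) -> bool:
    """Return True if a repo file would pass the allow-list filter."""
    return any(fnmatch(filename, pattern) for pattern in ALLOW_PATTERNS)


def download_snapshot(repo_id: str, dest_root: str = "downloads") -> Path:
    """Download the filtered snapshot into downloads/<model>/ (assumed naming).

    huggingface_hub is imported lazily so the filter above works without it.
    A token in the HF_TOKEN environment variable is picked up automatically.
    """
    from huggingface_hub import snapshot_download

    local_dir = Path(dest_root) / repo_id.split("/")[-1]
    snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
    return local_dir
```

With this filter, files such as `pytorch_model.bin` are never fetched, which keeps downloads small and avoids pickle-based checkpoints.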
System tools:
- Python 3.10+
- Git
- CMake + a C/C++ compiler
- Ollama CLI/app if you want the script to create and run the model
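One way to check these prerequisites up front is a small preflight function. This is a hypothetical helper (`preflight` is not part of the script); the compiler executables it probes are an assumption for Unix-like systems, and `which` is injectable so the check can be exercised without touching `PATH`.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

# Assumed C++ compiler executable names on Unix-like systems.
CXX_CANDIDATES = ("c++", "g++", "clang++")


def preflight(
    need_ollama: bool = True,
    which: Callable[[str], Optional[str]] = shutil.which,
    version_info: Sequence[int] = sys.version_info,
) -> List[str]:
    """Return human-readable problems; an empty list means ready to go."""
    problems: List[str] = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ required")
    # Ollama is only needed when the script should create and run the model.
    tools = ["git", "cmake"] + (["ollama"] if need_ollama else [])
    for tool in tools:
        if which(tool) is None:
            problems.append(f"missing tool: {tool}")
    if not any(which(cxx) for cxx in CXX_CANDIDATES):
        problems.append("no C++ compiler found on PATH")
    return problems
```

Calling `preflight(need_ollama=False)` corresponds to running conversion only, for example together with `--no-ollama-create`.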
Python packages:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

If you need gated or private Hugging Face models, set a token:

```bash
export HF_TOKEN=hf_...
```

Use a small safetensors model first to validate the pipeline:
```bash
python convert_to_ollama.py Qwen/Qwen2.5-0.5B-Instruct \
  --quant Q4_K_M \
  --ollama-name qwen2.5-0.5b-q4 \
  --smoke-prompt "Say hello in one short sentence."
```

Then run it:
```bash
ollama run qwen2.5-0.5b-q4
```

What the script does:

- Clones `ggerganov/llama.cpp` into `./llama.cpp` if it is not already present.
- Builds the `llama-quantize` binary with CMake.
- Downloads the requested HF model snapshot into `./downloads/<model>/`, restricted to `.safetensors` and tokenizer/config files.
- Runs `llama.cpp/convert_hf_to_gguf.py` to produce an intermediate GGUF file in `./models/`.
- Runs `llama-quantize` to create a quantized GGUF file.
- Writes `./models/Modelfile.<ollama-name>`.
- Runs `ollama create <ollama-name> -f <modelfile>` unless `--no-ollama-create` is set.
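The steps above map roughly onto the following commands. This sketch assumes a standard CMake build layout (`build/bin/llama-quantize`), the converter's `--outfile`/`--outtype` flags, and a Modelfile whose relative `FROM` path Ollama resolves against the Modelfile's own directory. `build_commands`, `modelfile_text`, `run_all`, and `model_slug` are hypothetical names, and the clone step is left out.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def modelfile_text(gguf_name: str) -> str:
    """Minimal Modelfile pointing Ollama at a GGUF file next to it."""
    return f"FROM ./{gguf_name}\n"


def build_commands(
    model_dir: str,
    llama_cpp_dir: str,
    models_dir: str,
    model_slug: str,
    quant: str,
    ollama_name: str,
    convert_args: Sequence[str] = (),
    create_ollama: bool = True,
) -> List[List[str]]:
    """Return the argv for each pipeline step, in execution order."""
    llama = Path(llama_cpp_dir)
    build = llama / "build"
    models = Path(models_dir)
    f16 = models / f"{model_slug}.f16.gguf"
    quantized = models / f"{model_slug}.{quant.lower()}.gguf"
    modelfile = models / f"Modelfile.{ollama_name}"

    cmds = [
        # Configure, then build only the llama-quantize target.
        ["cmake", "-S", str(llama), "-B", str(build), "-DCMAKE_BUILD_TYPE=Release"],
        ["cmake", "--build", str(build), "--config", "Release", "--target", "llama-quantize"],
        # HF safetensors -> intermediate f16 GGUF, plus any pass-through flags.
        [sys.executable, str(llama / "convert_hf_to_gguf.py"), str(model_dir),
         "--outfile", str(f16), "--outtype", "f16", *convert_args],
        # f16 GGUF -> quantized GGUF of the requested type.
        [str(build / "bin" / "llama-quantize"), str(f16), str(quantized), quant],
    ]
    if create_ollama:
        # The Modelfile must already be written before this command runs.
        cmds.append(["ollama", "create", ollama_name, "-f", str(modelfile)])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each step; check=True stops the pipeline at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Write `modelfile_text(...)` to `models/Modelfile.<ollama-name>` before the final `ollama create` step runs.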
More examples:

Create a Q5 quantized model but skip Ollama registration:

```bash
python convert_to_ollama.py Qwen/Qwen2.5-1.5B-Instruct \
  --quant Q5_K_M \
  --no-ollama-create
```

Use an existing llama.cpp checkout:
```bash
python convert_to_ollama.py meta-llama/Llama-3.2-1B-Instruct \
  --llama-cpp-dir ~/src/llama.cpp \
  --ollama-name llama3.2-1b-q4
```

Pass extra converter flags through to llama.cpp:
```bash
python convert_to_ollama.py some-org/some-model \
  --convert-arg=--verbose \
  --convert-arg=--model-name \
  --convert-arg=some-model
```

Repository layout:

```
.
├── convert_to_ollama.py
├── requirements.txt
├── downloads/        # ignored by git
├── llama.cpp/        # ignored by git
└── models/           # ignored by git
    ├── <model>.f16.gguf
    ├── <model>.q4_k_m.gguf
    └── Modelfile.<name>
```
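The flags used in the example invocations map naturally onto `argparse`. This is a sketch assuming these option names and an `action="append"` collector for `--convert-arg`; the `Q4_K_M` default, the `--smoke-prompt` description, and the `build_parser` name are assumptions. The `--convert-arg=--verbose` form matters: with `=`, argparse accepts a value that itself starts with `--`, whereas `--convert-arg --verbose` would fail with "expected one argument".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags shown in the README examples (sketch)."""
    parser = argparse.ArgumentParser(description="HF safetensors -> GGUF -> Ollama")
    parser.add_argument("repo_id", help="Hugging Face repo id, e.g. Qwen/Qwen2.5-0.5B-Instruct")
    # Assumed default; the real script's default may differ.
    parser.add_argument("--quant", default="Q4_K_M", help="llama-quantize type")
    parser.add_argument("--ollama-name", help="model name for ollama create")
    parser.add_argument("--llama-cpp-dir", default="llama.cpp", help="existing llama.cpp checkout")
    parser.add_argument("--no-ollama-create", action="store_true", help="skip Ollama registration")
    # Assumed behavior: a prompt sent once to the new model as a smoke test.
    parser.add_argument("--smoke-prompt", help="prompt to run after creating the model")
    # Each --convert-arg=VALUE is collected in order for convert_hf_to_gguf.py.
    parser.add_argument("--convert-arg", dest="convert_args", action="append",
                        default=[], metavar="ARG")
    return parser
```

The resulting `convert_args` list would then be appended verbatim to the converter's command line.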
Notes:

- This script deliberately refuses to proceed if the HF snapshot does not contain `.safetensors` files.
- GGUF conversion support depends on upstream `llama.cpp`; if a model architecture is not supported there, conversion fails with the upstream error.
- Larger models need substantial disk space and RAM during conversion. The intermediate f16 GGUF alone takes about 2 bytes per parameter (roughly 14 GB for a 7B model). Start with a small model to verify the toolchain.
- On macOS, the script enables llama.cpp Metal support during CMake configuration.
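A sketch of the safetensors guard, a rough estimate of the intermediate f16 GGUF size (2 bytes per parameter), and the macOS CMake switch. It assumes the `GGML_METAL` option name used by recent llama.cpp (older checkouts used `LLAMA_METAL`, and current builds already enable Metal on Apple by default); all function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def has_safetensors(filenames: Iterable[str]) -> bool:
    """True if at least one filename has the .safetensors extension."""
    return any(name.endswith(".safetensors") for name in filenames)


def require_safetensors(snapshot_dir: str) -> List[Path]:
    """Return the snapshot's .safetensors files, or exit with an error if none exist."""
    files = [p for p in Path(snapshot_dir).rglob("*") if p.is_file()]
    if not has_safetensors(p.name for p in files):
        raise SystemExit(f"{snapshot_dir}: no .safetensors files found; refusing to convert")
    return sorted(p for p in files if p.suffix == ".safetensors")


def f16_size_gb(n_params: float) -> float:
    """Rough size of the intermediate f16 GGUF in GB: 2 bytes per parameter."""
    return n_params * 2 / 1e9


def cmake_configure_args(llama_cpp_dir: str, platform: str = sys.platform) -> List[str]:
    """CMake configure argv; adds the Metal switch when platform is 'darwin'."""
    src = Path(llama_cpp_dir)
    args = ["cmake", "-S", str(src), "-B", str(src / "build"), "-DCMAKE_BUILD_TYPE=Release"]
    if platform == "darwin":
        args.append("-DGGML_METAL=ON")
    return args
```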