convert_to_ollama

Small utility repo to turn a Hugging Face safetensors checkpoint into a quantized GGUF model and register it with Ollama.

The script uses:

huggingface_hub to download only the .safetensors weights and the tokenizer/config files from Hugging Face
llama.cpp to convert HF -> GGUF and quantize
ollama create with a generated Modelfile to make the model runnable through Ollama

Requirements

System tools:

Python 3.10+
Git
CMake + a C/C++ compiler
Ollama CLI/app if you want the script to create and run the model

Python packages:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To download gated or private Hugging Face models, set an access token:

export HF_TOKEN=hf_...

Quick start

Use a small safetensors model first to validate the pipeline:

python convert_to_ollama.py Qwen/Qwen2.5-0.5B-Instruct \
  --quant Q4_K_M \
  --ollama-name qwen2.5-0.5b-q4 \
  --smoke-prompt "Say hello in one short sentence."

Then run it:

ollama run qwen2.5-0.5b-q4

What the script does

1. Clones ggerganov/llama.cpp into ./llama.cpp if it is not already present.
2. Builds the llama-quantize binary with CMake.
3. Downloads the requested HF model snapshot into ./downloads/<model>/, restricted to .safetensors and tokenizer/config files.
4. Runs llama.cpp/convert_hf_to_gguf.py to produce an intermediate GGUF file in ./models/.
5. Runs llama-quantize to create a quantized GGUF file.
6. Writes ./models/Modelfile.<ollama-name>.
7. Runs ollama create <ollama-name> -f <modelfile> unless --no-ollama-create is set.
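
The build, convert, quantize, and register steps above map roughly onto the external commands below. This is a minimal sketch, not the script's actual code: build_pipeline_commands and its parameters are hypothetical names, and the commands are only assembled as argument lists, never executed. The --outfile/--outtype converter flags and the build/bin/llama-quantize location reflect a typical llama.cpp CMake build and may differ between llama.cpp versions.

```python
from pathlib import Path


def build_pipeline_commands(
    model_dir: Path,
    llama_cpp_dir: Path,
    models_dir: Path,
    model_name: str,
    quant: str,
    ollama_name: str,
) -> list[list[str]]:
    """Return the external commands for build, convert, quantize, and register.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    f16_path = models_dir / f"{model_name}.f16.gguf"
    quant_path = models_dir / f"{model_name}.{quant.lower()}.gguf"
    modelfile = models_dir / f"Modelfile.{ollama_name}"
    build_dir = llama_cpp_dir / "build"

    return [
        # Configure the llama.cpp build tree, then build only the quantize tool.
        ["cmake", "-S", str(llama_cpp_dir), "-B", str(build_dir)],
        ["cmake", "--build", str(build_dir), "--target", "llama-quantize"],
        # Convert the downloaded HF snapshot to an intermediate f16 GGUF file.
        ["python", str(llama_cpp_dir / "convert_hf_to_gguf.py"), str(model_dir),
         "--outfile", str(f16_path), "--outtype", "f16"],
        # Quantize the intermediate file to the requested type.
        [str(build_dir / "bin" / "llama-quantize"), str(f16_path), str(quant_path), quant],
        # Register the quantized model with Ollama via the generated Modelfile.
        ["ollama", "create", ollama_name, "-f", str(modelfile)],
    ]


commands = build_pipeline_commands(
    model_dir=Path("downloads") / "Qwen2.5-0.5B-Instruct",
    llama_cpp_dir=Path("llama.cpp"),
    models_dir=Path("models"),
    model_name="Qwen2.5-0.5B-Instruct",
    quant="Q4_K_M",
    ollama_name="qwen2.5-0.5b-q4",
)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as plain lists makes it easy to print them for a dry run or hand each one to subprocess.run(cmd, check=True).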

Common examples

Create a Q5 quantized model but skip Ollama registration:

python convert_to_ollama.py Qwen/Qwen2.5-1.5B-Instruct \
  --quant Q5_K_M \
  --no-ollama-create

Use an existing llama.cpp checkout:

python convert_to_ollama.py meta-llama/Llama-3.2-1B-Instruct \
  --llama-cpp-dir ~/src/llama.cpp \
  --ollama-name llama3.2-1b-q4

Pass extra converter flags through to llama.cpp:

python convert_to_ollama.py some-org/some-model \
  --convert-arg=--verbose \
  --convert-arg=--model-name \
  --convert-arg=some-model
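
The = form in this example matters if the option is parsed with argparse: a bare --verbose following --convert-arg would be read as a new option rather than as a value. The sketch below shows one way a repeatable pass-through option can be collected, assuming argparse with action="append". The parser is illustrative and not necessarily how convert_to_ollama.py defines it.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with a repeatable pass-through option (illustrative only)."""
    parser = argparse.ArgumentParser(prog="convert_to_ollama.py")
    parser.add_argument("model", help="Hugging Face repo ID, e.g. some-org/some-model")
    parser.add_argument(
        "--convert-arg",
        action="append",      # each occurrence adds one item to the list
        default=[],
        dest="convert_args",
        help="extra argument forwarded to the llama.cpp converter (repeatable)",
    )
    return parser


args = make_parser().parse_args([
    "some-org/some-model",
    "--convert-arg=--verbose",
    "--convert-arg=--model-name",
    "--convert-arg=some-model",
])
# The collected values keep their order and can be appended to the converter command.
print(args.convert_args)
```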

Output layout

.
├── convert_to_ollama.py
├── requirements.txt
├── downloads/              # ignored by git
├── llama.cpp/              # ignored by git
└── models/                 # ignored by git
    ├── <model>.f16.gguf
    ├── <model>.q4_k_m.gguf
    └── Modelfile.<name>
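
The Modelfile.<name> entry in this layout only has to point Ollama at the quantized GGUF file. A minimal sketch, assuming the generated Modelfile contains just a FROM line (the real script may also add a template or parameters); render_modelfile and write_modelfile are hypothetical helpers:

```python
from pathlib import Path


def render_modelfile(gguf_path: Path) -> str:
    """Return Modelfile text that loads the model from a local GGUF file."""
    # An absolute path avoids any dependence on the current working directory.
    return f"FROM {gguf_path.resolve()}\n"


def write_modelfile(models_dir: Path, ollama_name: str, gguf_path: Path) -> Path:
    """Write models_dir/Modelfile.<ollama_name> and return its path."""
    models_dir.mkdir(parents=True, exist_ok=True)
    modelfile = models_dir / f"Modelfile.{ollama_name}"
    modelfile.write_text(render_modelfile(gguf_path), encoding="utf-8")
    return modelfile
```

The returned path is what would be passed to ollama create <ollama-name> -f <modelfile>.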

Notes

This script deliberately refuses to proceed if the HF snapshot does not contain .safetensors files.
GGUF conversion support depends on upstream llama.cpp; if a model architecture is not supported there, conversion will fail with the upstream error.
Larger models require substantial disk/RAM during conversion. Start with a small model to verify the toolchain.
On macOS, the script enables llama.cpp Metal support during CMake configuration.
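
The .safetensors requirement in the first note can be pictured as a filter plus a guard over the snapshot's file list. A minimal sketch, assuming the download is restricted with glob-style patterns of the kind accepted by the allow_patterns parameter of huggingface_hub.snapshot_download; the pattern list and the select_files / require_safetensors helpers are illustrative, not the script's actual code:

```python
from fnmatch import fnmatch

# Assumed patterns: weights plus tokenizer/config files. The real script may use others.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "tokenizer.model"]


def select_files(filenames, patterns=ALLOW_PATTERNS):
    """Keep only the files that match at least one allowed pattern."""
    return [name for name in filenames if any(fnmatch(name, p) for p in patterns)]


def require_safetensors(filenames):
    """Raise if the snapshot has no .safetensors weights to convert."""
    if not any(name.endswith(".safetensors") for name in filenames):
        raise ValueError("No .safetensors files found in the snapshot; refusing to continue.")


repo_files = ["model.safetensors", "pytorch_model.bin", "config.json", "README.md"]
kept = select_files(repo_files)
require_safetensors(kept)  # passes: model.safetensors is present
print(kept)
```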
