Usage of models

Overview

In the following a table of models tested using Ollama together with a first judgement regarding their usability in this context (Column "T" signals "has tool support"):

Model	T	Judgment
llama3.2:3b	x	Works, but quality is poor
llama3.1:8b	x	Hallucinates quickly
mistral:latest	-
mistral-nemo:12b	x	Slow, does prefix JSON response
gemma3n:latest	-
gemma3:4b	-
codellama:latest	-
qwen3:8b	x	Good reasoning, slow , too creative
qwen3:4b	x	Bigger context, but too creative
devstral:latest	x	Good enough
codeqwen:latest	-
qwen2.5-coder:3b	-
qwen2.5:7b	x	OK, but small context
qwen3:1.7b	x	Too much trial, not enough reasoning
nemotron-mini:4b	-
phi4-mini:latest	-
granite3.3:8b	-
granite3-dense:8b	-
deepseek-r1:latest	-
fl0id/teuken-7b-instruct-commercial-v0.4:latest	-
cogito:8b	?	untested
hermes3:8b	?	untested
gpt-oss:20b	x

Current suggestion

We initially mostly used qwen2.5:7b but are not always happy with the result, possible due to the small context.

We are also experimenting with qwen3:4b, which has a bigger context, but does strange analysis of the sources on its own.

We have fine result with gpt-oss:20b, but execution is slower and the model bigger. However, we do not se hallucination or creative solution strategies like unforced tool execution or not calling tool where appropriate like other bigger models o.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usage of models

Overview

Current suggestion

FilesExpand file tree

Models.md

Latest commit

History

Models.md

File metadata and controls

Usage of models

Overview

Current suggestion