Ollama runs local large language models behind a simple HTTP API.
This stack starts the Ollama server and persists downloaded models.
flowchart LR
Client([Client]) -->|:11434| Ollama[Ollama API]
Ollama --> Models[(./data models)]
ollama servestarts the inference API on port11434.- Models are stored in the mounted
./datadirectory. - Clients call
/api/generateor/api/chatto run prompts. - You can pull models inside the running container.
- Image:
ollama/ollama - Command:
serve - API endpoint:
http://<host-ip>:11434 - Persistent data:
./data:/root/.ollama
- Network:
ollama_network
Set in compose directly:
OLLAMA_HOST=0.0.0.0:11434OLLAMA_MODELS=/root/.ollama/modelsOLLAMA_DATA=/root/.ollamaOLLAMA_NO_CLOUD=trueOLLAMA_NUM_PARALLEL,OLLAMA_MAX_QUEUE, etc.
From the repository root:
cd ollama
docker compose up -dPull a model:
docker exec -it ollama-ollama-1 /bin/sh
ollama pull llama3.2Useful commands:
docker compose ps
docker compose logs -f
docker compose restart
docker compose down- First model pull can take time depending on model size.
- If container name differs in your environment, check with
docker psbeforedocker exec.