Summary: Ollama is a free, open-source local LLM runtime that turns a standard workstation into an AI assistant. It installs with a single command, runs models locally with no internet dependency after setup, and exposes an OpenAI-compatible API for tool integration. This guide walks through installation, first model pull, first prompt, and setting up a browser-based chat interface -- all within the first 30 minutes.
- What Is Ollama
- First 30 Minutes Quickstart
- Why Ollama Over Alternatives
- GUI Options for Non-CLI Users
- Post-Setup Verification
- Key Links
- References
Ollama is an open-source local LLM runtime released under the MIT License. It provides a single binary that handles model management, inference, and API serving -- no complex configuration, no Python environments, no dependency management.
| Feature | Detail |
|---|---|
| License | MIT (fully open source, no usage restrictions) |
| Cost | Free -- no subscription, no per-query charges, no vendor lock-in |
| Installation | Single command on Linux/macOS; installer on Windows |
| Model library | 200+ models available through ollama pull |
| API | OpenAI-compatible REST API on localhost:11434 |
| GPU support | NVIDIA CUDA (primary), AMD ROCm (experimental), Apple Metal (macOS) |
| Internet required | Only for initial model download; fully offline after setup |
- Downloads and manages models -- A single
ollama pullcommand downloads a model and handles all file management - Runs inference locally -- Prompts and responses never leave the machine
- Serves an API -- Any tool that speaks the OpenAI API format can connect to Ollama
- Manages GPU memory -- Automatically loads and unloads models based on available VRAM
- No built-in authentication -- Anyone with network access to port 11434 can query the model (see CJIS Compliance for security hardening)
- No audit logging -- Prompts and responses are not logged by default
- No user interface -- Ollama is a command-line and API tool; a separate interface is needed for chat-style interaction
- No encryption -- API traffic is HTTP by default, not HTTPS
These gaps are addressable with additional tooling. See CJIS Compliance for the complete security hardening framework.
- A workstation meeting the specifications in the Hardware Guide (minimum: 16GB VRAM GPU recommended)
- Administrator/root access for installation
- Internet connectivity for initial setup (models are typically 4-8 GB each)
- Approximately 20 GB of free disk space for Ollama and initial models
Linux (recommended for server/workstation deployment):
curl -fsSL https://ollama.com/install.sh | shThis downloads and installs the Ollama binary, creates a systemd service, and starts Ollama automatically.
macOS:
Download the installer from ollama.com/download or install via Homebrew:
brew install ollamaWindows:
Download the installer from ollama.com/download and run it. The installer creates a system service that starts automatically.
Verify the installation:
ollama --versionExpected output: a version number (e.g., ollama version 0.17.1 or newer). If this command returns a version, installation succeeded.
Security advisory — CVE-2026-7482 ("Bleeding Llama"), CVSS 9.1. Ollama versions before 0.17.1 contain a heap out-of-bounds read in the GGUF model loader. An unauthenticated remote attacker can submit a crafted model through
/api/createand leak Ollama server process memory — including environment variables, API keys, system prompts, and concurrent users' conversation data — then exfiltrate it via/api/push. Upgrade to 0.17.1 or later. Verify withollama --versionafter install. KeepOLLAMA_HOSTbound to127.0.0.1(the default) and never expose port11434to untrusted networks — both/api/createand/api/pushare unauthenticated in the upstream distribution. See also Security Considerations — Software Vulnerability Management.
Download the Llama 3.1 8B model -- a strong general-purpose model that fits comfortably on a 16GB VRAM GPU:
ollama pull llama3.1:8bThis downloads approximately 4.7 GB of model data. Download time depends on internet connection speed.
What to expect:
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████████████▏ 4.7 GB
pulling 948af2743fc7... 100% ▕████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕████████████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████████████▏ 96 B
pulling 1a4c3c319823... 100% ▕████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
Test the model with a simple prompt:
ollama run llama3.1:8b "Summarize the key points of the Fourth Amendment in plain language."The model should generate a clear, readable summary of Fourth Amendment protections. This output is generated entirely on the local machine -- no data was sent to any external server.
Try a more practical example:
ollama run llama3.1:8b "Draft a 3-sentence summary of this incident: On January 15, officers responded to a report of a burglary at 123 Main Street. The complainant stated they left for work at 0700 and returned at 1730 to find the rear door forced open. A television, laptop, and jewelry were reported missing. No witnesses have been identified."Run these verification checks:
Check that the model is available:
ollama listExpected output: a table showing llama3.1:8b with its size and modification date.
Check that the API is responding:
curl http://localhost:11434/api/tagsExpected output: a JSON response listing available models. If this returns data, the Ollama API is running and accessible.
Check GPU utilization (NVIDIA):
nvidia-smiWhen a model is loaded, this should show Ollama using GPU memory. If the model is loaded in CPU mode instead, check that NVIDIA drivers and CUDA are properly installed.
Pull one or two additional models for comparison. See Model Selection for detailed recommendations.
# Strong general-purpose alternative
ollama pull mistral:7b
# Strong reasoning model
ollama pull deepseek-r1:8b
# Lightweight model for fast, simple tasks
ollama pull phi3:miniAt this point, the command-line setup is complete. The system can process prompts entirely offline.
Several open-source tools can run LLMs locally. Ollama is recommended as the starting point for law enforcement deployments based on its combination of simplicity, breadth, and community support.
| Tool | Type | Ease of Setup | Model Library | API Support | Best For |
|---|---|---|---|---|---|
| Ollama | Runtime + model manager | Single command install | 200+ models via ollama pull |
OpenAI-compatible | First deployment; broadest compatibility |
| LM Studio | Desktop GUI application | Download and run | Hugging Face model browser | OpenAI-compatible | Users who want a desktop application |
| llama.cpp | Low-level inference engine | Requires compilation | Manual model download | Custom API | Maximum performance tuning; advanced users |
| vLLM | High-throughput server | Python environment setup | Manual model loading | OpenAI-compatible | Multi-user server deployments at scale |
| LocalAI | API-first runtime | Docker deployment | Multiple format support | OpenAI-compatible | Drop-in replacement for OpenAI API |
- Simplest on-ramp -- One install command, one pull command, one run command. No Python environments, no Docker, no compilation.
- Broadest model library -- The
ollama.com/libraryprovides pre-quantized models ready to pull. No need to navigate Hugging Face, convert model formats, or manage quantization. - Active community -- Large user base means more troubleshooting resources, more tutorials, and faster issue resolution.
- API compatibility -- The OpenAI-compatible API means tools built for ChatGPT's API can often connect to Ollama with a URL change.
- Cross-platform -- Works on Linux, macOS, and Windows with the same commands.
No open-source LLM runtime ships with CJIS compliance built in. The compliance question is not "which tool is compliant" but "which tool is easiest to harden." Ollama's simple architecture -- a single binary serving a REST API -- is straightforward to place behind a reverse proxy with authentication, TLS, and logging. See CJIS Compliance for the complete hardening framework.
Most analysts prefer a familiar chat interface over a command-line terminal. Open WebUI provides a ChatGPT-like browser interface that connects directly to Ollama.
What it is: A self-hosted web interface for Ollama that looks and feels like ChatGPT. It runs as a local web application and connects to the Ollama API.
Key features:
| Feature | Detail |
|---|---|
| Interface | Browser-based chat; supports conversations, history, and multiple models |
| Document upload | Upload files directly into conversations for analysis |
| Model switching | Switch between installed models within the same interface |
| Conversation history | Saves chat history locally (stored in a local database) |
| Multi-user support | Supports multiple user accounts with separate conversation histories |
| RAG integration | Built-in document ingestion for retrieval-augmented generation |
| License | MIT (open source) |
Option 1: Docker (recommended)
If Docker is installed:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host.docker.internal \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:mainThen open http://localhost:3000 in a browser. Create an admin account on first launch.
Option 2: pip install (if Docker is unavailable)
pip install open-webui
open-webui serveThen open http://localhost:8080 in a browser.
Option 3: Docker with bundled Ollama (single-container deployment)
For environments where a single container is preferred:
docker run -d -p 3000:8080 \
--gpus=all \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollamaThe command line is efficient for IT staff and power users. For most analysts:
- A chat interface is familiar and intuitive -- it looks like the ChatGPT and messaging tools they already use
- Conversation history allows revisiting previous analyses
- Document upload simplifies the process of feeding data to the model
- Lower barrier to adoption means faster organizational value
Deploying Open WebUI alongside Ollama is strongly recommended for any multi-user deployment or any deployment where non-technical users will interact with the system.
Open WebUI stores conversation history in a local SQLite database. For deployments that will eventually process CJI, this database becomes a CJI-containing asset and must be protected accordingly. See CJIS Compliance, specifically Gap 5 (Encryption at Rest) and Gap 8 (Media Protection), for guidance.
After completing the quickstart, verify the deployment with this checklist:
| Check | Command | Expected Result |
|---|---|---|
| Ollama is running | ollama --version |
Version number displayed |
| Model is installed | ollama list |
Lists llama3.1:8b (and any other pulled models) |
| API is accessible | curl http://localhost:11434/api/tags |
JSON response with model list |
| Model generates output | ollama run llama3.1:8b "Hello" |
Model responds with text |
| GPU is being used | nvidia-smi |
Shows Ollama process using VRAM |
| Open WebUI loads (if installed) | Open http://localhost:3000 in browser |
Login page or chat interface appears |
If all checks pass, the system is ready for non-CJI workloads. For CJI workloads, complete the security hardening described in CJIS Compliance before processing any criminal justice information.
| Resource | URL |
|---|---|
| Ollama Official Site | https://ollama.com |
| Ollama GitHub Repository | https://github.com/ollama/ollama |
| Ollama Model Library | https://ollama.com/library |
| Ollama API Documentation | https://github.com/ollama/ollama/blob/main/docs/api.md |
| Open WebUI GitHub | https://github.com/open-webui/open-webui |
| Open WebUI Documentation | https://docs.openwebui.com |
- Ollama Documentation -- GitHub
- Open WebUI Documentation -- GitHub
- Ollama Model Library -- ollama.com/library
This document is part of the Secure and Affordable In-House AI companion resource. It is an educational resource, not official guidance. Consult your agency's CJIS Systems Officer (CSO) for compliance decisions.