Ollama Quickstart -- Your First 30 Minutes

Summary: Ollama is a free, open-source local LLM runtime that turns a standard workstation into an AI assistant. It installs with a single command, runs models locally with no internet dependency after setup, and exposes an OpenAI-compatible API for tool integration. This guide walks through installation, first model pull, first prompt, and setting up a browser-based chat interface -- all within the first 30 minutes.

What Is Ollama
First 30 Minutes Quickstart
Why Ollama Over Alternatives
GUI Options for Non-CLI Users
Post-Setup Verification
Key Links
References

What Is Ollama

Ollama is an open-source local LLM runtime released under the MIT License. It provides a single binary that handles model management, inference, and API serving -- no complex configuration, no Python environments, no dependency management.

Key Characteristics

Feature	Detail
License	MIT (fully open source, no usage restrictions)
Cost	Free -- no subscription, no per-query charges, no vendor lock-in
Installation	Single command on Linux/macOS; installer on Windows
Model library	200+ models available through `ollama pull`
API	OpenAI-compatible REST API on `localhost:11434`
GPU support	NVIDIA CUDA (primary), AMD ROCm (experimental), Apple Metal (macOS)
Internet required	Only for initial model download; fully offline after setup

What Ollama Does

Downloads and manages models -- A single ollama pull command downloads a model and handles all file management
Runs inference locally -- Prompts and responses never leave the machine
Serves an API -- Any tool that speaks the OpenAI API format can connect to Ollama
Manages GPU memory -- Automatically loads and unloads models based on available VRAM

What Ollama Does NOT Do

No built-in authentication -- Anyone with network access to port 11434 can query the model (see CJIS Compliance for security hardening)
No audit logging -- Prompts and responses are not logged by default
No user interface -- Ollama is a command-line and API tool; a separate interface is needed for chat-style interaction
No encryption -- API traffic is HTTP by default, not HTTPS

These gaps are addressable with additional tooling. See CJIS Compliance for the complete security hardening framework.

First 30 Minutes Quickstart

Prerequisites

A workstation meeting the specifications in the Hardware Guide (minimum: 16GB VRAM GPU recommended)
Administrator/root access for installation
Internet connectivity for initial setup (models are typically 4-8 GB each)
Approximately 20 GB of free disk space for Ollama and initial models

Step 1: Install Ollama (5 minutes)

Linux (recommended for server/workstation deployment):

curl -fsSL https://ollama.com/install.sh | sh

This downloads and installs the Ollama binary, creates a systemd service, and starts Ollama automatically.

macOS:

Download the installer from ollama.com/download or install via Homebrew:

brew install ollama

Windows:

Download the installer from ollama.com/download and run it. The installer creates a system service that starts automatically.

Verify the installation:

ollama --version

Expected output: a version number (e.g., ollama version 0.17.1 or newer). If this command returns a version, installation succeeded.

Security advisory — CVE-2026-7482 ("Bleeding Llama"), CVSS 9.1. Ollama versions before 0.17.1 contain a heap out-of-bounds read in the GGUF model loader. An unauthenticated remote attacker can submit a crafted model through /api/create and leak Ollama server process memory — including environment variables, API keys, system prompts, and concurrent users' conversation data — then exfiltrate it via /api/push. Upgrade to 0.17.1 or later. Verify with ollama --version after install. Keep OLLAMA_HOST bound to 127.0.0.1 (the default) and never expose port 11434 to untrusted networks — both /api/create and /api/push are unauthenticated in the upstream distribution. See also Security Considerations — Software Vulnerability Management.

Step 2: Pull Your First Model (5-10 minutes)

Download the Llama 3.1 8B model -- a strong general-purpose model that fits comfortably on a 16GB VRAM GPU:

ollama pull llama3.1:8b

This downloads approximately 4.7 GB of model data. Download time depends on internet connection speed.

What to expect:

pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████████████▏ 4.7 GB
pulling 948af2743fc7... 100% ▕████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕████████████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕████████████████████████▏   96 B
pulling 1a4c3c319823... 100% ▕████████████████████████▏  485 B
verifying sha256 digest
writing manifest
success

Step 3: Run Your First Prompt (2 minutes)

Test the model with a simple prompt:

ollama run llama3.1:8b "Summarize the key points of the Fourth Amendment in plain language."

The model should generate a clear, readable summary of Fourth Amendment protections. This output is generated entirely on the local machine -- no data was sent to any external server.

Try a more practical example:

ollama run llama3.1:8b "Draft a 3-sentence summary of this incident: On January 15, officers responded to a report of a burglary at 123 Main Street. The complainant stated they left for work at 0700 and returned at 1730 to find the rear door forced open. A television, laptop, and jewelry were reported missing. No witnesses have been identified."

Step 4: Verify It Works (2 minutes)

Run these verification checks:

Check that the model is available:

ollama list

Expected output: a table showing llama3.1:8b with its size and modification date.

Check that the API is responding:

curl http://localhost:11434/api/tags

Expected output: a JSON response listing available models. If this returns data, the Ollama API is running and accessible.

Check GPU utilization (NVIDIA):

nvidia-smi

When a model is loaded, this should show Ollama using GPU memory. If the model is loaded in CPU mode instead, check that NVIDIA drivers and CUDA are properly installed.

Step 5: Pull Additional Models (5-10 minutes)

Pull one or two additional models for comparison. See Model Selection for detailed recommendations.

# Strong general-purpose alternative
ollama pull mistral:7b

# Strong reasoning model
ollama pull deepseek-r1:8b

# Lightweight model for fast, simple tasks
ollama pull phi3:mini

At this point, the command-line setup is complete. The system can process prompts entirely offline.

Why Ollama Over Alternatives

Several open-source tools can run LLMs locally. Ollama is recommended as the starting point for law enforcement deployments based on its combination of simplicity, breadth, and community support.

Comparison Table

Tool	Type	Ease of Setup	Model Library	API Support	Best For
Ollama	Runtime + model manager	Single command install	200+ models via `ollama pull`	OpenAI-compatible	First deployment; broadest compatibility
LM Studio	Desktop GUI application	Download and run	Hugging Face model browser	OpenAI-compatible	Users who want a desktop application
llama.cpp	Low-level inference engine	Requires compilation	Manual model download	Custom API	Maximum performance tuning; advanced users
vLLM	High-throughput server	Python environment setup	Manual model loading	OpenAI-compatible	Multi-user server deployments at scale
LocalAI	API-first runtime	Docker deployment	Multiple format support	OpenAI-compatible	Drop-in replacement for OpenAI API

Why Ollama Is the Recommended Starting Point

Simplest on-ramp -- One install command, one pull command, one run command. No Python environments, no Docker, no compilation.
Broadest model library -- The ollama.com/library provides pre-quantized models ready to pull. No need to navigate Hugging Face, convert model formats, or manage quantization.
Active community -- Large user base means more troubleshooting resources, more tutorials, and faster issue resolution.
API compatibility -- The OpenAI-compatible API means tools built for ChatGPT's API can often connect to Ollama with a URL change.
Cross-platform -- Works on Linux, macOS, and Windows with the same commands.

Important Context

No open-source LLM runtime ships with CJIS compliance built in. The compliance question is not "which tool is compliant" but "which tool is easiest to harden." Ollama's simple architecture -- a single binary serving a REST API -- is straightforward to place behind a reverse proxy with authentication, TLS, and logging. See CJIS Compliance for the complete hardening framework.

GUI Options for Non-CLI Users

Most analysts prefer a familiar chat interface over a command-line terminal. Open WebUI provides a ChatGPT-like browser interface that connects directly to Ollama.

Open WebUI

What it is: A self-hosted web interface for Ollama that looks and feels like ChatGPT. It runs as a local web application and connects to the Ollama API.

Key features:

Feature	Detail
Interface	Browser-based chat; supports conversations, history, and multiple models
Document upload	Upload files directly into conversations for analysis
Model switching	Switch between installed models within the same interface
Conversation history	Saves chat history locally (stored in a local database)
Multi-user support	Supports multiple user accounts with separate conversation histories
RAG integration	Built-in document ingestion for retrieval-augmented generation
License	MIT (open source)

Setting Up Open WebUI

Option 1: Docker (recommended)

If Docker is installed:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host.docker.internal \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 in a browser. Create an admin account on first launch.

Option 2: pip install (if Docker is unavailable)

pip install open-webui
open-webui serve

Then open http://localhost:8080 in a browser.

Option 3: Docker with bundled Ollama (single-container deployment)

For environments where a single container is preferred:

docker run -d -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

Why a GUI Matters

The command line is efficient for IT staff and power users. For most analysts:

A chat interface is familiar and intuitive -- it looks like the ChatGPT and messaging tools they already use
Conversation history allows revisiting previous analyses
Document upload simplifies the process of feeding data to the model
Lower barrier to adoption means faster organizational value

Deploying Open WebUI alongside Ollama is strongly recommended for any multi-user deployment or any deployment where non-technical users will interact with the system.

Security Note

Open WebUI stores conversation history in a local SQLite database. For deployments that will eventually process CJI, this database becomes a CJI-containing asset and must be protected accordingly. See CJIS Compliance, specifically Gap 5 (Encryption at Rest) and Gap 8 (Media Protection), for guidance.

Post-Setup Verification

After completing the quickstart, verify the deployment with this checklist:

Check	Command	Expected Result
Ollama is running	`ollama --version`	Version number displayed
Model is installed	`ollama list`	Lists `llama3.1:8b` (and any other pulled models)
API is accessible	`curl http://localhost:11434/api/tags`	JSON response with model list
Model generates output	`ollama run llama3.1:8b "Hello"`	Model responds with text
GPU is being used	`nvidia-smi`	Shows Ollama process using VRAM
Open WebUI loads (if installed)	Open `http://localhost:3000` in browser	Login page or chat interface appears

If all checks pass, the system is ready for non-CJI workloads. For CJI workloads, complete the security hardening described in CJIS Compliance before processing any criminal justice information.

Key Links

Resource	URL
Ollama Official Site	https://ollama.com
Ollama GitHub Repository	https://github.com/ollama/ollama
Ollama Model Library	https://ollama.com/library
Ollama API Documentation	https://github.com/ollama/ollama/blob/main/docs/api.md
Open WebUI GitHub	https://github.com/open-webui/open-webui
Open WebUI Documentation	https://docs.openwebui.com

References

Ollama Documentation -- GitHub
Open WebUI Documentation -- GitHub
Ollama Model Library -- ollama.com/library

This document is part of the Secure and Affordable In-House AI companion resource. It is an educational resource, not official guidance. Consult your agency's CJIS Systems Officer (CSO) for compliance decisions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ollama Quickstart -- Your First 30 Minutes

Table of Contents

What Is Ollama

Key Characteristics

What Ollama Does

What Ollama Does NOT Do

First 30 Minutes Quickstart

Prerequisites

Step 1: Install Ollama (5 minutes)

Step 2: Pull Your First Model (5-10 minutes)

Step 3: Run Your First Prompt (2 minutes)

Step 4: Verify It Works (2 minutes)

Step 5: Pull Additional Models (5-10 minutes)

Why Ollama Over Alternatives

Comparison Table

Why Ollama Is the Recommended Starting Point

Important Context

GUI Options for Non-CLI Users

Open WebUI

Setting Up Open WebUI

Why a GUI Matters

Security Note

Post-Setup Verification

Key Links

References

FilesExpand file tree

03-ollama-quickstart.md

Latest commit

History

03-ollama-quickstart.md

File metadata and controls

Ollama Quickstart -- Your First 30 Minutes

Table of Contents

What Is Ollama

Key Characteristics

What Ollama Does

What Ollama Does NOT Do

First 30 Minutes Quickstart

Prerequisites

Step 1: Install Ollama (5 minutes)

Step 2: Pull Your First Model (5-10 minutes)

Step 3: Run Your First Prompt (2 minutes)

Step 4: Verify It Works (2 minutes)

Step 5: Pull Additional Models (5-10 minutes)

Why Ollama Over Alternatives

Comparison Table

Why Ollama Is the Recommended Starting Point

Important Context

GUI Options for Non-CLI Users

Open WebUI

Setting Up Open WebUI

Why a GUI Matters

Security Note

Post-Setup Verification

Key Links

References