ohhhllama

Bandwidth-friendly Ollama proxy with HuggingFace integration and download queuing.

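Queue model downloads for off-peak hours. Supports both Ollama library models and HuggingFace models: pre-quantized GGUF repositories are used as-is, and standard models with a supported architecture are converted to GGUF automatically.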

Features

🕐 Scheduled Downloads - Queue models for off-peak download (default: 10 PM)
🤗 HuggingFace Integration - Download GGUF models directly from HuggingFace
🔄 Auto-Conversion - Automatically converts HuggingFace models to Ollama format
📊 Interactive CLI - User-friendly menu for managing models
🔒 Rate Limiting - Prevent abuse with per-IP daily limits
💾 Disk Monitoring - Automatic disk space checks before downloads
🔌 Transparent Proxy - Drop-in replacement for Ollama API
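The per-IP daily limit (RATE_LIMIT in ohhhllama.conf) can be modeled as a counter keyed on client IP and calendar day. Below is a minimal in-memory sketch. The class name DailyRateLimiter is hypothetical. The real proxy may persist counts (for example in its SQLite database) and may only count certain requests, such as pulls.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Optional, Tuple


class DailyRateLimiter:
    """Hypothetical per-IP daily limiter; counts live only in memory."""

    def __init__(self, limit: int = 5) -> None:  # 5 matches the RATE_LIMIT default
        self.limit = limit
        # (client IP, calendar day) -> requests accepted so far that day
        self._counts: DefaultDict[Tuple[str, date], int] = defaultdict(int)

    def allow(self, ip: str, today: Optional[date] = None) -> bool:
        """Record and accept the request if `ip` is still under today's limit."""
        key = (ip, today or date.today())
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = DailyRateLimiter(limit=2)
day = date(2024, 1, 1)
print([limiter.allow("10.0.0.1", day) for _ in range(3)])  # [True, True, False]
```

Because the day is part of the key, the limit resets automatically at midnight. This sketch never evicts old days, so a long-running service would need to prune them or keep the counts in a database.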

Quick Start

Installation

git clone https://github.com/wildwasser/ohhhllama.git
cd ohhhllama
sudo ./install.sh

The installer will:

Install Docker (if not present)
Set up Ollama in a Docker container
Install the ohhhllama proxy service
Set up the download queue timer
Install the HuggingFace integration module

Usage

Interactive Menu

ohhhllama

This opens an interactive menu where you can:

View system status
Queue Ollama models
Queue HuggingFace models
View/manage the download queue
List and remove installed models
View logs

Quick Status

ohhhllama --status

Queue Models via API

Ollama models:

curl http://localhost:11434/api/pull -d '{"name": "llama3:8b"}'

HuggingFace models:

curl http://localhost:11434/api/hf/queue -d '{"repo_id": "TheBloke/Mistral-7B-v0.1-GGUF"}'
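The same two requests can be built from Python with only the standard library. This is a sketch: build_post is a hypothetical helper, and it assumes the proxy accepts a JSON body as in the curl examples. The response format of the queue endpoints is not documented here, so the sketch only builds the requests.

```python
import json
import urllib.request

PROXY = "http://localhost:11434"  # default LISTEN_PORT from ohhhllama.conf


def build_post(path: str, payload: dict, base: str = PROXY) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the ohhhllama proxy."""
    return urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


pull = build_post("/api/pull", {"name": "llama3:8b"})
hf = build_post("/api/hf/queue", {"repo_id": "TheBloke/Mistral-7B-v0.1-GGUF"})

# To actually queue a download against a running proxy:
# with urllib.request.urlopen(hf) as resp:
#     print(resp.status, resp.read().decode())
```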

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│ ohhhllama Proxy │────▶│ Ollama (Docker) │
│  (port 11434)   │     │   (port 11434)  │     │   (port 11435)  │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  SQLite Queue   │
                        │   Database      │
                        └────────┬────────┘
                                 │
                                 ▼ (scheduled)
                        ┌─────────────────┐
                        │ Queue Processor │
                        │  (systemd timer)│
                        └─────────────────┘
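Per the diagram and the API Reference, the proxy answers a few paths itself (queueing pulls in SQLite, reporting queue status and health) and forwards everything else to Ollama on port 11435. The sketch below shows that dispatch decision. The handler names are hypothetical, and proxy.py may be structured differently.

```python
from typing import Optional, Tuple

OLLAMA_BACKEND = "http://127.0.0.1:11435"  # OLLAMA_BACKEND default from ohhhllama.conf

# (method, path) pairs the proxy handles itself; handler names are hypothetical.
LOCAL_ROUTES = {
    ("POST", "/api/pull"): "enqueue_ollama_pull",   # stored in the SQLite queue
    ("POST", "/api/hf/queue"): "enqueue_hf_model",  # stored in the SQLite queue
    ("GET", "/api/queue"): "queue_status",
    ("GET", "/api/health"): "health_check",
}


def route(method: str, path: str) -> Tuple[str, Optional[str]]:
    """Return (handler, backend_url); backend_url is set only for pass-through requests."""
    handler = LOCAL_ROUTES.get((method.upper(), path))
    if handler is not None:
        return handler, None
    # Everything else (generate, chat, tags, delete, ...) is forwarded unchanged.
    return "forward", OLLAMA_BACKEND + path


print(route("POST", "/api/pull"))  # ('enqueue_ollama_pull', None)
print(route("POST", "/api/chat"))  # ('forward', 'http://127.0.0.1:11435/api/chat')
```

Because the proxy listens on 11434 (Ollama's usual port), existing clients reach it without reconfiguration. Only the pull path changes behavior.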

Configuration

Configuration file: /opt/ohhhllama/ohhhllama.conf

# Ollama backend URL (internal)
OLLAMA_BACKEND=http://127.0.0.1:11435

# Proxy listen port
LISTEN_PORT=11434

# Queue database path
DB_PATH=/var/lib/ohhhllama/queue.db

# Rate limit (requests per IP per day)
RATE_LIMIT=5

# Disk monitoring
DISK_PATH=/data/ollama
DISK_THRESHOLD=90

# HuggingFace settings
HF_CACHE_DIR=/data/huggingface
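ohhhllama.conf uses simple KEY=VALUE lines with # comments. If you want to read it from your own Python tooling, a minimal parser might look like the sketch below. parse_conf and load_conf are hypothetical helpers. They handle unquoted or simply quoted values only, not full shell syntax, and all values come back as strings.

```python
from pathlib import Path
from typing import Dict


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; values stay strings."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return conf


def load_conf(path: str = "/opt/ohhhllama/ohhhllama.conf") -> Dict[str, str]:
    return parse_conf(Path(path).read_text())


sample = """
# Proxy listen port
LISTEN_PORT=11434
RATE_LIMIT=5
DISK_THRESHOLD=90
"""
cfg = parse_conf(sample)
print(int(cfg["RATE_LIMIT"]), int(cfg["DISK_THRESHOLD"]))  # 5 90
```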

API Reference

Standard Ollama Endpoints

Standard Ollama API endpoints are passed through to the Ollama backend transparently, except /api/pull, which is queued for off-peak download:

GET /api/tags - List models
POST /api/generate - Generate text
POST /api/chat - Chat completion
POST /api/pull - Pull model (queued for off-peak)
DELETE /api/delete - Delete model

ohhhllama Extensions

Queue Status

GET /api/queue

Returns queue status and pending downloads.

Health Check

GET /api/health

Returns system health including disk space and service status.
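The README does not specify the disk check in detail. One plausible reading of DISK_PATH and DISK_THRESHOLD=90 is "do not start a download when the filesystem holding DISK_PATH is 90% full or more". The sketch below implements that rule with Python's shutil.disk_usage. The helper names are hypothetical.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Share of the filesystem containing `path` that is in use, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def has_room(path: str = "/data/ollama", threshold: float = 90.0) -> bool:
    """True when usage is below the threshold, i.e. a queued download may start."""
    return disk_usage_percent(path) < threshold
```

Note that df reports Use% as used / (used + available), so on filesystems with reserved blocks its figure can be slightly higher than this one.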

Queue HuggingFace Model

POST /api/hf/queue
Content-Type: application/json

{
  "repo_id": "TheBloke/Llama-2-7B-GGUF",
  "quant": "Q4_K_M",
  "name": "my-llama"
}

repo_id names the HuggingFace repository. quant is optional (default: Q4_K_M). name is optional and sets a custom Ollama model name.
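Server-side, a body for this endpoint could be validated and given its defaults (quant falls back to Q4_K_M, name is optional) roughly as follows. normalize_hf_request is hypothetical. Restricting quant to the four types in the Quantization Options table is an assumption; the real backend may accept others.

```python
from typing import Any, Dict

DEFAULT_QUANT = "Q4_K_M"
# Quant types from the Quantization Options table; the real backend may accept more.
KNOWN_QUANTS = {"Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"}


def normalize_hf_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a /api/hf/queue body and apply the documented defaults."""
    repo_id = body.get("repo_id")
    if not isinstance(repo_id, str) or repo_id.count("/") != 1:
        raise ValueError("repo_id must look like 'owner/repo'")
    quant = str(body.get("quant") or DEFAULT_QUANT).upper()
    if quant not in KNOWN_QUANTS:
        raise ValueError(f"unsupported quant {quant!r}")
    # name stays None when omitted; how the backend then names the model is not documented here.
    return {"repo_id": repo_id, "quant": quant, "name": body.get("name")}
```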

HuggingFace Integration

Supported Sources

GGUF Repositories (recommended)
- Pre-quantized models ready for Ollama
- Providers: TheBloke, bartowski, QuantFactory, mradermacher
- Example: TheBloke/Mistral-7B-v0.1-GGUF
Standard HuggingFace Models
- Automatically converted to GGUF
- Requires a supported architecture (listed below)
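A GGUF repository usually holds one file per quantization, so the download step has to pick the file matching the requested quant. The sketch below assumes the quant tag appears in the filename, as in TheBloke-style names like model.Q4_K_M.gguf. pick_gguf is hypothetical and does not handle split multi-part GGUF files.

```python
from typing import Iterable, Optional


def pick_gguf(filenames: Iterable[str], quant: str = "Q4_K_M") -> Optional[str]:
    """Return the first .gguf file whose name contains the quant tag (case-insensitive)."""
    tag = quant.lower()
    for name in sorted(filenames):
        lower = name.lower()
        if lower.endswith(".gguf") and tag in lower:
            return name
    return None


files = ["README.md", "mistral-7b-v0.1.Q4_K_M.gguf", "mistral-7b-v0.1.Q5_K_M.gguf"]
print(pick_gguf(files))  # mistral-7b-v0.1.Q4_K_M.gguf
```

A repository's file list can be fetched with huggingface_hub.list_repo_files(repo_id); whether hf_backend.py does it this way is not stated here.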

Supported Architectures

Models with these architectures can be converted:

LlamaForCausalLM (Llama, Llama 2, Llama 3)
MistralForCausalLM, MixtralForCausalLM
Qwen2ForCausalLM
PhiForCausalLM, Phi3ForCausalLM
GemmaForCausalLM, Gemma2ForCausalLM
FalconForCausalLM
GPT2LMHeadModel, GPTNeoXForCausalLM
StableLmForCausalLM
OlmoForCausalLM
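For a standard (non-GGUF) repository, the architecture is declared in the architectures field of its config.json. The sketch below checks that field against the list above. is_convertible is hypothetical; passing this check does not guarantee that conversion will succeed.

```python
import json

SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM",
    "Qwen2ForCausalLM", "PhiForCausalLM", "Phi3ForCausalLM",
    "GemmaForCausalLM", "Gemma2ForCausalLM", "FalconForCausalLM",
    "GPT2LMHeadModel", "GPTNeoXForCausalLM", "StableLmForCausalLM",
    "OlmoForCausalLM",
}


def is_convertible(config_json: str) -> bool:
    """True if any architecture listed in a HuggingFace config.json is supported."""
    config = json.loads(config_json)
    return any(arch in SUPPORTED_ARCHITECTURES for arch in config.get("architectures", []))


print(is_convertible('{"architectures": ["MistralForCausalLM"]}'))  # True
```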

Quantization Options

Type	Bits/weight (approx.)	Quality	Size	Use Case
Q8_0	8	Best	Large	Maximum quality
Q5_K_M	5.5	Better	Medium	Quality-focused
Q4_K_M	4.5	Good	Small	Recommended default
Q3_K_M	3.4	Lower	Smaller	Memory constrained
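As a rough rule, file size ≈ parameters × bits per weight / 8 bytes. For a 7B model at Q4_K_M, that is 7×10⁹ × 4.5 / 8 ≈ 3.9 GB. Real GGUF files are usually somewhat larger, because K-quant mixes keep some tensors at higher precision and the file carries metadata. The helper below is a hypothetical illustration using the table's values. It is handy for judging whether a download fits under DISK_THRESHOLD.

```python
# Approximate bits per weight, copied from the Quantization Options table.
BITS_PER_WEIGHT = {"Q8_0": 8.0, "Q5_K_M": 5.5, "Q4_K_M": 4.5, "Q3_K_M": 3.4}


def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough GGUF size in GB (10**9 bytes): parameters * bits per weight / 8."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"7B @ {quant}: ~{estimated_size_gb(7e9, quant):.1f} GB")
```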

Directory Structure

/opt/ohhhllama/
├── proxy.py                 # Main proxy server
├── ohhhllama.conf           # Configuration
├── scripts/
│   └── process-queue.sh     # Queue processor
├── huggingface/
│   ├── hf_backend.py        # HuggingFace module
│   ├── requirements.txt
│   └── .venv/               # Python environment
└── ...

/data/
├── ollama/                  # Ollama model storage
│   ├── models/
│   └── modelfiles/
└── huggingface/             # HuggingFace cache
    └── gguf/                # Downloaded GGUF files

/var/lib/ohhhllama/
└── queue.db                 # SQLite queue database
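The schema of queue.db is not documented here. The sketch below inspects it without assuming any table names: it opens the file read-only and reports each table's row count. table_row_counts is a hypothetical helper; run it as a user that can read /var/lib/ohhhllama.

```python
import sqlite3
from contextlib import closing
from typing import Dict

QUEUE_DB = "/var/lib/ohhhllama/queue.db"  # DB_PATH default from ohhhllama.conf


def table_row_counts(db_path: str = QUEUE_DB) -> Dict[str, int]:
    """Return {table name: row count} for every user table; needs no schema knowledge."""
    # mode=ro opens the file read-only so inspection cannot modify the live queue.
    with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}
```

Calling table_row_counts() with no argument reads the default queue path.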

Service Management

# Proxy service
sudo systemctl status ollama-proxy
sudo systemctl restart ollama-proxy
sudo journalctl -u ollama-proxy -f

# Queue timer
sudo systemctl list-timers ollama-queue.timer
sudo systemctl start ollama-queue.service  # Process now

# Queue processor logs
sudo journalctl -u ollama-queue.service -n 50

Scheduled Downloads

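By default, queued downloads run daily at 10 PM. To change the schedule, edit the OnCalendar= line in the timer unit, then reload systemd and restart the timer: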

sudo nano /etc/systemd/system/ollama-queue.timer
sudo systemctl daemon-reload
sudo systemctl restart ollama-queue.timer

The OnCalendar= value uses systemd calendar-event syntax, for example:

OnCalendar=*-*-* 22:00:00 - Daily at 10 PM
OnCalendar=*-*-* 03:00:00 - Daily at 3 AM
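For the daily *-*-* HH:MM:SS form, the next run is today at that time if it is still ahead, otherwise tomorrow. The sketch below is a hypothetical helper that ignores time zones, DST, and timer options such as Persistent= or RandomizedDelaySec=. For an authoritative answer, use systemd-analyze calendar '*-*-* 22:00:00' or check the NEXT column of systemctl list-timers.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next firing of a daily `OnCalendar=*-*-* HH:MM:SS` timer strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


print(next_daily_run(datetime(2024, 5, 1, 23, 15), time(22, 0)))  # 2024-05-02 22:00:00
```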

Troubleshooting

Models not downloading

Check queue status: ohhhllama → View queue
Check logs: sudo journalctl -u ollama-queue.service -n 50
Process manually: sudo systemctl start ollama-queue.service

HuggingFace downloads failing

Verify the virtual environment exists: ls /opt/ohhhllama/huggingface/.venv
Check disk space: df -h /data

Test manually:

/opt/ohhhllama/huggingface/.venv/bin/python3 \
  /opt/ohhhllama/huggingface/hf_backend.py \
  TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF

Proxy not responding

Check service: sudo systemctl status ollama-proxy
Check Ollama container: sudo docker ps | grep ollama
Restart: sudo systemctl restart ollama-proxy
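To tell whether the proxy or the Ollama container is the layer that is down, check both ports. The sketch below uses a plain TCP connect. port_open is a hypothetical helper, and the ports are the defaults from ohhhllama.conf.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


for label, port in (("ohhhllama proxy", 11434), ("Ollama backend", 11435)):
    state = "listening" if port_open("127.0.0.1", port) else "NOT reachable"
    print(f"{label:16} :{port} {state}")
```

If 11435 answers but 11434 does not, restart ollama-proxy. If 11435 is unreachable, check the Docker container first.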

Uninstallation

cd /path/to/ohhhllama
sudo ./uninstall.sh

License

MIT License - see LICENSE

Contributing

Contributions welcome! Please open an issue or PR on GitHub.
