🦙 Nano-Llama

Local AI in your terminal. Fast, private, zero setup.

A polished TUI chat application powered by llama.cpp.
Runs 100% locally — no internet, no telemetry, no cloud.

Main UI (it may change in the future)

✨ Features

🚀 Instant startup — TUI ready in under 200ms
🔒 Fully offline — zero network requests, zero telemetry
🎯 Zero config — auto-detects hardware, picks optimal settings
💬 Streaming responses — smooth token-by-token output
🧠 Smart context — automatic sliding-window management
🎨 Beautiful TUI — colors, spinners, keyboard-driven interface
🔄 Hot-swap models — switch models without restarting
⚡ GPU accelerated — CUDA and Metal support out of the box
🧭 Startup launcher — centered prompt/search with device + model info
⚙️ Floating settings menu — edit model path, llama.cpp path, font size, and generation parameters
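
The "Smart context" item above mentions automatic sliding-window management. As a rough illustration of the idea (not taken from the project's source), the sketch below keeps a leading system message and then the newest messages that fit a token budget. `Message`, `estimateTokens`, and `trimToWindow` are hypothetical names, and the four-bytes-per-token estimate is an assumption standing in for a real tokenizer.

```go
package main

// Message is a hypothetical chat turn; the project's real type may differ.
type Message struct {
	Role    string // "system", "user", or "assistant"
	Content string
}

// estimateTokens is a crude stand-in for a real tokenizer:
// it assumes roughly four bytes per token, rounded up.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// trimToWindow returns a leading system message (if any) followed by the
// newest messages whose estimated token cost fits within budget.
// Older messages that do not fit are dropped.
func trimToWindow(msgs []Message, budget int) []Message {
	if len(msgs) == 0 {
		return nil
	}
	var head []Message
	rest := msgs
	if msgs[0].Role == "system" {
		// Always keep the system message and charge it against the budget.
		head = msgs[:1]
		budget -= estimateTokens(msgs[0].Content)
		rest = msgs[1:]
	}
	// Walk backwards from the newest message and stop at the first one
	// that no longer fits, so the kept window stays contiguous.
	start := len(rest)
	for i := len(rest) - 1; i >= 0; i-- {
		cost := estimateTokens(rest[i].Content)
		if cost > budget {
			break
		}
		budget -= cost
		start = i
	}
	out := make([]Message, 0, len(head)+len(rest)-start)
	out = append(out, head...)
	out = append(out, rest[start:]...)
	return out
}
```

Calling `trimToWindow(history, contextSize)` before each request would keep the prompt within the model's context length. A real implementation would likely count tokens with the model's own tokenizer.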

Screenshots

A short conversation with a small model running locally on a laptop. Zoom in to read the text.

Quick Start

# 1. Place a GGUF model
mkdir -p runtime/models
cp your-model.gguf runtime/models/

# 2. Place llama-server in runtime/llama.cpp/ (next to the binary)

# 3. Run
./openllama

⌨️ Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Enter` | Send message |
| `Shift+Enter` | New line in input |
| `Tab` | Open settings menu |
| `Esc` | Open settings menu / close overlay / cancel stream |
| `Ctrl+N` | New chat |
| `Ctrl+M` | Switch model |
| `Ctrl+T` | Switch template |
| `Ctrl+Q` | Quit |
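
`Esc` does three different things depending on what the UI is doing. A minimal sketch of how that could be resolved is shown below. `UIState`, `Action`, and `escAction` are hypothetical names, and the priority order (overlay before stream) is an assumption, not something the project confirms.

```go
package main

// Action is a hypothetical name for what a key press should trigger.
type Action int

const (
	OpenSettings Action = iota
	CloseOverlay
	CancelStream
)

// UIState is a hypothetical snapshot of the interface at key-press time.
type UIState struct {
	Streaming   bool // a response is currently being generated
	OverlayOpen bool // a floating menu or dialog is visible
}

// escAction picks one meaning for Esc from the current state.
// Assumed priority: close an open overlay first, then cancel a running
// stream, and only open the settings menu when neither applies.
func escAction(s UIState) Action {
	switch {
	case s.OverlayOpen:
		return CloseOverlay
	case s.Streaming:
		return CancelStream
	default:
		return OpenSettings
	}
}
```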

Startup + Settings Flow

On launch, OpenLlama shows a centered startup screen with:
- available RAM
- GPU status
- currently selected model
- a centered prompt/search input

Press `Enter` on the startup prompt to begin chat mode (the input moves to the bottom panel).

Press `Tab` (or `Esc` when not streaming) to open the floating settings menu.

In settings, you can edit:
- llama.cpp install path
- models directory path
- font size (terminal-friendly scale)
- generation params (temperature, top_p, top_k, repeat_penalty, max_tokens)
- selected model
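
The generation parameters listed above could be carried in a small struct before being sent to the inference server. The sketch below is an assumption about how that might look, not the project's actual code: `GenParams`, `Validate`, and `Encode` are hypothetical names, the JSON keys simply reuse the parameter names from the list, and the accepted ranges are illustrative bounds.

```go
package main

import (
	"encoding/json"
	"errors"
)

// GenParams mirrors the generation settings named in the settings menu.
// The struct and its JSON keys are illustrative, not the project's schema.
type GenParams struct {
	Temperature   float64 `json:"temperature"`
	TopP          float64 `json:"top_p"`
	TopK          int     `json:"top_k"`
	RepeatPenalty float64 `json:"repeat_penalty"`
	MaxTokens     int     `json:"max_tokens"`
}

// Validate rejects values outside the ranges assumed for this sketch.
func (p GenParams) Validate() error {
	switch {
	case p.Temperature < 0:
		return errors.New("temperature must be >= 0")
	case p.TopP <= 0 || p.TopP > 1:
		return errors.New("top_p must be in (0, 1]")
	case p.TopK < 0:
		return errors.New("top_k must be >= 0")
	case p.RepeatPenalty <= 0:
		return errors.New("repeat_penalty must be > 0")
	case p.MaxTokens <= 0:
		return errors.New("max_tokens must be > 0")
	}
	return nil
}

// Encode validates the parameters and serializes them to JSON.
func (p GenParams) Encode() ([]byte, error) {
	if err := p.Validate(); err != nil {
		return nil, err
	}
	return json.Marshal(p)
}
```

Validating before encoding means a bad value typed into the settings menu can be reported to the user instead of being passed on to the server.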

Build from Source

git clone https://github.com/VectorBlue-06/open-llama.git
cd open-llama
make build

Documentation

For complete documentation including configuration, templates, architecture, and troubleshooting:

→ Full Documentation (see DOCUMENTATION.md in the repository root)

System Requirements

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| RAM | 4 GB | 16 GB+ |
| CPU | 4 cores | 8+ cores |
| GPU | Optional | NVIDIA CUDA / Apple Metal |

Privacy

Zero telemetry — no data ever leaves your machine
No analytics — no usage tracking of any kind
Localhost only — server never exposed to network
No cloud — everything runs locally
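
The "Localhost only" guarantee can be checked mechanically. As a hedged sketch, the hypothetical helper `isLoopbackAddr` below reports whether a `host:port` listen address is restricted to the loopback interface. It uses only Go's standard `net` package and is not taken from the project's source.

```go
package main

import "net"

// isLoopbackAddr reports whether a "host:port" listen address is bound
// to the loopback interface only. An empty host (":8080") or a wildcard
// such as "0.0.0.0" listens on every interface, so both return false.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}
```

A launcher could refuse to start the server when this check fails for the configured address.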

License

MIT — Use it however you want.
