Offeline is a privacy-first desktop and web app for running LLMs locally on your own hardware, with a choice of multiple inference backends.
Demo videos: `demo_vid.mp4`, `latest_offeline.mp4`
- Complete Privacy: All AI models run locally on your hardware. No data is sent to external servers. Process everything on your machine.
- Multi-Platform: Use via browser (web app) or native desktop application with Electron
- Offline-Capable: Download models once, use them offline indefinitely (WebGPU mode)
- Multiple AI Backends: Choose your preferred inference engine:
  - WebGPU - Run models directly in your browser using GPU acceleration
  - Ollama - Manage and run models with the Ollama backend
  - llama.cpp - CPU/GPU-optimized inference on desktop
- Rich Chat Interface: Clean, intuitive conversation interface with real-time streaming responses
- File Embeddings: Load and ask questions about documents (PDF, MD, DOCX, TXT, CSV, RTF) - fully locally!
- Voice Support: Interact with the AI using voice messages
- Regenerate Responses: Quickly regenerate AI responses without retyping prompts
- Chat History: Persistent, organized conversation history across sessions
- Export Conversations: Save your chats as JSON or Markdown
- RAG capabilities: Use Retrieval-Augmented Generation to chat with your documents (a rough sketch of the idea follows this list)
- Knowledge base: A smarter way to manage your documents
- Custom Memory/Instructions: Add custom system prompts and memory to personalize AI behavior
- Web Search Integration: Optional real-time web search with Tavily or DuckDuckGo
- Light & Dark Mode: Toggle between themes for comfortable usage
- Markdown & Code Syntax Highlighting: Beautifully rendered markdown and syntax-highlighted code blocks
- Model Selection: Easily switch between different open-source models
  - Llama 2 & 3 - Meta's popular language models
  - Gemma - Google's efficient models
  - Mistral - Mistral AI's powerful models
  - And more - Support for any GGUF-compatible model
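The file-embedding and RAG features above depend on computing embeddings locally; the tech stack section below lists Transformers.js for this kind of work. As a rough illustration of the idea only, here is a minimal sketch that embeds text chunks locally and ranks them against a question by cosine similarity. The model name and helper function are placeholders, not Offeline's actual implementation.

```ts
// Hypothetical sketch (not Offeline's actual code): embed document chunks locally
// with Transformers.js and rank them against a question by cosine similarity.
import { pipeline } from "@xenova/transformers";

async function rankChunks(question: string, chunks: string[]) {
  // Small local embedding model; downloaded once, then served from cache.
  const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

  const embedOne = async (text: string): Promise<number[]> => {
    // Mean pooling + normalization gives one unit-length vector per text.
    const out: any = await embed(text, { pooling: "mean", normalize: true });
    return Array.from(out.data as Float32Array);
  };

  const q = await embedOne(question);
  const scored: { chunk: string; score: number }[] = [];
  for (const chunk of chunks) {
    const v = await embedOne(chunk);
    // Vectors are normalized, so the dot product equals cosine similarity.
    scored.push({ chunk, score: v.reduce((s, x, i) => s + x * q[i], 0) });
  }
  // The top-scoring chunks become the context passed to the selected LLM backend.
  return scored.sort((a, b) => b.score - a.score);
}
```

In practice the highest-ranked chunks would be injected into the prompt sent to whichever backend is selected.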
| Backend | Platform | Type | Notes |
|---|---|---|---|
| WebGPU | Browser | GPU | Native browser acceleration, no installation needed |
| Ollama | Desktop (Windows/Mac/Linux) | CPU/GPU | Easy model management, requires Ollama installation |
| llama.cpp | Desktop (Windows/Mac/Linux) | CPU/GPU | Direct integration, optimized performance |
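All three backends ultimately do the same job (messages in, generated text out), so the rest of the app can treat them interchangeably. The interface below is purely illustrative, a hypothetical shape such an abstraction could take rather than Offeline's real code:

```ts
// Illustrative only: a common shape the three backends could be adapted to.
// Names and types here are hypothetical, not Offeline's actual API.
type BackendId = "webgpu" | "ollama" | "llamacpp";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatBackend {
  id: BackendId;
  // Stream assistant tokens for a conversation.
  chat(messages: ChatMessage[]): AsyncIterable<string>;
}

// The UI only needs to know which backend is currently selected.
function pickBackend(id: BackendId, available: ChatBackend[]): ChatBackend {
  const backend = available.find((b) => b.id === id);
  if (!backend) throw new Error(`Backend not available: ${id}`);
  return backend;
}
```

With this kind of split, switching backends changes only which adapter is registered, never the chat UI.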
Web app prerequisites:
- Node.js 18+
- npm or pnpm package manager
- Modern browser with WebGPU support (Chrome/Edge, or Firefox with the WebGPU flag enabled)
1. Clone the repository:
   ```bash
   git clone https://github.com/iBz-04/offeline
   cd offeline
   ```
2. Install dependencies:
   ```bash
   npm install   # or: pnpm install
   ```
3. Run the development server:
   ```bash
   npm run dev   # or: pnpm dev
   ```
4. Open in your browser: navigate to http://localhost:3000
Production build:
```bash
npm run build
npm run start
```

Desktop prerequisites:
- Node.js 18+
- pnpm (recommended)
- For the Ollama backend: Ollama installed and running
- For the llama.cpp backend: models in GGUF format
Desktop setup:
1. Install desktop dependencies:
   ```bash
   cd desktop
   pnpm install
   ```
2. Development mode:
   ```bash
   pnpm electron:dev
   ```
3. Production build:
   ```bash
   pnpm electron:prod
   ```
Desktop features:
- Native application experience
- Seamless Ollama integration (auto-start/stop; see the sketch after this list)
- Direct llama.cpp support
- System tray integration
- Model management UI
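The Ollama integration mentioned above boils down to talking to Ollama's local HTTP API and starting the server when it is not already running. The following is a minimal sketch of that idea, assuming Node 18+'s global `fetch`, the standard `http://localhost:11434` endpoint, and the `ollama serve` CLI command; it is not Offeline's actual code.

```ts
// Sketch for an Electron main process: check for a running Ollama server,
// start one if needed, then list the locally installed models.
import { spawn } from "node:child_process";

const OLLAMA_URL = "http://localhost:11434";

async function isOllamaRunning(): Promise<boolean> {
  try {
    const res = await fetch(`${OLLAMA_URL}/api/tags`); // lists installed models
    return res.ok;
  } catch {
    return false;
  }
}

async function ensureOllama(): Promise<string[]> {
  if (!(await isOllamaRunning())) {
    // Start the installed Ollama server in the background.
    spawn("ollama", ["serve"], { detached: true, stdio: "ignore" }).unref();
    // Give it a moment to come up (a real implementation would poll).
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name);
}
```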
WebGPU mode requirements:
- GPU: any GPU with WebGPU support
  - 3B models: ~3GB VRAM
  - 7B models: ~6GB VRAM
  - Larger models: proportionally more VRAM
- Browser: Chrome/Edge 113+, or Firefox with WebGPU enabled (a quick feature check is sketched below)

Desktop mode requirements:
- RAM: 8GB+ recommended for 7B models
- CPU: multi-core processor recommended
- GPU: optional, but recommended for faster inference
  - NVIDIA (CUDA support)
  - Apple Silicon (Metal support)
  - AMD (Vulkan support)

Tip: Smaller models (3B) are more efficient and are a better fit for file embeddings on resource-constrained systems.
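To check whether a given browser/GPU combination actually meets the WebGPU requirement before downloading a model, the standard WebGPU feature-detection API can be used. This is a generic sketch, not code from Offeline:

```ts
// Generic WebGPU availability check (standard browser API, not Offeline-specific).
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is only defined in browsers that ship WebGPU.
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    // An adapter may still be unavailable (e.g. blocklisted drivers).
    const adapter = await gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU is available" : "WebGPU not available in this browser"),
);
```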
```
offeline/
├── src/              # Next.js frontend
│   ├── app/          # Next.js app directory
│   ├── components/   # React components
│   ├── hooks/        # Custom React hooks
│   ├── lib/          # Utilities (search, embedding, tools)
│   ├── providers/    # Context providers
│   └── types/        # TypeScript definitions
├── desktop/          # Electron desktop app
│   ├── main/         # Main process
│   └── preload/      # Preload scripts
└── package.json      # Root dependencies
```
Frontend:
- Next.js 14 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- Framer Motion - Animations
Backends:
- @mlc-ai/web-llm - WebGPU inference (see the sketch below)
- Ollama - Model server
- node-llama-cpp - Direct llama.cpp binding
Utilities:
- LangChain - AI/LLM utilities
- Transformers.js - ONNX model support
- react-markdown - Markdown rendering
- Zustand - State management
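Of the backends above, the WebGPU path goes through @mlc-ai/web-llm, which exposes an OpenAI-style chat API in the browser. The following is a minimal sketch assuming a recent web-llm release; the model ID must be one of web-llm's prebuilt models and is only an example here, not necessarily one Offeline ships with.

```ts
// Minimal @mlc-ai/web-llm usage sketch (runs entirely in the browser via WebGPU).
// The model ID is an example; it must match one of web-llm's prebuilt models.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function demo() {
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
    // Model weights are downloaded once and cached for offline use.
    initProgressCallback: (report) => console.log(report.text),
  });

  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful, fully local assistant." },
      { role: "user", content: "Summarize why local inference protects privacy." },
    ],
  });
  console.log(reply.choices[0].message.content);
}

demo();
```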
```bash
# Web development
npm run dev

# Desktop development
cd desktop && pnpm electron:dev

# Full build
npm run build
```

Roadmap:
- Web application with WebGPU support
- Desktop application (Electron)
- Ollama integration
- llama.cpp integration
- File embeddings (PDF, DOCX, TXT, etc.)
- Web search integration (Tavily, DuckDuckGo)
- Chat history & export
- Custom memory/instructions
- Voice message support
- Enhanced model management
- Performance optimizations
- Additional search backends
- Advanced RAG (Retrieval-Augmented Generation)
- Plugin system
- Cloud sync (optional, privacy-preserving)
- API for third-party integrations
| Browser | WebGPU Support | Status |
|---|---|---|
| Chrome/Edge | 113+ | Full support |
| Firefox | 120+ | Supported (requires enabling dom.webgpu.enabled) |
| Safari | 17+ (macOS) | Full support |
Check WebGPU browser compatibility for detailed information.
This project is licensed under the MIT License - see the LICENSE file for details.
Offeline is built with:
- HuggingFace - Model hub
- LangChain - LLM framework
- Next.js - React framework
- Electron - Desktop framework
- Ollama - Model server
- node-llama-cpp - llama.cpp Node.js binding
- Open-source LLM community
Contributions are welcome! Feel free to open issues and pull requests.
Below are common commands for Windows PowerShell; pnpm is recommended (pnpm-lock.yaml is included).
```powershell
# 1) Clone and enter the project
git clone https://github.com/iBz-04/offeline; cd offeline

# 2) Install dependencies (root web app)
pnpm install

# 3) Start the web app (Next.js)
pnpm dev

# 4) Optional, in another terminal: start a second web instance on another port
pnpm dev -- -p 3001

# 5) In another terminal: run the desktop app
cd desktop; pnpm install; pnpm build; pnpm electron:dev
```

Notes:
- WebGPU works best on recent Chrome/Edge on Windows 10/11. If disabled, ensure your GPU drivers are up to date and that Chrome/Edge are current (113+).
- Ollama and llama.cpp backends are available in the desktop app. Install Ollama if you want to use Ollama models.
- Web search backends: You can choose between Tavily (requires an API key) and DuckDuckGo (no key). In the UI, open Search settings and paste your Tavily key. Alternatively, set the environment variable `NEXT_PUBLIC_TAVILY_API_KEY` before starting the web app.
- File embeddings: For best performance on low-spec machines, prefer smaller models (e.g., 3B) for embedding and chat.
- Desktop llama.cpp: The Electron app uses `node-llama-cpp` under the hood. Use GGUF models; GPU acceleration depends on your platform and build (see the sketch below).
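For reference, loading a GGUF model with node-llama-cpp looks roughly like the following. This is a hedged sketch against the node-llama-cpp v3-style API with a placeholder model path, not the app's actual integration code:

```ts
// Sketch: chatting with a local GGUF model via node-llama-cpp (v3-style API).
// The model path is a placeholder; point it at any GGUF file you have downloaded.
import { getLlama, LlamaChatSession } from "node-llama-cpp";

async function main() {
  const llama = await getLlama(); // picks the best available compute backend (CPU/GPU)
  const model = await llama.loadModel({ modelPath: "models/model.gguf" });
  const context = await model.createContext();
  const session = new LlamaChatSession({ contextSequence: context.getSequence() });

  const answer = await session.prompt("Explain GGUF in one sentence.");
  console.log(answer);
}

main();
```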
Troubleshooting:
- WebGPU not available
  - Update Chrome/Edge to the latest version
  - Update graphics drivers
  - Check chrome://gpu to confirm WebGPU status
- Tavily key errors
  - Make sure you’ve saved the key in the Search settings UI
  - Or set `NEXT_PUBLIC_TAVILY_API_KEY` in your environment
  - The app will fall back to DuckDuckGo if Tavily isn’t configured
- Desktop app doesn’t start
  - Run from the `desktop` folder: `pnpm install`, then `pnpm electron:dev`
  - Ensure Node.js 18+ is installed
  - On first run, `electron-builder` may install native dependencies (let it finish)