Offeline is a privacy-first desktop and web app for running LLMs locally on your own hardware, with a choice of multiple inference backends.
Demo videos: `demo_vid.mp4`, `latest_offeline.mp4`
- Complete Privacy: All AI models run locally on your hardware. No data is sent to external servers. Process everything on your machine.
- Multi-Platform: Use via browser (web app) or native desktop application with Electron
- Offline-Capable: Download models once, use them offline indefinitely (WebGPU mode)
- Multiple AI Backends: Choose your preferred inference engine:
  - WebGPU - Run models directly in your browser using GPU acceleration
  - Ollama - Manage and run models with the Ollama backend
  - llama.cpp - CPU/GPU-optimized inference on desktop
- Rich Chat Interface: Clean, intuitive conversation interface with real-time streaming responses
- File Embeddings: Load and ask questions about documents (PDF, MD, DOCX, TXT, CSV, RTF) - fully locally!
- Voice Support: Interact with the AI using voice messages
- Regenerate Responses: Quickly regenerate AI responses without retyping prompts
- Chat History: Persistent, organized conversation history across sessions
- Export Conversations: Save your chats as JSON or Markdown
- RAG capabilities: Use Retrieval-Augmented Generation to chat with your documents (a rough sketch of the idea follows this list)
- Knowledge base: A smarter way to manage your documents
- Custom Memory/Instructions: Add custom system prompts and memory to personalize AI behavior
- Web Search Integration: Optional real-time web search with Tavily or DuckDuckGo
- Light & Dark Mode: Toggle between themes for comfortable usage
- Markdown & Code Syntax Highlighting: Beautifully rendered markdown and syntax-highlighted code blocks
- Model Selection: Easily switch between different open-source models
  - Llama 2 & 3 - Meta's popular language models
  - Gemma - Google's efficient models
  - Mistral - Mistral AI's powerful models
  - And more - Support for any GGUF-compatible model
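The file-embedding and RAG features above depend on computing embeddings locally; the tech stack section below lists Transformers.js for this kind of work. As a rough illustration of the idea only, here is a minimal sketch that embeds text chunks locally and ranks them against a question by cosine similarity. The model name and helper function are placeholders, not Offeline's actual implementation.

```ts
// Hypothetical sketch (not Offeline's actual code): embed document chunks locally
// with Transformers.js and rank them against a question by cosine similarity.
import { pipeline } from "@xenova/transformers";

async function rankChunks(question: string, chunks: string[]) {
  // Small local embedding model; downloaded once, then served from cache.
  const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

  const embedOne = async (text: string): Promise<number[]> => {
    // Mean pooling + normalization gives one unit-length vector per text.
    const out: any = await embed(text, { pooling: "mean", normalize: true });
    return Array.from(out.data as Float32Array);
  };

  const q = await embedOne(question);
  const scored: { chunk: string; score: number }[] = [];
  for (const chunk of chunks) {
    const v = await embedOne(chunk);
    // Vectors are normalized, so the dot product equals cosine similarity.
    scored.push({ chunk, score: v.reduce((s, x, i) => s + x * q[i], 0) });
  }
  // The top-scoring chunks become the context passed to the selected LLM backend.
  return scored.sort((a, b) => b.score - a.score);
}
```

In practice the highest-ranked chunks would be injected into the prompt sent to whichever backend is selected.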
| Backend | Platform | Type | Notes |
|---|---|---|---|
| WebGPU | Browser | GPU | Native browser acceleration, no installation needed |
| Ollama | Desktop (Windows/Mac/Linux) | CPU/GPU | Easy model management, requires Ollama installation |
| llama.cpp | Desktop (Windows/Mac/Linux) | CPU/GPU | Direct integration, optimized performance |
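All three backends ultimately do the same job (messages in, generated text out), so the rest of the app can treat them interchangeably. The interface below is purely illustrative, a hypothetical shape such an abstraction could take rather than Offeline's real code:

```ts
// Illustrative only: a common shape the three backends could be adapted to.
// Names and types here are hypothetical, not Offeline's actual API.
type BackendId = "webgpu" | "ollama" | "llamacpp";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatBackend {
  id: BackendId;
  // Stream assistant tokens for a conversation.
  chat(messages: ChatMessage[]): AsyncIterable<string>;
}

// The UI only needs to know which backend is currently selected.
function pickBackend(id: BackendId, available: ChatBackend[]): ChatBackend {
  const backend = available.find((b) => b.id === id);
  if (!backend) throw new Error(`Backend not available: ${id}`);
  return backend;
}
```

With this kind of split, switching backends changes only which adapter is registered, never the chat UI.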
Web app prerequisites:
- Node.js 18+
- npm or pnpm package manager
- Modern browser with WebGPU support (Chrome/Edge, or Firefox with the WebGPU flag enabled)
1. Clone the repository:
   ```bash
   git clone https://github.com/iBz-04/offeline
   cd offeline
   ```
2. Install dependencies:
   ```bash
   npm install   # or: pnpm install
   ```
3. Run the development server:
   ```bash
   npm run dev   # or: pnpm dev
   ```
4. Open in your browser: navigate to http://localhost:3000
Production build:
```bash
npm run build
npm run start
```

Desktop prerequisites:
- Node.js 18+
- pnpm (recommended)
- For the Ollama backend: Ollama installed and running
- For the llama.cpp backend: models in GGUF format
Desktop setup:
1. Install desktop dependencies:
   ```bash
   cd desktop
   pnpm install
   ```
2. Development mode:
   ```bash
   pnpm electron:dev
   ```
3. Production build:
   ```bash
   pnpm electron:prod
   ```
Desktop features:
- Native application experience
- Seamless Ollama integration (auto-start/stop; see the sketch after this list)
- Direct llama.cpp support
- System tray integration
- Model management UI
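The Ollama integration mentioned above boils down to talking to Ollama's local HTTP API and starting the server when it is not already running. The following is a minimal sketch of that idea, assuming Node 18+'s global `fetch`, the standard `http://localhost:11434` endpoint, and the `ollama serve` CLI command; it is not Offeline's actual code.

```ts
// Sketch for an Electron main process: check for a running Ollama server,
// start one if needed, then list the locally installed models.
import { spawn } from "node:child_process";

const OLLAMA_URL = "http://localhost:11434";

async function isOllamaRunning(): Promise<boolean> {
  try {
    const res = await fetch(`${OLLAMA_URL}/api/tags`); // lists installed models
    return res.ok;
  } catch {
    return false;
  }
}

async function ensureOllama(): Promise<string[]> {
  if (!(await isOllamaRunning())) {
    // Start the installed Ollama server in the background.
    spawn("ollama", ["serve"], { detached: true, stdio: "ignore" }).unref();
    // Give it a moment to come up (a real implementation would poll).
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name);
}
```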
WebGPU mode requirements:
- GPU: any GPU with WebGPU support
  - 3B models: ~3GB VRAM
  - 7B models: ~6GB VRAM
  - Larger models: proportionally more VRAM
- Browser: Chrome/Edge 113+, or Firefox with WebGPU enabled (a quick feature check is sketched below)

Desktop mode requirements:
- RAM: 8GB+ recommended for 7B models
- CPU: multi-core processor recommended
- GPU: optional, but recommended for faster inference
  - NVIDIA (CUDA support)
  - Apple Silicon (Metal support)
  - AMD (Vulkan support)

Tip: Smaller models (3B) are more efficient and are a better fit for file embeddings on resource-constrained systems.
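To check whether a given browser/GPU combination actually meets the WebGPU requirement before downloading a model, the standard WebGPU feature-detection API can be used. This is a generic sketch, not code from Offeline:

```ts
// Generic WebGPU availability check (standard browser API, not Offeline-specific).
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is only defined in browsers that ship WebGPU.
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    // An adapter may still be unavailable (e.g. blocklisted drivers).
    const adapter = await gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU is available" : "WebGPU not available in this browser"),
);
```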
```
offeline/
├── src/              # Next.js frontend
│   ├── app/          # Next.js app directory
│   ├── components/   # React components
│   ├── hooks/        # Custom React hooks
│   ├── lib/          # Utilities (search, embedding, tools)
│   ├── providers/    # Context providers
│   └── types/        # TypeScript definitions
├── desktop/          # Electron desktop app
│   ├── main/         # Main process
│   └── preload/      # Preload scripts
└── package.json      # Root dependencies
```
Frontend:
- Next.js 14 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- Framer Motion - Animations
Backends:
- @mlc-ai/web-llm - WebGPU inference (see the sketch below)
- Ollama - Model server
- node-llama-cpp - Direct llama.cpp binding
Utilities:
- LangChain - AI/LLM utilities
- Transformers.js - ONNX model support
- react-markdown - Markdown rendering
- Zustand - State management
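Of the backends above, the WebGPU path goes through @mlc-ai/web-llm, which exposes an OpenAI-style chat API in the browser. The following is a minimal sketch assuming a recent web-llm release; the model ID must be one of web-llm's prebuilt models and is only an example here, not necessarily one Offeline ships with.

```ts
// Minimal @mlc-ai/web-llm usage sketch (runs entirely in the browser via WebGPU).
// The model ID is an example; it must match one of web-llm's prebuilt models.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function demo() {
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
    // Model weights are downloaded once and cached for offline use.
    initProgressCallback: (report) => console.log(report.text),
  });

  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful, fully local assistant." },
      { role: "user", content: "Summarize why local inference protects privacy." },
    ],
  });
  console.log(reply.choices[0].message.content);
}

demo();
```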
```bash
# Web development
npm run dev

# Desktop development
cd desktop && pnpm electron:dev

# Full build
npm run build
```

Roadmap:
- Web application with WebGPU support
- Desktop application (Electron)
- Ollama integration
- llama.cpp integration
- File embeddings (PDF, DOCX, TXT, etc.)
- Web search integration (Tavily, DuckDuckGo)
- Chat history & export
- Custom memory/instructions
- Voice message support
- Enhanced model management
- Performance optimizations
- Additional search backends
- Advanced RAG (Retrieval-Augmented Generation)
- Plugin system
- Cloud sync (optional, privacy-preserving)
- API for third-party integrations
| Browser | WebGPU Support | Status |
|---|---|---|
| Chrome/Edge | 113+ | Full support |
| Firefox | 120+ | Supported (requires enabling dom.webgpu.enabled) |
| Safari | 17+ (macOS) | Full support |
Check WebGPU browser compatibility for detailed information.
This project is licensed under the MIT License - see the LICENSE file for details.
Offeline is built with:
- HuggingFace - Model hub
- LangChain - LLM framework
- Next.js - React framework
- Electron - Desktop framework
- Ollama - Model server
- node-llama-cpp - llama.cpp Node.js binding
- Open-source LLM community
Contributions are welcome! Feel free to open issues and pull requests.
Below are common commands for Windows PowerShell; pnpm is recommended (pnpm-lock.yaml is included).
```powershell
# 1) Clone and enter the project
git clone https://github.com/iBz-04/offeline; cd offeline

# 2) Install dependencies (root web app)
pnpm install

# 3) Start the web app (Next.js)
pnpm dev

# 4) Optional, in another terminal: start a second web instance on another port
pnpm dev -- -p 3001

# 5) In another terminal: run the desktop app
cd desktop; pnpm install; pnpm build; pnpm electron:dev
```

Notes:
- WebGPU works best on recent Chrome/Edge on Windows 10/11. If disabled, ensure your GPU drivers are up to date and that Chrome/Edge are current (113+).
- Ollama and llama.cpp backends are available in the desktop app. Install Ollama if you want to use Ollama models.
- Web search backends: You can choose between Tavily (requires an API key) and DuckDuckGo (no key). In the UI, open Search settings and paste your Tavily key. Alternatively, set the environment variable `NEXT_PUBLIC_TAVILY_API_KEY` before starting the web app.
- File embeddings: For best performance on low-spec machines, prefer smaller models (e.g., 3B) for embedding and chat.
- Desktop llama.cpp: The Electron app uses `node-llama-cpp` under the hood. Use GGUF models; GPU acceleration depends on your platform and build (see the sketch below).
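For reference, loading a GGUF model with node-llama-cpp looks roughly like the following. This is a hedged sketch against the node-llama-cpp v3-style API with a placeholder model path, not the app's actual integration code:

```ts
// Sketch: chatting with a local GGUF model via node-llama-cpp (v3-style API).
// The model path is a placeholder; point it at any GGUF file you have downloaded.
import { getLlama, LlamaChatSession } from "node-llama-cpp";

async function main() {
  const llama = await getLlama(); // picks the best available compute backend (CPU/GPU)
  const model = await llama.loadModel({ modelPath: "models/model.gguf" });
  const context = await model.createContext();
  const session = new LlamaChatSession({ contextSequence: context.getSequence() });

  const answer = await session.prompt("Explain GGUF in one sentence.");
  console.log(answer);
}

main();
```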
Troubleshooting:
- WebGPU not available
  - Update Chrome/Edge to the latest version
  - Update graphics drivers
  - Check chrome://gpu to confirm WebGPU status
- Tavily key errors
  - Make sure you’ve saved the key in the Search settings UI
  - Or set `NEXT_PUBLIC_TAVILY_API_KEY` in your environment
  - The app will fall back to DuckDuckGo if Tavily isn’t configured
- Desktop app doesn’t start
  - Run from the `desktop` folder: `pnpm install`, then `pnpm electron:dev`
  - Ensure Node.js 18+ is installed
  - On first run, `electron-builder` may install native dependencies (let it finish)