Loci is an advanced privacy-first, programmable local AI inference engine. Built in Rust with ggml/llama.cpp at its core, Loci delivers production-grade performance, deep control, and broad platform support.
Loci's philosophy: make local AI truly controllable, programmable, and commercially deployable.
中文文档 (Chinese Docs) | Documentation | API Reference | Plugin Market
- Logit-level intervention: Zero-copy direct modification of token probabilities
- Full callback chain: pre_process → transform_logits → post_process → on_token_generated
- Inference control flow: Suspend/Resume for native Agent tool calls
- Constrained sampling: Enforced JSON / Regex / Grammar structured output
- Paged Attention + Swap: Stable 128k+ context
- Radix Tree prefix caching: 5–10× speedup for shared system prompts
- Kernel fusion: up to 30% latency reduction
- Cutting-edge quantization: IQ2_XXS (~16× compression), BitNet b1.58 (~20×)
- Native plugins: Maximum performance dynamic libraries
- WASM plugins: Secure sandbox for third-party extensions
- Unified registry with digital signature verification
- Model encryption: AES-256-GCM with zeroized keys
- Multi-tenancy: Full resource isolation and quotas
- Cloud-native: Official Docker + Helm Chart
- Desktop: Windows / macOS / Linux
- Mobile: Android (NDK) / iOS (Metal)
- Embedded ready: ARM / RISC-V path
- Vision encoder (CLIP ViT-L/14)
- Image → embedding zero-copy injection
- Audio support reserved for a future release
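The logit-level intervention and constrained sampling features above both come down to mutating the logit vector before sampling. The sketch below illustrates the idea with a hypothetical `LogitTransform` trait and an allow-list mask; this is not Loci's actual plugin API, just the underlying technique:

```rust
/// Hypothetical plugin hook: mutate logits in place before sampling.
/// (Illustrative only; the real Loci callback chain also includes
/// pre_process / post_process / on_token_generated stages.)
trait LogitTransform {
    fn transform_logits(&self, logits: &mut [f32]);
}

/// Example: mask every token outside an allowed set, as a grammar- or
/// JSON-constrained sampler would do at each decoding step.
struct AllowList {
    allowed: Vec<usize>, // token ids permitted by the current grammar state
}

impl LogitTransform for AllowList {
    fn transform_logits(&self, logits: &mut [f32]) {
        for (id, logit) in logits.iter_mut().enumerate() {
            if !self.allowed.contains(&id) {
                // A -inf logit gets probability 0 after softmax,
                // so banned tokens can never be sampled.
                *logit = f32::NEG_INFINITY;
            }
        }
    }
}

fn main() {
    let mut logits = vec![0.5, 1.2, -0.3, 2.0];
    let mask = AllowList { allowed: vec![1, 3] };
    mask.transform_logits(&mut logits);
    // Only tokens 1 and 3 remain sampleable.
    println!("{logits:?}");
}
```

In a real engine the mask operates on the model's full vocabulary and the allowed set is recomputed from the grammar automaton after every emitted token.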
| Model | Size | Quant | Load Time |
|---|---|---|---|
| Phi-3-mini | 3.8B | Q4_K_M | 92ms |
| Llama-3-8B | 8B | Q4_K_M | 185ms |
| Gemma-2-9B | 9B | Q5_K_M | 328ms |
| Hardware | Throughput (tokens/s) |
|---|---|
| Apple M3 Pro | 58.3 |
| NVIDIA RTX 4090 | 112.7 |
| AMD RX 7900 XTX | 89.4 |
Full report: PERFORMANCE_WHITEPAPER.md
```shell
cargo install loci
loci serve --model models/llama-3-8b-q4_k_m.gguf --port 8080
```

The OpenAI-compatible API is then ready at http://localhost:8080/v1
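Once the server is up, any OpenAI-compatible client can talk to it. As a dependency-free illustration, the sketch below hand-builds the HTTP request with Rust's standard library; the model name and prompt are placeholders, and the endpoint path follows the OpenAI chat-completions convention:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 request for the OpenAI-compatible
/// /v1/chat/completions endpoint (model name is a placeholder).
fn build_chat_request(host: &str, model: &str, prompt: &str) -> String {
    let body = format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"{prompt}"}}]}}"#
    );
    format!(
        "POST /v1/chat/completions HTTP/1.1\r\nHost: {host}\r\n\
         Content-Type: application/json\r\nContent-Length: {}\r\n\
         Connection: close\r\n\r\n{body}",
        body.len()
    )
}

fn main() {
    let request = build_chat_request("localhost:8080", "llama-3-8b", "Hello");
    // Send only if a server is actually listening on port 8080.
    match TcpStream::connect("127.0.0.1:8080") {
        Ok(mut stream) => {
            stream.write_all(request.as_bytes()).expect("write failed");
            let mut response = String::new();
            stream.read_to_string(&mut response).expect("read failed");
            println!("{response}");
        }
        Err(e) => eprintln!("no server on :8080 ({e}); request was:\n{request}"),
    }
}
```

In practice you would use an off-the-shelf OpenAI SDK or `curl` instead; the point is that no Loci-specific client is required.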
Visit: https://plugins.loci.dev
Supports Native + WASM plugins with one-click installation and signature verification.
```shell
docker run -p 8080:8080 ghcr.io/decade-afk/loci:latest
```

A Helm chart is available for Kubernetes deployment.
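For longer-lived deployments the same image drops into Docker Compose. A minimal sketch; the volume mount and model directory are assumptions, not shipped defaults:

```yaml
# Hypothetical docker-compose.yml; adjust the volume path to your model directory.
services:
  loci:
    image: ghcr.io/decade-afk/loci:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models   # mount local GGUF models into the container
```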
We welcome contributions!
MIT License - see LICENSE
Built with ❤️ by decade-afk and the Loci community | 2026