[](LICENSE)
[](https://github.com/google-ai-edge/gallery)

**A security-hardened fork of [Google AI Edge Gallery](https://github.com/google-ai-edge/gallery) — with unique hybrid features: biometric lock, encrypted chat history, llama.cpp support, and GGUF model import.**

|
9 | 9 | ## Disclaimer |
10 | 10 |
|
@@ -33,6 +33,46 @@ Box is an Android app for running large language models entirely on-device. It i |
33 | 33 |
|
On the inference side, Box integrates llama.cpp alongside the upstream LiteRT runtime. This lets you sideload any GGUF model file and choose between CPU, GPU, or NPU acceleration per model — so you are not limited to the curated model list.

## 🔒 Box: The Only Android App That Fuses Google's LiteRT with llama.cpp + Biometric Security

**What makes Box unique?** While other on-device LLM apps force you to choose between Google's optimized LiteRT ecosystem (limited model selection) or the open GGUF ecosystem (limited hardware acceleration), **Box runs both side-by-side** — letting you import any GGUF model while keeping LiteRT's NPU acceleration for compatible models.

## Why This Matters (And Why No One Else Does It)

| Feature | Google AI Edge Gallery | llama.cpp-only apps (OfflineLLM) | **Box (This Project)** |
|---------|----------------------|--------------------------------|------------------------|
| LiteRT + NPU acceleration | ✅ | ❌ | ✅ |
| Import any GGUF model | ❌ | ✅ | ✅ |
| Encrypted chat history | ❌ | ❌ | ✅ |
| Biometric app lock | ❌ | ❌ | ✅ |
| Per-model accelerator choice | ❌ | ❌ | ✅ |
| Hard offline mode (airgap) | ❌ | ❌ | ✅ |

**The technical breakthrough:** Box doesn't just bundle both runtimes — it lets you choose **per-model** whether to run on CPU, GPU (via OpenCL/Vulkan), or NPU (via QNN delegate). No other Android app gives you this granular control.

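The per-model dispatch described above can be sketched in plain Kotlin. This is an illustrative model only — the names `selectEngine`, `ModelConfig`, and the enums are assumptions for this sketch, not Box's actual API:

```kotlin
// Illustrative sketch of per-model runtime dispatch (not Box's real code).
enum class Engine { LLAMA_CPP, LITERT }
enum class Accelerator { CPU, GPU, NPU }

data class ModelConfig(val fileName: String, val preferred: Accelerator)

// GGUF files route to llama.cpp (CPU/GPU only); .litertlm files route to
// LiteRT, the only path that can reach the NPU delegate. An NPU request
// for a GGUF model falls back to GPU.
fun selectEngine(cfg: ModelConfig): Pair<Engine, Accelerator> = when {
    cfg.fileName.endsWith(".gguf") ->
        Engine.LLAMA_CPP to
            if (cfg.preferred == Accelerator.NPU) Accelerator.GPU else cfg.preferred
    cfg.fileName.endsWith(".litertlm") ->
        Engine.LITERT to cfg.preferred
    else -> error("Unsupported model format: ${cfg.fileName}")
}
```

The key design point is that the accelerator choice is attached to the model, not the app: two models loaded in the same session can run on different backends.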
## The Hybrid Architecture Most Developers Think Is Impossible

```
User's GGUF file   → llama.cpp → CPU/GPU
Google's .litertlm → LiteRT    → NPU (Qualcomm/MediaTek)
                        ↓
      Same chat interface, same encrypted history
```

Most developers assume you have to pick one inference engine. Box proves otherwise — and adds enterprise-grade security on top.

## Built By Someone Who Already Did The Hard Part

This isn't a theoretical project. I built **OfflineLLM** (a pure llama.cpp app) first, then forked Google AI Edge Gallery to add llama.cpp support. The result: an app that inherits Google's polished UI and multimodal features (Ask Image, Audio Scribe, Agent Skills) while adding the open-model flexibility that Google's curated allowlist prevents.

## For Security-Conscious Users Running Sensitive Conversations

- SQLCipher AES-256 encrypted Room database
- Biometric re-authentication on every foreground
- Hard offline switch — blocks all network traffic
- Input sanitization before inference AND persistence
- On-device security audit log

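A minimal sketch of how an encrypted Room database can be wired up with SQLCipher's `SupportFactory` — the `ChatDatabase` class, database name, and passphrase handling here are illustrative assumptions, not Box's actual code:

```kotlin
import androidx.room.Room
import net.sqlcipher.database.SQLiteDatabase
import net.sqlcipher.database.SupportFactory

// Illustrative only: in practice, derive the passphrase from user input
// or the Android Keystore rather than holding it in memory longer than needed.
fun openEncryptedDb(context: android.content.Context, passphrase: CharArray): ChatDatabase {
    val factory = SupportFactory(SQLiteDatabase.getBytes(passphrase))
    return Room.databaseBuilder(context, ChatDatabase::class.java, "chats.db")
        .openHelperFactory(factory) // SQLCipher encrypts the database file at rest
        .build()
}
```

Because encryption happens at the `SupportSQLiteOpenHelper` layer, the Room DAOs and entities stay unchanged — only the builder wiring differs from a plaintext database.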
**Bottom line:** If you want to run Qwen3.6, Llama 3, Mistral, or any GGUF model *with* the option of NPU acceleration when available — while keeping your conversations encrypted and offline — Box is currently the only Android app that delivers all of that in one package.

---

## What's different from upstream