
Commit 2fbbb91

Update README.md
1 parent eda6b67 commit 2fbbb91

1 file changed: 2 additions & 6 deletions

README.md
@@ -51,17 +51,13 @@ On the inference side, Box integrates llama.cpp alongside the upstream LiteRT ru
 **The technical breakthrough:** Box doesn't just bundle both runtimes — it lets you choose **per-model** whether to run on CPU, GPU (via OpenCL/Vulkan), or NPU (via QNN delegate). No other Android app gives you this granular control.
 
 ## The Hybrid Architecture Most Developers Think Is Impossible
-User's GGUF file → llama.cpp → CPU/GPU
+User's GGUF file → llama.cpp → CPU/GPU/NPU
 Google's .litertlm → LiteRT → NPU (Qualcomm/MediaTek)
-
-Same chat interface, same encrypted history
 
 
-Most developers assume you have to pick one inference engine. Box proves otherwise — and adds enterprise-grade security on top.
 
-## Built By Someone Who Already Did The Hard Part
+Most developers assume you have to pick one inference engine. Box proves otherwise — and adds enterprise-grade security on top.
 
-This isn't a theoretical project. I built **OfflineLLM** (pure llama.cpp app) first, then forked Google AI Edge Gallery to add llama.cpp support. The result: an app that inherits Google's polished UI and multimodal features (Ask Image, Audio Scribe, Agent Skills) while adding the open model flexibility that Google's curated allowlist prevents.
 
 ## For Security-Conscious Users Running Sensitive Conversations
 
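The README text in this diff describes two routing decisions: the model file's format selects the engine (GGUF goes to llama.cpp, `.litertlm` goes to LiteRT), and the user pins each model to a CPU, GPU, or NPU backend; the hunk's one-word change (`CPU/GPU` to `CPU/GPU/NPU`) extends GGUF models to the NPU path. A minimal sketch of that dispatch, where every name (`ModelRouter`, `Engine`, `Backend`, `plan`) is assumed for illustration and not taken from the Box codebase:

```java
// Hypothetical sketch of the per-model routing the README describes.
// None of these identifiers come from the actual Box source tree.
public class ModelRouter {
    enum Engine { LLAMA_CPP, LITERT }
    enum Backend { CPU, GPU, NPU }  // GPU via OpenCL/Vulkan, NPU via QNN delegate

    /** The file format decides the engine: open GGUF models vs. Google's .litertlm bundles. */
    static Engine engineFor(String modelPath) {
        if (modelPath.endsWith(".gguf")) return Engine.LLAMA_CPP;
        if (modelPath.endsWith(".litertlm")) return Engine.LITERT;
        throw new IllegalArgumentException("unsupported model format: " + modelPath);
    }

    /** The backend is a per-model user choice, not hard-wired to the engine. */
    static String plan(String modelPath, Backend userChoice) {
        return modelPath + " -> " + engineFor(modelPath) + " -> " + userChoice;
    }

    public static void main(String[] args) {
        // After this commit, a GGUF model may target the NPU as well as CPU/GPU.
        System.out.println(plan("qwen-7b-q4.gguf", Backend.NPU));
        System.out.println(plan("gemma-3n.litertlm", Backend.NPU));
    }
}
```

The point of the sketch is the separation of the two decisions: format-to-engine dispatch is fixed, while the backend stays a free per-model parameter, which is the "granular control" the README claims.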
0 commit comments