* perf(android): reduce APK size with ABI splitting, R8, and lib exclusions
- Split release APK per ABI (arm64 + x64 only), dropping universal fat build
- Enable R8 minification and resource shrinking for release builds
- Exclude Vulkan debug validation layer from release packaging
- Exclude unused MediaPipe image generation libs from release packaging
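The size work above can be sketched as a release Gradle config (an illustrative sketch, not the exact build file; the Vulkan validation layer filename is an assumption):

```groovy
// android/app/build.gradle — hedged sketch of the APK-size settings
android {
    splits {
        abi {
            enable true
            reset()
            include "arm64-v8a", "x86_64"   // arm64 + x64 only
            universalApk false              // drop the universal fat build
        }
    }
    buildTypes {
        release {
            minifyEnabled true      // R8 code shrinking
            shrinkResources true    // drop unreferenced resources
        }
    }
    packagingOptions {
        jniLibs {
            // keep the Vulkan debug validation layer out of packaging
            // (library name below is an assumption)
            excludes += "**/libVkLayer_khronos_validation.so"
        }
    }
}
```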
* refactor(models): add HNSW vector index and app-owned RetrievalResult
Add @HnswIndex(dimensions: 384) to BiomarkerResult.embedding so
ObjectBox can run native nearest-neighbor search — eliminating the
separate koshika_vectors.db SQLite store that flutter_gemma required.
Introduce RetrievalResult as an app-owned type so the RAG pipeline
(ChatContextBuilder, CitationExtractor) no longer imports from
flutter_gemma. Add LlmModelConfig and LlmModelRegistry with 4 curated
public GGUF models (SmolLM2 360M, Qwen3 0.6B default, Llama 3.2 1B,
Gemma 3 1B) and a custom-URL escape hatch.
Regenerate ObjectBox bindings to pick up the HNSW annotation.
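The annotated entity looks roughly like this (a hedged sketch — only the `embedding` field and its dimensions come from the commit; the other fields are placeholders, not the real BiomarkerResult schema):

```dart
import 'package:objectbox/objectbox.dart';

@Entity()
class BiomarkerResult {
  @Id()
  int id = 0;

  String text = ''; // placeholder field

  // The HNSW index lets ObjectBox answer nearest-neighbor queries
  // natively, replacing the separate koshika_vectors.db SQLite store.
  @HnswIndex(dimensions: 384)
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;
}
```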
* feat(services): add model downloader with progress and resume support
Add ModelDownloader to handle GGUF file downloads from arbitrary URLs
(no HuggingFace token needed — all curated models are public).
Supports progress callbacks for UI, cancellation via a flag checked
between chunks, and download resume via HTTP Range headers using a
.part temp file. Static helpers expose the models directory path so
services and the splash migration can share one download location.
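The resume logic can be sketched with `dart:io` (names and API shape are illustrative, not ModelDownloader's real interface):

```dart
import 'dart:io';

// Download to a .part temp file, resuming via an HTTP Range header if a
// partial file already exists, checking cancellation between chunks.
Future<void> downloadWithResume(Uri url, File target,
    {void Function(int received, int total)? onProgress,
    bool Function()? isCancelled}) async {
  final part = File('${target.path}.part');
  final offset = await part.exists() ? await part.length() : 0;

  final client = HttpClient();
  try {
    final request = await client.getUrl(url);
    if (offset > 0) {
      // Resume: ask the server for the remaining bytes only.
      request.headers.set(HttpHeaders.rangeHeader, 'bytes=$offset-');
    }
    final response = await request.close();
    if (response.statusCode != HttpStatus.ok &&
        response.statusCode != HttpStatus.partialContent) {
      throw HttpException('HTTP ${response.statusCode}', uri: url);
    }

    var received = offset;
    final total = offset + response.contentLength;
    final sink = part.openWrite(mode: FileMode.append);
    try {
      await for (final chunk in response) {
        if (isCancelled?.call() ?? false) return; // flag checked between chunks
        sink.add(chunk);
        received += chunk.length;
        onProgress?.call(received, total);
      }
    } finally {
      await sink.close();
    }
    await part.rename(target.path); // finalize only on full success
  } finally {
    client.close();
  }
}
```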
* refactor(ai): replace flutter_gemma services with llamadart
Swap the entire inference layer from flutter_gemma (MediaPipe/LiteRT,
~133 MB native libs) to llamadart (llama.cpp via Dart Native Assets,
~5.3 MB compact CPU build).
LlmService replaces GemmaService: model-agnostic, takes an
LlmModelConfig, uses ChatML prompt format compatible with Qwen3,
SmolLM2, and Llama-3. No HuggingFace token needed — all GGUF models
are public.
LlmEmbeddingService replaces EmbeddingService: bge-small-en-v1.5 at
384 dimensions (down from 768). Embeddings are now async (Future) not
sync, so all callers in VectorStoreService are updated with await.
VectorStoreService is fully rewritten to use ObjectBox HNSW queries
instead of a separate SQLite vector DB. No koshika_vectors.db.
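A nearest-neighbor lookup against the HNSW index can be sketched like so (hedged: `BiomarkerResult_` is the generated ObjectBox meta class, and the RetrievalResult constructor shown is an assumption):

```dart
// Query the HNSW-indexed embedding field for the k closest vectors and
// map the hits into the app-owned RetrievalResult type.
Future<List<RetrievalResult>> nearest(
    Box<BiomarkerResult> box, List<double> queryVec, int k) async {
  final query = box
      .query(BiomarkerResult_.embedding.nearestNeighborsF32(queryVec, k))
      .build();
  try {
    // findWithScores returns hits ordered by vector distance.
    final hits = query.findWithScores();
    return [
      for (final hit in hits)
        RetrievalResult(text: hit.object.text, score: hit.score),
    ];
  } finally {
    query.close();
  }
}
```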
* refactor(app): wire llamadart services into shell, add lite/full entry points
Introduce kAiEnabled global flag (set by entry point) to gate all AI
features: Chat tab visibility, AI Models section in Settings, and
vector indexing after PDF import.
Add main_full.dart (aiEnabled: true) and main_lite.dart (aiEnabled:
false) as separate Gradle flavor entry points. SplashScreen initialises
LlmService and LlmEmbeddingService only when kAiEnabled, and runs a
one-time migration to re-embed any stale 768-dim vectors from the old
EmbeddingGemma model.
Settings AI section is fully redesigned: users can pick from 4 curated
GGUF models or paste a custom URL, with inline download/load controls
and progress. No HuggingFace token prompt — all models are public.
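The two entry points reduce to a shared bootstrap that sets the global flag (a minimal sketch — only `kAiEnabled` is named in the commits; `bootstrap` and the app widget are assumptions):

```dart
// Global flag gating Chat tab visibility, the Settings AI section,
// and vector indexing after PDF import.
bool kAiEnabled = false;

void bootstrap({required bool aiEnabled}) {
  kAiEnabled = aiEnabled;
  // runApp(const KoshikaApp()); // app widget name is an assumption
}

// lib/main_full.dart:  void main() => bootstrap(aiEnabled: true);
// lib/main_lite.dart:  void main() => bootstrap(aiEnabled: false);
```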
* feat(android): add lite/full build flavors and llamadart backend config
Add two Gradle product flavors sharing the appType dimension:
- full: includes all llama.cpp native libs (~5.3 MB, arm64)
- lite: strips libllama.so, libggml*.so, libmtmd.so, libllamadart.so
Uses androidComponents.onVariants for per-variant jniLibs exclusion,
since flavor-level packaging.jniLibs.excludes applies globally in Gradle
and would strip the AI libs from both variants.
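The per-variant exclusion can be sketched as follows (Groovy; the exact `withFlavor` selector signature varies across AGP versions, so treat this as a shape, not the literal script):

```groovy
androidComponents {
    onVariants(selector().withFlavor("appType", "lite")) { variant ->
        // Strip the AI native libs from lite only; a flavor-level
        // packaging.jniLibs.excludes would hit both variants.
        variant.packaging.jniLibs.excludes.addAll([
            "**/libllama.so",
            "**/libggml*.so",
            "**/libmtmd.so",
            "**/libllamadart.so",
        ])
    }
}
```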
Configure llamadart's Native Assets hook in pubspec.yaml to use
cpu_profile: compact (1 baseline CPU variant instead of 7) and
backends: [cpu] (no Vulkan), reducing native lib overhead from 99 MB
to 5.3 MB. Final APK sizes: full arm64 = 41.8 MB, lite = 35.7 MB
(down from 150 MB on the pre-Phase-2 branch).
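The hook options named above would sit in pubspec.yaml roughly like this (the `cpu_profile` and `backends` keys come from the commit; the nesting under which they live is an assumption):

```yaml
# pubspec.yaml — sketch of the llamadart Native Assets hook config
llamadart:
  cpu_profile: compact   # 1 baseline CPU variant instead of 7
  backends: [cpu]        # no Vulkan backend
```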
CI builds both flavors and uploads all four APKs (arm64/x64 × lite/full)
to the GitHub release.
* fix(ai): harden model switching, downloads, and migration
- cancel and await in-flight model downloads before switching configs
- ignore stale download callbacks so old transfers cannot mutate the new model state
- dispose LlamaEngine instances on unload and error paths in both chat and embedding services
- reject non-success HTTP responses before caching model files and normalize forced-cancel errors
- share GGUF URL validation and filename derivation between settings and model config creation
- move embedding migration into a reusable post-load flow used by splash and settings
- keep flutter analyze and both lite/full release APK builds passing
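The stale-callback guard can be sketched with a generation counter bumped on every switch, so callbacks from superseded downloads become no-ops (names are illustrative, not the real service API):

```dart
class DownloadSession {
  int _generation = 0;

  Future<void> switchModel(
      Future<void> Function(
              void Function(int received, int total) onProgress)
          download) async {
    final gen = ++_generation; // invalidates any in-flight download
    await download((received, total) {
      if (gen != _generation) return; // stale callback: ignore
      // ...update progress state for the current model only
    });
  }
}
```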