perf(android): replace flutter_gemma with llamadart — 150 MB → 42 MB APK #3
Conversation
…ions
- Split release APK per ABI (arm64 + x64 only), dropping the universal fat build
- Enable R8 minification and resource shrinking for release builds
- Exclude the Vulkan debug validation layer from release packaging
- Exclude unused MediaPipe image generation libs from release packaging
Add @HnswIndex(dimensions: 384) to BiomarkerResult.embedding so ObjectBox can run native nearest-neighbor search — eliminating the separate koshika_vectors.db SQLite store that flutter_gemma required. Introduce RetrievalResult as an app-owned type so the RAG pipeline (ChatContextBuilder, CitationExtractor) no longer imports from flutter_gemma. Add LlmModelConfig and LlmModelRegistry with 4 curated public GGUF models (SmolLM2 360M, Qwen3 0.6B default, Llama 3.2 1B, Gemma 3 1B) and a custom-URL escape hatch. Regenerate ObjectBox bindings to pick up the HNSW annotation.
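As a rough illustration of the annotation described above, the entity could look like the following sketch. Only `@HnswIndex(dimensions: 384)` and the class/field names come from the PR; the surrounding property setup is an assumption based on ObjectBox's float-vector API.

```dart
import 'package:objectbox/objectbox.dart';

// Sketch: ObjectBox entity with an HNSW-indexed embedding field.
// Fields other than `embedding` are assumptions for illustration.
@Entity()
class BiomarkerResult {
  @Id()
  int id = 0;

  // 384-dim vector from the embedding model; the HNSW index lets
  // ObjectBox run native nearest-neighbor search directly, which is
  // what removes the need for a separate SQLite vector store.
  @HnswIndex(dimensions: 384)
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;
}
```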
Add ModelDownloader to handle GGUF file downloads from arbitrary URLs (no HuggingFace token needed — all curated models are public). Supports progress callbacks for UI, cancellation via a flag checked between chunks, and download resume via HTTP Range headers using a .part temp file. Static helpers expose the models directory path so services and the splash migration can share one download location.
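The resume mechanism described above can be sketched as follows. This is a hedged sketch, not the actual ModelDownloader: the function name and structure are assumptions; only the Range-header and `.part` temp-file approach comes from the PR.

```dart
import 'dart:io';

// Sketch of HTTP Range-based resume via a .part temp file.
Future<void> downloadWithResume(Uri url, File target) async {
  final part = File('${target.path}.part');
  final startByte = await part.exists() ? await part.length() : 0;

  final client = HttpClient();
  try {
    final request = await client.getUrl(url);
    if (startByte > 0) {
      // Ask the server to continue from where the .part file ends.
      request.headers.set(HttpHeaders.rangeHeader, 'bytes=$startByte-');
    }
    final response = await request.close();

    // 206 means the server honored the Range; anything else restarts.
    final resumed = response.statusCode == HttpStatus.partialContent;
    final sink = part.openWrite(
      mode: resumed ? FileMode.append : FileMode.write,
    );
    await response.pipe(sink); // pipe() flushes and closes the sink

    // Only promote the temp file once the full body has been written.
    await part.rename(target.path);
  } finally {
    client.close();
  }
}
```

A real implementation would also check the progress-callback and cancellation-flag behavior the PR describes between streamed chunks.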
Swap the entire inference layer from flutter_gemma (MediaPipe/LiteRT, ~133 MB native libs) to llamadart (llama.cpp via Dart Native Assets, ~5.3 MB compact CPU build). LlmService replaces GemmaService: model-agnostic, takes an LlmModelConfig, uses ChatML prompt format compatible with Qwen3, SmolLM2, and Llama-3. No HuggingFace token needed — all GGUF models are public. LlmEmbeddingService replaces EmbeddingService: bge-small-en-v1.5 at 384 dimensions (down from 768). Embeddings are now async (Future) not sync, so all callers in VectorStoreService are updated with await. VectorStoreService is fully rewritten to use ObjectBox HNSW queries instead of a separate SQLite vector DB. No koshika_vectors.db.
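A sketch of what the rewritten ObjectBox HNSW retrieval path could look like. The box wiring, `RetrievalResult` fields, and `embeddingService` API are assumptions; `nearestNeighborsF32` matches the ObjectBox vector-search call referenced in the sequence diagram below.

```dart
import 'package:objectbox/objectbox.dart';

// Sketch: semantic search over HNSW-indexed embeddings. Assumes the
// generated bindings (BiomarkerResult_) and an embeddingService that
// exposes Future<List<double>> embed(String).
Future<List<RetrievalResult>> semanticSearch(
  Box<BiomarkerResult> box,
  String queryText, {
  int topK = 5,
}) async {
  // Embeddings are now async (Future), per the PR description.
  final vector = await embeddingService.embed(queryText);

  final query = box
      .query(BiomarkerResult_.embedding.nearestNeighborsF32(vector, topK))
      .build();
  try {
    // findWithScores() returns matches ranked by HNSW distance.
    return query
        .findWithScores()
        .map((r) => RetrievalResult(result: r.object, score: r.score))
        .toList();
  } finally {
    query.close();
  }
}
```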
…y points

Introduce a kAiEnabled global flag (set by the entry point) to gate all AI features: Chat tab visibility, the AI Models section in Settings, and vector indexing after PDF import. Add main_full.dart (aiEnabled: true) and main_lite.dart (aiEnabled: false) as separate Gradle flavor entry points. SplashScreen initialises LlmService and LlmEmbeddingService only when kAiEnabled, and runs a one-time migration to re-embed any stale 768-dim vectors from the old EmbeddingGemma model. The Settings AI section is fully redesigned: users can pick from 4 curated GGUF models or paste a custom URL, with inline download/load controls and progress. No HuggingFace token prompt — all models are public.
Add two Gradle product flavors sharing the appType dimension: - full: includes all llama.cpp native libs (~5.3 MB, arm64) - lite: strips libllama.so, libggml*.so, libmtmd.so, libllamadart.so Uses androidComponents.onVariants for per-variant jniLibs exclusion — flavor-level packaging.jniLibs.excludes applies globally in Gradle and would strip AI libs from both variants. Configure llamadart's Native Assets hook in pubspec.yaml to use cpu_profile: compact (1 baseline CPU variant instead of 7) and backends: [cpu] (no Vulkan), reducing native lib overhead from 99 MB to 5.3 MB. Final APK sizes: full arm64 = 41.8 MB, lite = 35.7 MB (down from 150 MB on the pre-Phase-2 branch). CI builds both flavors and uploads all four APKs (arm64/x64 × lite/full) to the GitHub release.
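The per-variant exclusion described above could be sketched in build.gradle.kts roughly like this. The library names come from the PR text; the exact block shape and flavor check are assumptions about how the Variant API would be applied here.

```kotlin
// Sketch: strip llama.cpp native libs from the lite flavor only.
// Uses the AGP Variant API because flavor-level
// packaging.jniLibs.excludes applies globally across all variants.
androidComponents.onVariants { variant ->
    if (variant.flavorName == "lite") {
        variant.packaging.jniLibs.excludes.addAll(
            listOf(
                "**/libllama.so",
                "**/libggml*.so",
                "**/libmtmd.so",
                "**/libllamadart.so",
            )
        )
    }
}
```

Scoping the excludes inside `onVariants` is what keeps the full flavor's AI libs intact while the lite flavor ships without them.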
…ction

```
# Conflicts:
#   .github/workflows/release.yml
#   android/app/build.gradle.kts
```
📝 Walkthrough

The PR replaces flutter_gemma with llamadart for on-device LLM and embedding models, introduces Android product flavors (lite/full) with flavor-specific release APKs, adds new model/download/embedding services and a model registry, converts ObjectBox embeddings to an HNSW index, and gates AI initialization and UI behind the new kAiEnabled flag.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant ChatScreen
    participant LlmService
    participant ModelDownloader
    participant LlamaEngine
    participant ObjectBox

    rect rgba(76, 175, 80, 0.5)
        Note over ChatScreen,LlmService: Download model flow
        User->>ChatScreen: Tap "Download Model"
        ChatScreen->>LlmService: downloadModel()
        LlmService->>ModelDownloader: download(url, filename, onProgress)
        ModelDownloader->>ModelDownloader: resume/stream HTTP chunks
        ModelDownloader-->>LlmService: return local path (on success)
        LlmService->>LlmService: emit status=ready
    end

    rect rgba(33, 150, 243, 0.5)
        Note over ChatScreen,LlamaEngine: Load & generate
        User->>ChatScreen: Tap "Load Model"
        ChatScreen->>LlmService: loadModel()
        LlmService->>LlamaEngine: loadModelFromPath(path)
        LlmService->>LlmService: emit status=loaded
        User->>ChatScreen: Send message
        ChatScreen->>LlmService: generateResponse(message, context?)
        LlmService->>LlamaEngine: generate(prompt)
        LlamaEngine-->>LlmService: stream tokens
        LlmService-->>ChatScreen: stream<String> tokens
        ChatScreen->>ChatScreen: render streaming response
    end

    rect rgba(156, 39, 176, 0.5)
        Note over ChatScreen,ObjectBox: Semantic search via ObjectBox HNSW
        ChatScreen->>LlmService: embed(query)
        LlmService->>LlmService: return embedding
        ChatScreen->>ObjectBox: BiomarkerResult_.embedding.nearestNeighborsF32(embedding, topK)
        ObjectBox-->>ChatScreen: List<RetrievalResult>
        ChatScreen->>ChatScreen: append sources to response
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/screens/chat_screen.dart (1)
369-377: ⚠️ Potential issue | 🟡 Minor — Missing `await` on `stopGeneration()` call.

`_startNewChat()` calls `llmService.stopGeneration()` without `await`, but the same method is awaited in `_stopGeneration()` (line 349) and `_loadSession()` (line 217). This inconsistency could cause a race condition if generation is still running when the state is cleared.

🔧 Proposed fix

```diff
- void _startNewChat() {
+ Future<void> _startNewChat() async {
    _generationSubscription?.cancel();
    _generationSubscription = null;
-   llmService.stopGeneration();
+   await llmService.stopGeneration();
    setState(() {
      _currentSession = null;
      _messages.clear();
    });
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/screens/chat_screen.dart` around lines 369 - 377, The _startNewChat() method calls llmService.stopGeneration() without awaiting it, risking a race condition; make _startNewChat() async and await llmService.stopGeneration() just like in _stopGeneration() and _loadSession(), then proceed to cancel _generationSubscription, clear _currentSession and _messages inside setState; ensure you preserve existing subscription cancellation and nulling of _generationSubscription and handle any errors consistently with the other callers.
🧹 Nitpick comments (1)
android/app/build.gradle.kts (1)
30-36: Release build uses debug signing — intentional for development?

The release build type is configured with `signingConfig = signingConfigs.getByName("debug")`. This is acceptable for testing and CI artifact generation, but production releases will require a proper release keystore configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/app/build.gradle.kts` around lines 30 - 36, The release build currently sets signingConfig = signingConfigs.getByName("debug") inside the buildTypes { release { ... } } block, which uses the debug keystore; replace this with a proper release signing configuration by creating and referencing a release SigningConfig (e.g., signingConfigs.create("release") or signingConfigs.getByName("release")) and point the release buildType to that release SigningConfig, or add logic to load the release keystore properties (storeFile, storePassword, keyAlias, keyPassword) and assign them to the release SigningConfig so production builds are signed with the correct keystore instead of debug.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/models/llm_model_config.dart`:
- Around line 85-97: LlmModelConfig.custom currently accepts any downloadUrl;
add fail-fast validation inside the factory method LlmModelConfig.custom to
parse the URL (use Uri.parse) and ensure it uses the https scheme and the path
ends with ".gguf" (and is non-empty); if validation fails, throw an
ArgumentError with a clear message. Also trim/normalize the input string before
parsing and keep the rest of the returned LlmModelConfig fields unchanged so
invalid/insecure URLs never reach the downloader flow.
In `@lib/screens/settings_screen.dart`:
- Around line 950-977: The URL validation in _submit currently checks the raw
string with url.toLowerCase().endsWith('.gguf'), which allows query-only URLs
and misses cases where the path segment is what matters; update _submit to parse
Uri.parse(url) and validate that uri.pathSegments is non-empty and
uri.pathSegments.last.toLowerCase().endsWith('.gguf') (or, if pathSegments is
empty, check query parameters for a value ending in .gguf) and set _urlError if
not; also update _filenameFromUrl to derive the filename from
uri.pathSegments.last when present, otherwise try extracting a .gguf filename
from query parameter values before falling back to the 'Custom Model' default so
displayName and LlmModelRegistry.custom(name: ...) are always valid.
In `@lib/screens/splash_screen.dart`:
- Around line 70-95: The migration currently only runs when the embedder is
ready at splash; fix by ensuring _migrateEmbeddingsIfNeeded() is invoked after
every successful embedding model load by chaining it to
LlmEmbeddingService.loadModel() wherever that method is called (not just in
splash_screen.dart), or centralize the behavior by modifying
LlmEmbeddingService.loadModel() to call a post-load hook/event that triggers
_migrateEmbeddingsIfNeeded() on success; update all callers (e.g., settings code
that triggers embeddingService.loadModel()) to rely on that hook or to
explicitly .then((_) => _migrateEmbeddingsIfNeeded()) so stale 768-dim rows are
re-embedded immediately after any later embedder download/load.
In `@lib/services/llm_embedding_service.dart`:
- Around line 147-150: unloadModel() currently calls await
_engine!.unloadModel() but leaves the old LlamaEngine instance in _engine,
leaking native resources; update unloadModel() to fully dispose and clear the
engine by awaiting _engine?.dispose() (or dispose/unload as appropriate), then
set _engine = null, mirroring the error-path cleanup used in loadModel(); apply
the same dispose-and-null pattern to the other unload/error paths referenced
around the 187–195 region so every code path disposes the LlamaEngine and nulls
_engine.
In `@lib/services/llm_service.dart`:
- Around line 103-129: switchModel currently proceeds while an existing
downloadModel may still be streaming the previous file, causing stale callbacks
to update state and risking partial-file deletion; before tearing down the
current model or calling ModelDownloader.deleteModel(_config.filename), detect
any in-flight download (e.g. a tracked Future/Task or StreamSubscription like
_currentDownloadTask/_currentDownloadSubscription started by downloadModel),
request cancellation on it, await its completion, and remove its
progress/completion listeners so callbacks cannot update _modelInfo/_config
after the switch; only after the in-flight download has fully settled, call
unloadModel, delete the file, set _config = newConfig, persist prefs, and call
_updateStatus. Also ensure download completion callbacks verify the model id
(compare against _config.id) before updating state.
- Around line 219-223: unloadModel() currently calls _engine!.unloadModel() but
never disposes or nulls _engine, leaking native resources when switchModel()
loads a new engine; update unloadModel() to follow the pattern used in
loadModel()’s error handler by calling _engine!.dispose() after unloadModel(),
then set _engine = null and update the model status (e.g. via
_updateStatus(_modelInfo.copyWith(status: ModelStatus.unloaded))) so the engine
is fully cleaned up before a new model is loaded.
In `@lib/services/model_downloader.dart`:
- Around line 90-103: Before streaming/writing the response body, check
response.statusCode after request.close() and reject any non-success codes
(accept 206 Partial Content or any 2xx full responses); if the status is not
2xx/206 (e.g., 404/500/416) throw an error (or return a failure) and do not
proceed to set startByte/total or write/rename the .part file. Concretely, in
the download method where request.close() returns response (symbols:
request.close(), response, startByte, total, response.contentLength), add an
early guard that reads/consumes any error body if needed, logs/throws an
HttpException with response.statusCode and reasonPhrase, and aborts before any
file write/rename operations so failed HTTP responses never corrupt the local
cache.
- Around line 77-90: The download can be force-closed by cancel() which calls
_client.close(force: true) but exceptions thrown before the chunk loop (e.g.,
during request.close() or reading the response) are not normalized to
DownloadCancelledException; wrap the existing try block that creates _client,
calls request.close(), and iterates the response stream in a try { ... } on
Object catch (e, st) { if (_cancelled) throw DownloadCancelledException();
rethrow; } so any transport errors caused by a forced close are re-thrown as
DownloadCancelledException while other errors continue to propagate; reference
_client, cancel(), request.close(), the response stream iteration, and
DownloadCancelledException when making the change.
In `@pubspec.yaml`:
- Around line 52-54: android-x64 is missing the cpu_profile setting which causes
x64 builds to default to full; add cpu_profile: compact under the android-x64
platform entry to match android-arm64. Locate the android-x64 platform block
(label "android-x64") in pubspec.yaml and add the line cpu_profile: compact
alongside backends: [cpu] so both platforms use the same compact CPU profile.
---
Outside diff comments:
In `@lib/screens/chat_screen.dart`:
- Around line 369-377: The _startNewChat() method calls
llmService.stopGeneration() without awaiting it, risking a race condition; make
_startNewChat() async and await llmService.stopGeneration() just like in
_stopGeneration() and _loadSession(), then proceed to cancel
_generationSubscription, clear _currentSession and _messages inside setState;
ensure you preserve existing subscription cancellation and nulling of
_generationSubscription and handle any errors consistently with the other
callers.
---
Nitpick comments:
In `@android/app/build.gradle.kts`:
- Around line 30-36: The release build currently sets signingConfig =
signingConfigs.getByName("debug") inside the buildTypes { release { ... } }
block, which uses the debug keystore; replace this with a proper release signing
configuration by creating and referencing a release SigningConfig (e.g.,
signingConfigs.create("release") or signingConfigs.getByName("release")) and
point the release buildType to that release SigningConfig, or add logic to load
the release keystore properties (storeFile, storePassword, keyAlias,
keyPassword) and assign them to the release SigningConfig so production builds
are signed with the correct keystore instead of debug.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8da3eb0c-fb2f-4d9f-a781-6272298ef7f2
⛔ Files ignored due to path filters (1)
`pubspec.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (26)
- .github/workflows/release.yml
- android/app/build.gradle.kts
- android/app/proguard-rules.pro
- lib/constants/ai_prompts.dart
- lib/main.dart
- lib/main_full.dart
- lib/main_lite.dart
- lib/models/biomarker_result.dart
- lib/models/llm_model_config.dart
- lib/models/model_info.dart
- lib/models/retrieval_result.dart
- lib/objectbox-model.json
- lib/objectbox.g.dart
- lib/screens/chat_screen.dart
- lib/screens/reports_screen.dart
- lib/screens/settings_screen.dart
- lib/screens/splash_screen.dart
- lib/services/chat_context_builder.dart
- lib/services/citation_extractor.dart
- lib/services/embedding_service.dart
- lib/services/gemma_service.dart
- lib/services/llm_embedding_service.dart
- lib/services/llm_service.dart
- lib/services/model_downloader.dart
- lib/services/vector_store_service.dart
- pubspec.yaml
💤 Files with no reviewable changes (3)
- android/app/proguard-rules.pro
- lib/services/gemma_service.dart
- lib/services/embedding_service.dart
```dart
// AI services — only when enabled (full flavor)
if (app_main.kAiEnabled) {
  app_main.llmService = LlmService(LlmModelRegistry.defaultModel);
  await app_main.llmService.initialize();

  app_main.embeddingService = LlmEmbeddingService();
  await app_main.embeddingService.initialize();

  app_main.vectorStoreService = VectorStoreService(
    app_main.embeddingService,
  );

  // Auto-load models if already downloaded
  if (app_main.llmService.currentModelInfo.status == ModelStatus.ready) {
    // ignore: unawaited_futures
    app_main.llmService.loadModel();
  }
  if (app_main.embeddingService.currentModelInfo.status ==
      ModelStatus.ready) {
    // ignore: unawaited_futures
    app_main.embeddingService.loadModel().then((_) {
      // Re-embed any results that have stale or missing embeddings
      _migrateEmbeddingsIfNeeded();
    });
  }
}
```
Run the 768→384 migration on the first successful embedder load, not only during splash.
Lines 87-94 only schedule _migrateEmbeddingsIfNeeded() when the embedding model is already ready at startup. After an upgrade, users who download/load the new embedder later in Settings keep stale 768-dim rows until the next app restart, so semantic search/chat retrieval is wrong for that whole session.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/screens/splash_screen.dart` around lines 70 - 95, The migration currently
only runs when the embedder is ready at splash; fix by ensuring
_migrateEmbeddingsIfNeeded() is invoked after every successful embedding model
load by chaining it to LlmEmbeddingService.loadModel() wherever that method is
called (not just in splash_screen.dart), or centralize the behavior by modifying
LlmEmbeddingService.loadModel() to call a post-load hook/event that triggers
_migrateEmbeddingsIfNeeded() on success; update all callers (e.g., settings code
that triggers embeddingService.loadModel()) to rely on that hook or to
explicitly .then((_) => _migrateEmbeddingsIfNeeded()) so stale 768-dim rows are
re-embedded immediately after any later embedder download/load.
```yaml
android-x64:
  backends: [cpu]
```
🧩 Analysis chain
🌐 Web query:
What are the valid cpu_profile options for llamadart native backends configuration?
💡 Result:
The valid cpu_profile options for llamadart native backends configuration, specifically for Android arm64 platforms under llamadart_native_backends in pubspec.yaml, are "full" (default, bundles all Android ARM CPU variants for best runtime optimization coverage) and "compact" (bundles only the baseline CPU variant for smaller package size). An advanced option "cpu_variants" can override cpu_profile with an exact list of canonical variants like "android_armv8.2_2", but it is not a cpu_profile value itself.
Citations:
- 1: https://pub.dev/documentation/llamadart/latest/index.html
- 2: https://github.com/leehack/llamadart
- 3: https://pub.dev/packages/llamadart
🌐 Web query:
llamadart android-x64 cpu_profile configuration native backends
💡 Result:
Llamadart is a Dart/Flutter plugin for running llama.cpp GGUF models locally, supporting android-x64 among other platforms. For android-x64 (primarily emulators), native backends default to CPU and Vulkan, configurable via pubspec.yaml under hooks for llamadart_native_backends:

```yaml
hooks:
  user_defines:
    llamadart:
      llamadart_native_backends:
        platforms:
          android-x64:
            backends: [cpu, vulkan, opencl] # optional (defaults: cpu, vulkan)
            cpu_profile: full               # or 'compact' for a smaller bundle
```

cpu_profile controls which CPU optimization variants are bundled:
- full (default): all Android CPU variants for broad compatibility and optimization
- compact: baseline CPU variant only, smaller package size
- cpu_variants: advanced, an exact list like ['android_armv8.0_1'] (ARM-focused; x64 uses x86_64 optimizations)

CPU is always included as a fallback. Vulkan requires the Vulkan loader; OpenCL is opt-in. Apple platforms are non-configurable (Metal). On first run, native assets are auto-downloaded/built via Pure Native Assets (zero manual config needed). Example full config:

```yaml
hooks:
  user_defines:
    llamadart:
      llamadart_native_backends:
        platforms:
          android-x64:
            backends: [vulkan]
            cpu_profile: compact
          android-arm64:
            backends: [vulkan]
            cpu_profile: full
```

Usage in code auto-selects the best available backend.
Citations:
- 1: https://github.com/leehack/llamadart
- 2: https://pub.dev/packages/llamadart
- 3: https://pub.dev/documentation/llamadart/latest/index.html
Add missing cpu_profile: compact for android-x64 platform.
The android-arm64 platform specifies both backends: [cpu] and cpu_profile: compact, but android-x64 only has backends: [cpu]. Without an explicit cpu_profile, x64 builds will default to full (all CPU optimization variants), resulting in a larger bundle than arm64. This asymmetry undermines the APK size reduction goals of this PR.
🔧 Proposed fix

```diff
 android-x64:
   backends: [cpu]
+  cpu_profile: compact
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
android-x64:
  backends: [cpu]
  cpu_profile: compact
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pubspec.yaml` around lines 52 - 54, android-x64 is missing the cpu_profile
setting which causes x64 builds to default to full; add cpu_profile: compact
under the android-x64 platform entry to match android-arm64. Locate the
android-x64 platform block (label "android-x64") in pubspec.yaml and add the
line cpu_profile: compact alongside backends: [cpu] so both platforms use the
same compact CPU profile.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da98430ce9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```dart
final response = await request.close();

// Determine total size
int total;
if (response.statusCode == HttpStatus.partialContent) {
```
Validate HTTP status before treating model download as successful
This download path starts consuming and persisting the response body without first rejecting non-success HTTP statuses, so 403/404/500 responses (often HTML or JSON error payloads) can be saved as .gguf files and then marked as successfully downloaded. In that scenario the UI transitions to ready, but model load fails later with a misleading error and the bad file remains on disk. Fail fast on any status other than 200/206 before writing the temp file.
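A minimal sketch of the fail-fast guard suggested here. The `request`/`url` names mirror the excerpt above; the surrounding method is an assumption.

```dart
// Sketch: reject non-success statuses before any bytes reach the
// .part file, so a 403/404/500 error page is never cached as a model.
final response = await request.close();
if (response.statusCode != HttpStatus.ok &&
    response.statusCode != HttpStatus.partialContent) {
  // Drain the (likely HTML/JSON) error body so the connection
  // can be reused, then abort before any file write/rename.
  await response.drain<void>();
  throw HttpException(
    'Download failed: HTTP ${response.statusCode} '
    '${response.reasonPhrase}',
    uri: Uri.parse(url),
  );
}
```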
```dart
if (_isGenerating) await stopGeneration();
if (_engine != null) {
  await _engine!.unloadModel();
}
```
Dispose and clear LlamaEngine when unloading chat model
loadModel() allocates a new LlamaEngine, but unloadModel() only calls unloadModel() on the existing engine and leaves _engine set. On repeated load/unload cycles (including model switches), this can leave prior engine instances undisposed while new ones are created, which risks leaking native llama resources and increasing memory pressure over time. The unload path should dispose and null _engine before subsequent loads.
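The dispose-and-null pattern suggested here could be sketched as follows. `LlamaEngine`'s exact API surface is an assumption mirroring the excerpts in this review.

```dart
// Sketch: fully release the engine on unload so repeated
// load/unload cycles (and model switches) cannot leak native
// llama.cpp resources.
Future<void> unloadModel() async {
  if (_isGenerating) await stopGeneration();
  final engine = _engine;
  if (engine != null) {
    await engine.unloadModel();
    await engine.dispose(); // release native resources
    _engine = null;         // force a fresh engine on next load
  }
}
```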
```dart
Future<void> unloadModel() async {
  if (_engine != null) {
    await _engine!.unloadModel();
  }
```
Dispose and clear LlamaEngine when unloading embedder
The embedder service mirrors the same lifecycle issue: loadModel() creates a new LlamaEngine, but unloadModel() never disposes or clears _engine. Repeatedly loading/unloading embeddings can therefore accumulate undisposed native engine state, which is especially risky on memory-constrained devices. Dispose and set _engine to null in the unload flow.
- cancel and await in-flight model downloads before switching configs
- ignore stale download callbacks so old transfers cannot mutate the new model state
- dispose LlamaEngine instances on unload and error paths in both chat and embedding services
- reject non-success HTTP responses before caching model files and normalize forced-cancel errors
- share GGUF URL validation and filename derivation between settings and model config creation
- move embedding migration into a reusable post-load flow used by splash and settings
- keep flutter analyze and both lite/full release APK builds passing
🧹 Nitpick comments (1)
lib/main.dart (1)
219-236: Consider using an enum or constants for navigation indices.

The hardcoded index logic (`kAiEnabled ? 3 : 2`) works but is fragile if more screens are added. A minor improvement would be to derive indices from the `_screens` list directly, though this is acceptable for the current scope.

♻️ Optional refactor to derive indices

```diff
 _NavItem(
-  index: kAiEnabled ? 3 : 2,
+  index: _screens.length - 1,
   currentIndex: _currentIndex,
   icon: Icons.settings_outlined,
   activeIcon: Icons.settings,
   label: 'Settings',
-  onTap: () =>
-      setState(() => _currentIndex = kAiEnabled ? 3 : 2),
+  onTap: () =>
+      setState(() => _currentIndex = _screens.length - 1),
 ),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/main.dart` around lines 219 - 236, The code uses hardcoded navigation indices (e.g., kAiEnabled ? 3 : 2) in the _NavItem constructors and onTap handlers which is brittle; change the logic to derive indices from the _screens list (or use named constants/enum) instead of literal numbers: compute the index for the Chat and Settings items from _screens.indexOf(...) or from enum values and replace occurrences in _NavItem(index: ...), currentIndex comparison, and setState assignments that reference _currentIndex so indices remain correct when screens change (refer to _NavItem, _currentIndex, kAiEnabled and the _screens list).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/main.dart`:
- Around line 219-236: The code uses hardcoded navigation indices (e.g.,
kAiEnabled ? 3 : 2) in the _NavItem constructors and onTap handlers which is
brittle; change the logic to derive indices from the _screens list (or use named
constants/enum) instead of literal numbers: compute the index for the Chat and
Settings items from _screens.indexOf(...) or from enum values and replace
occurrences in _NavItem(index: ...), currentIndex comparison, and setState
assignments that reference _currentIndex so indices remain correct when screens
change (refer to _NavItem, _currentIndex, kAiEnabled and the _screens list).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f3f1ec2b-8519-4aa3-aa6d-ca390d6bbc60
📒 Files selected for processing (7)
- lib/main.dart
- lib/models/llm_model_config.dart
- lib/screens/settings_screen.dart
- lib/screens/splash_screen.dart
- lib/services/llm_embedding_service.dart
- lib/services/llm_service.dart
- lib/services/model_downloader.dart
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/screens/settings_screen.dart
Summary

- Replace flutter_gemma (~133 MB MediaPipe/LiteRT native libs) with llamadart (~5.3 MB, llama.cpp via Dart Native Assets)
- Drop the separate koshika_vectors.db SQLite vector store in favor of ObjectBox HNSW

APK Sizes (arm64)

| Flavor | Size |
| --- | --- |
| full | 41.8 MB |
| lite | 35.7 MB |
Commits

- refactor(models) — `@HnswIndex(dimensions: 384)`, app-owned `RetrievalResult`, `LlmModelRegistry` with 4 curated models, ObjectBox bindings regenerated
- feat(services) — `ModelDownloader` with HTTP resume, progress callbacks, no token needed
- refactor(ai) — `LlmService` + `LlmEmbeddingService` replacing `GemmaService` + `EmbeddingService`; `VectorStoreService` rewritten for ObjectBox HNSW; bge-small-en-v1.5 at 384-dim (was 768)
- refactor(app) — `kAiEnabled` flag, `appMain()`, `main_full.dart` / `main_lite.dart` entry points, splash migration for 768→384 re-embedding, settings AI section redesign
- feat(android) — Gradle `productFlavors` (lite/full), `androidComponents.onVariants` for per-variant jniLibs exclusion, llamadart `hooks` config (`cpu_profile: compact`, `backends: [cpu]`), CI builds both flavors

Key Technical Notes
Why llamadart, not llama_cpp_dart: llama_cpp_dart has no `flutter: plugin:` declaration in its pubspec — native libs never compile on Android. llamadart uses Dart's Native Assets system to auto-download pre-compiled binaries during `flutter build`.

Why `androidComponents.onVariants`, not `productFlavors { packaging }`: flavor-level `packaging.jniLibs.excludes` in Gradle applies globally across all variants. The Variant API (`androidComponents.onVariants`) correctly scopes exclusions per variant.

llamadart native lib reduction: the default bundle is 99 MB (7 CPU variants + Vulkan). Configuring `cpu_profile: compact` (1 baseline variant) and `backends: [cpu]` (no Vulkan) in the `pubspec.yaml` hooks brings this down to 5.3 MB.

Test Plan
- `flutter analyze` — 0 issues
- lite APK verified to contain no `libllama.so` / `libggml*.so`

Summary by CodeRabbit
New Features
Improvements