
Commit bac5f9d

perf(android): replace flutter_gemma with llamadart — 150 MB → 42 MB APK (#3)
* perf(android): reduce APK size with ABI splitting, R8, and lib exclusions
  - Split the release APK per ABI (arm64 + x64 only), dropping the universal fat build
  - Enable R8 minification and resource shrinking for release builds
  - Exclude the Vulkan debug validation layer from release packaging
  - Exclude unused MediaPipe image generation libs from release packaging

* refactor(models): add HNSW vector index and app-owned RetrievalResult
  Add @HnswIndex(dimensions: 384) to BiomarkerResult.embedding so ObjectBox can run native nearest-neighbor search, eliminating the separate koshika_vectors.db SQLite store that flutter_gemma required. Introduce RetrievalResult as an app-owned type so the RAG pipeline (ChatContextBuilder, CitationExtractor) no longer imports from flutter_gemma. Add LlmModelConfig and LlmModelRegistry with four curated public GGUF models (SmolLM2 360M, Qwen3 0.6B as default, Llama 3.2 1B, Gemma 3 1B) plus a custom-URL escape hatch. Regenerate the ObjectBox bindings to pick up the HNSW annotation.

* feat(services): add model downloader with progress and resume support
  Add ModelDownloader to handle GGUF file downloads from arbitrary URLs (no HuggingFace token needed; all curated models are public). It supports progress callbacks for the UI, cancellation via a flag checked between chunks, and download resume via HTTP Range headers using a .part temp file. Static helpers expose the models directory path so services and the splash migration share one download location.

* refactor(ai): replace flutter_gemma services with llamadart
  Swap the entire inference layer from flutter_gemma (MediaPipe/LiteRT, ~133 MB of native libs) to llamadart (llama.cpp via Dart Native Assets, ~5.3 MB compact CPU build). LlmService replaces GemmaService: it is model-agnostic, takes an LlmModelConfig, and uses the ChatML prompt format compatible with Qwen3, SmolLM2, and Llama-3. No HuggingFace token is needed; all GGUF models are public.
  LlmEmbeddingService replaces EmbeddingService: bge-small-en-v1.5 at 384 dimensions (down from 768). Embeddings are now async (Future) rather than sync, so all callers in VectorStoreService are updated with await. VectorStoreService is fully rewritten to use ObjectBox HNSW queries instead of a separate SQLite vector DB; koshika_vectors.db is gone.

* refactor(app): wire llamadart services into shell, add lite/full entry points
  Introduce a kAiEnabled global flag (set by the entry point) to gate all AI features: Chat tab visibility, the AI Models section in Settings, and vector indexing after PDF import. Add main_full.dart (aiEnabled: true) and main_lite.dart (aiEnabled: false) as separate Gradle flavor entry points. SplashScreen initialises LlmService and LlmEmbeddingService only when kAiEnabled, and runs a one-time migration to re-embed any stale 768-dim vectors from the old EmbeddingGemma model. The Settings AI section is fully redesigned: users can pick from four curated GGUF models or paste a custom URL, with inline download/load controls and progress. No HuggingFace token prompt; all models are public.

* feat(android): add lite/full build flavors and llamadart backend config
  Add two Gradle product flavors sharing the appType dimension:
  - full: includes all llama.cpp native libs (~5.3 MB, arm64)
  - lite: strips libllama.so, libggml*.so, libmtmd.so, libllamadart.so
  Use androidComponents.onVariants for per-variant jniLibs exclusion; flavor-level packaging.jniLibs.excludes applies globally in Gradle and would strip the AI libs from both variants. Configure llamadart's Native Assets hook in pubspec.yaml to use cpu_profile: compact (one baseline CPU variant instead of seven) and backends: [cpu] (no Vulkan), reducing native lib overhead from 99 MB to 5.3 MB. Final APK sizes: full arm64 = 41.8 MB, lite = 35.7 MB (down from 150 MB on the pre-Phase-2 branch). CI builds both flavors and uploads all four APKs (arm64/x64 × lite/full) to the GitHub release.
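The ChatML format the new LlmService relies on is just a fixed turn-marker convention, so it can be sketched in plain Dart. This is an illustrative helper only — `buildChatMlPrompt` is a hypothetical name, not the real LlmService API:

```dart
/// Illustrative only — buildChatMlPrompt is NOT the actual LlmService code.
/// ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers and
/// leaves the prompt open at the assistant turn for the model to complete.
String buildChatMlPrompt({required String system, required String user}) {
  return '<|im_start|>system\n$system<|im_end|>\n'
      '<|im_start|>user\n$user<|im_end|>\n'
      '<|im_start|>assistant\n';
}

void main() {
  print(buildChatMlPrompt(
    system: 'You are Koshika AI.',
    user: 'What does a high LDL value mean?',
  ));
}
```

Because the prompt ends at an open `<|im_start|>assistant` turn, generation stops when the model emits its own `<|im_end|>` token.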
* fix(ai): harden model switching, downloads, and migration
  - Cancel and await in-flight model downloads before switching configs
  - Ignore stale download callbacks so old transfers cannot mutate the new model state
  - Dispose LlamaEngine instances on unload and error paths in both chat and embedding services
  - Reject non-success HTTP responses before caching model files and normalize forced-cancel errors
  - Share GGUF URL validation and filename derivation between settings and model config creation
  - Move embedding migration into a reusable post-load flow used by splash and settings
  - Keep flutter analyze and both lite/full release APK builds passing
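The resume behaviour described for ModelDownloader (an HTTP Range request against a .part temp file) reduces to one decision: if a partial file exists, ask the server for only the remaining bytes. A minimal sketch with hypothetical names — `resumeRangeHeader` is not the actual ModelDownloader API:

```dart
import 'dart:io';

/// Hypothetical helper (not the real ModelDownloader API): decide the
/// HTTP Range header for resuming a download into `<file>.part`.
/// Returns null when a fresh full download should be started instead.
String? resumeRangeHeader(File partFile) {
  if (!partFile.existsSync()) return null; // nothing to resume
  final downloaded = partFile.lengthSync();
  if (downloaded == 0) return null; // empty .part file — restart cleanly
  // Ask the server for everything from byte `downloaded` onward.
  // A 206 Partial Content reply means append to the .part file; a plain
  // 200 means the server ignored the range, so truncate and restart.
  return 'bytes=$downloaded-';
}
```

This also shows why the "reject non-success HTTP responses" fix matters: a server that answers 200 to a ranged request must not have its body appended to the partial file.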
1 parent: 74029b4 · commit: bac5f9d

27 files changed

Lines changed: 1802 additions & 1290 deletions

.github/workflows/release.yml

Lines changed: 4 additions & 2 deletions
@@ -34,8 +34,10 @@ jobs:
       - name: Generate ObjectBox bindings
         run: dart run build_runner build --delete-conflicting-outputs
 
-      - name: Build release APK
-        run: flutter build apk --release --split-per-abi --target-platform android-arm64,android-x64
+      - name: Build release APKs
+        run: |
+          flutter build apk --flavor lite -t lib/main_lite.dart --release --split-per-abi --target-platform android-arm64,android-x64
+          flutter build apk --flavor full -t lib/main_full.dart --release --split-per-abi --target-platform android-arm64,android-x64
 
       - name: Rename APKs with version
         run: |

android/app/build.gradle.kts

Lines changed: 26 additions & 5 deletions
@@ -20,10 +20,7 @@ android {
     }
 
     defaultConfig {
-        // TODO: Specify your own unique Application ID (https://developer.android.com/studio/build/application-id.html).
         applicationId = "dev.koshika.koshika"
-        // You can update the following values to match your application needs.
-        // For more information, see: https://flutter.dev/to/review-gradle-config.
         minSdk = flutter.minSdkVersion
         targetSdk = flutter.targetSdkVersion
         versionCode = flutter.versionCode
@@ -32,8 +29,6 @@ android {
 
     buildTypes {
         release {
-            // TODO: Add your own signing config for the release build.
-            // Signing with the debug keys for now, so `flutter run --release` works.
             signingConfig = signingConfigs.getByName("debug")
             isMinifyEnabled = true
             isShrinkResources = true
@@ -48,6 +43,32 @@ android {
             excludes += "**/libimagegenerator_gpu.so"
         }
     }
+
+    flavorDimensions += "appType"
+    productFlavors {
+        create("lite") {
+            dimension = "appType"
+            applicationIdSuffix = ".lite"
+            versionNameSuffix = "-lite"
+        }
+        create("full") {
+            dimension = "appType"
+        }
+    }
+}
+
+// Strip llama.cpp native libs from the lite flavor APK.
+// This runs after packaging to remove AI inference libraries,
+// keeping the lite APK small (~15MB smaller).
+androidComponents {
+    onVariants(selector().withFlavor("appType" to "lite")) { variant ->
+        variant.packaging.jniLibs.excludes.addAll(listOf(
+            "**/libllama.so",
+            "**/libggml*.so",
+            "**/libmtmd.so",
+            "**/libllamadart.so",
+        ))
+    }
 }
 
 flutter {

android/app/proguard-rules.pro

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,3 @@
-# MediaPipe optional proto classes (not needed at runtime)
--dontwarn com.google.mediapipe.proto.**
--dontwarn com.google.mediapipe.framework.GraphProfiler
-
 # ML Kit optional language model classes (only Latin script is used)
 -dontwarn com.google.mlkit.vision.text.chinese.**
 -dontwarn com.google.mlkit.vision.text.devanagari.**

lib/constants/ai_prompts.dart

Lines changed: 2 additions & 3 deletions
@@ -3,12 +3,11 @@
 /// Keeping prompts here (rather than inlined in service files) makes them
 /// easy to review, iterate, and A/B test without touching service logic.
 abstract final class AiPrompts {
-  /// System prompt for the Gemma chat model.
+  /// System prompt injected before every user query.
   ///
-  /// Injected as a [Message.systemInfo] turn before every user query.
   /// Lab data context is added separately to the user turn so that
   /// small (1B-param) models cannot ignore it.
-  static const String gemmaSystemPrompt = '''
+  static const String systemPrompt = '''
 You are Koshika AI, a helpful on-device health assistant built into the Koshika app. You help users understand their lab report results.
 
 CRITICAL RULES:

lib/main.dart

Lines changed: 72 additions & 29 deletions
@@ -1,7 +1,6 @@
 import 'dart:ui';
 
 import 'package:flutter/material.dart';
-import 'package:flutter_gemma/flutter_gemma.dart';
 
 import 'screens/chat_screen.dart';
 import 'screens/dashboard_screen.dart';
@@ -11,25 +10,67 @@ import 'screens/settings_screen.dart';
 import 'screens/splash_screen.dart';
 import 'services/objectbox_store.dart';
 import 'services/biomarker_dictionary.dart';
-import 'services/embedding_service.dart';
-import 'services/gemma_service.dart';
+import 'services/llm_embedding_service.dart';
+import 'services/llm_service.dart';
 import 'services/vector_store_service.dart';
 import 'theme/app_colors.dart';
 import 'theme/koshika_design_system.dart';
 
 /// Global references — initialized in SplashScreen before navigation.
 late ObjectBoxStore objectbox;
 late BiomarkerDictionary biomarkerDictionary;
-late GemmaService gemmaService;
-late EmbeddingService embeddingService;
+late LlmService llmService;
+late LlmEmbeddingService embeddingService;
 late VectorStoreService vectorStoreService;
+Future<void>? _embeddingMigrationTask;
 
-Future<void> main() async {
+/// Whether AI features are enabled. Set by the entry point
+/// (main_full.dart vs main_lite.dart).
+bool kAiEnabled = true;
+
+/// Default entry point — full flavor with AI enabled.
+/// For flavor-specific builds, use main_full.dart or main_lite.dart.
+Future<void> main() async => appMain(aiEnabled: true);
+
+/// Shared app bootstrap — called from entry-point files.
+Future<void> appMain({required bool aiEnabled}) async {
   WidgetsFlutterBinding.ensureInitialized();
-  await FlutterGemma.initialize(maxDownloadRetries: 5);
+  kAiEnabled = aiEnabled;
   runApp(const KoshikaApp());
 }
 
+Future<void> migrateEmbeddingsIfNeeded() {
+  if (!kAiEnabled || !embeddingService.isLoaded) {
+    return Future.value();
+  }
+
+  return _embeddingMigrationTask ??= _runEmbeddingMigration().whenComplete(() {
+    _embeddingMigrationTask = null;
+  });
+}
+
+Future<void> _runEmbeddingMigration() async {
+  try {
+    final allResults = objectbox.biomarkerResultBox.getAll();
+    if (allResults.isEmpty) return;
+
+    final needsMigration = allResults.any(
+      (result) =>
+          result.embedding == null ||
+          result.embedding!.isEmpty ||
+          result.embedding!.length != 384,
+    );
+    if (!needsMigration) return;
+
+    debugPrint(
+      'KoshikaApp: migrating ${allResults.length} embeddings to 384-dim',
+    );
+    await vectorStoreService.rebuildIndex(allResults);
+  } catch (e) {
+    debugPrint('Embedding migration failed (non-fatal): $e');
+  }
+}
+
 class KoshikaApp extends StatelessWidget {
   const KoshikaApp({super.key});
 
@@ -123,19 +164,19 @@ class HomeScreen extends StatefulWidget {
 class _HomeScreenState extends State<HomeScreen> {
   int _currentIndex = 0;
 
+  /// Screens available depends on whether AI is enabled.
+  List<Widget> get _screens => kAiEnabled
+      ? const [
+          DashboardScreen(),
+          ReportsScreen(),
+          ChatScreen(),
+          SettingsScreen(),
+        ]
+      : const [DashboardScreen(), ReportsScreen(), SettingsScreen()];
+
   Widget _buildCurrentScreen() {
-    switch (_currentIndex) {
-      case 0:
-        return const DashboardScreen();
-      case 1:
-        return const ReportsScreen();
-      case 2:
-        return const ChatScreen();
-      case 3:
-        return const SettingsScreen();
-      default:
-        return const DashboardScreen();
-    }
+    if (_currentIndex < _screens.length) return _screens[_currentIndex];
+    return const DashboardScreen();
   }
 
   @override
@@ -175,21 +216,23 @@ class _HomeScreenState extends State<HomeScreen> {
             label: 'Reports',
             onTap: () => setState(() => _currentIndex = 1),
           ),
+          if (kAiEnabled)
+            _NavItem(
+              index: 2,
+              currentIndex: _currentIndex,
+              icon: Icons.chat_outlined,
+              activeIcon: Icons.chat,
+              label: 'Chat',
+              onTap: () => setState(() => _currentIndex = 2),
+            ),
           _NavItem(
-            index: 2,
-            currentIndex: _currentIndex,
-            icon: Icons.chat_outlined,
-            activeIcon: Icons.chat,
-            label: 'Chat',
-            onTap: () => setState(() => _currentIndex = 2),
-          ),
-          _NavItem(
-            index: 3,
+            index: kAiEnabled ? 3 : 2,
             currentIndex: _currentIndex,
             icon: Icons.settings_outlined,
             activeIcon: Icons.settings,
             label: 'Settings',
-            onTap: () => setState(() => _currentIndex = 3),
+            onTap: () =>
+                setState(() => _currentIndex = kAiEnabled ? 3 : 2),
           ),
         ],
       ),

lib/main_full.dart

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+import 'main.dart';
+
+/// Entry point for the full flavor — includes all AI features.
+Future<void> main() async => appMain(aiEnabled: true);

lib/main_lite.dart

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+import 'main.dart';
+
+/// Entry point for the lite flavor — no AI, no model downloads.
+Future<void> main() async => appMain(aiEnabled: false);

lib/models/biomarker_result.dart

Lines changed: 2 additions & 1 deletion
@@ -61,8 +61,9 @@ class BiomarkerResult {
   @Index()
   DateTime testDate;
 
-  /// Embedding vector for RAG (stored as float list, used with HNSW later)
+  /// Embedding vector for RAG (384-dim from bge-small-en-v1.5).
   @Property(type: PropertyType.floatVector)
+  @HnswIndex(dimensions: 384)
   List<double>? embedding;
 
   BiomarkerResult({
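With the @HnswIndex annotation in place, ObjectBox can serve nearest-neighbor lookups directly from the embedding field. A sketch of the query shape, assuming ObjectBox Dart 4.x vector-search APIs (`nearestNeighborsF32`, `findWithScores`) and the generated `BiomarkerResult_` bindings — this is not the repo's actual VectorStoreService code:

```dart
// Sketch assuming ObjectBox Dart 4.x vector search and the generated
// BiomarkerResult_ bindings; not the actual VectorStoreService code.
List<BiomarkerResult> nearest(Store store, List<double> queryVec, int k) {
  final query = store
      .box<BiomarkerResult>()
      .query(BiomarkerResult_.embedding.nearestNeighborsF32(queryVec, k))
      .build();
  try {
    // findWithScores pairs each match with its distance, nearest first.
    return query.findWithScores().map((r) => r.object).toList();
  } finally {
    query.close();
  }
}
```

This is what lets the commit drop the separate koshika_vectors.db SQLite store: the vectors live on the entity itself and the HNSW index answers the k-nearest query natively.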
