|
1 | 1 | # Koshika |
2 | 2 |
|
3 | 3 |  |
4 | | - |
| 4 | + |
5 | 5 |  |
| 6 | + |
6 | 7 |
|
7 | 8 | **Your health data lives in your cell, not the cloud.** |
8 | | -Koshika is an offline-first, privacy-focused Flutter application designed to extract, parse, and trend biomarker data from unstructured PDF lab reports entirely on-device. |
9 | 9 |
|
10 | | -## Features |
| 10 | +Koshika is an offline-first, privacy-focused health app that extracts biomarker data from PDF lab reports, tracks trends over time, and lets you discuss your results with an on-device AI — all without a single byte leaving your phone. |
| 11 | + |
| 12 | +--- |
11 | 13 |
|
12 | | -* **On-Device PDF Parsing:** Extracts raw text and structure from PDFs securely using `syncfusion_flutter_pdf` without sending your personal health data to the cloud. |
13 | | -* **OCR Fallback for Scanned Reports:** When a PDF page has little or no selectable text, the app can render that page locally and run on-device OCR to recover lab values. |
14 | | -* **Intelligent Regex Engine:** A multi-pattern matching fallback system specifically designed to handle unstructured formats typical of Indian pathology labs (Thyrocare, SRL, Dr. Lal, etc.). |
15 | | -* **Fuzzy Term Matching:** Standardizes raw lab terminology into an internal dictionary schema so that variations (e.g., "FASTING SUGAR" vs "Glucose F") align perfectly for historical tracking. |
16 | | -* **Private Database:** Data is persisted in lightning-fast `ObjectBox` document stores. |
17 | | -* **Historical Trends & Detail Views:** Review biomarker history with charts, reference range gauges, and flag badges for abnormal values. |
18 | | -* **FHIR R4 Export:** Export imported reports and biomarker observations as a shareable FHIR bundle. |
19 | | -* **Beautiful Visualizations:** Dynamic charting with `fl_chart` to track your health trends over time. |
| 14 | +## Why Koshika? |
20 | 15 |
|
21 | | -## Platforms |
| 16 | +Indian pathology labs produce PDF reports in dozens of inconsistent formats. Most health apps either can't parse them, or require uploading to a cloud service to try. |
22 | 17 |
|
23 | | -- Android |
24 | | -- iOS |
25 | | -- Web (Planned) |
| 18 | +Koshika solves this differently: |
26 | 19 |
|
27 | | -## Architecture & Tech Stack |
| 20 | +- **Parses locally.** A multi-pattern regex engine + fuzzy matching handles the formatting chaos of Thyrocare, SRL, Dr. Lal PathLabs, and others — directly on your device. |
| 21 | +- **Understands your data.** Biomarkers are normalized to a standard dictionary, flagged against reference ranges, and tracked historically with trend charts. |
| 22 | +- **Runs AI on-device.** Gemma 3 1B runs inference locally via MediaPipe. Ask questions about your reports and get citation-backed answers grounded in your actual lab values. |
| 23 | +- **Never phones home.** No accounts, no telemetry, no cloud sync. Your health data stays in ObjectBox on your device. |
28 | 24 |
|
29 | | -This project uses standard Flutter `StatefulWidget` tree passing for local state, persisting to an **ObjectBox NoSQL database** for offline persistence. |
| 25 | +--- |
30 | 26 |
|
31 | | -- **Frontend:** Flutter |
32 | | -- **Local DB:** ObjectBox |
33 | | -- **Extraction:** syncfusion_flutter_pdf |
34 | | -- **OCR Fallback:** google_mlkit_text_recognition + pdfx |
35 | | -- **Text Analysis:** custom Multi-Regex Engine + string_similarity matching |
36 | | -- **Export:** FHIR R4 bundle generation |
| 27 | +## Features |
37 | 28 |
|
38 | | -## Project Structure |
| 29 | +### PDF Parsing |
| 30 | +- Extracts structured data from digital PDFs using `syncfusion_flutter_pdf` |
| 31 | +- OCR fallback for scanned/image-based pages using Google ML Kit |
| 32 | +- Staged import progress with clear error messaging for unsupported layouts |
| 33 | +- Fuzzy term matching normalizes lab-specific naming ("FASTING SUGAR" → "Glucose, Fasting") across 63 biomarker definitions in 10 medical categories |
| 34 | + |
| 35 | +### On-Device AI |
| 36 | +- **Gemma 3 1B IT** — instruction-tuned LLM running locally via `flutter_gemma` + MediaPipe |
| 37 | +- GPU-first inference with automatic CPU fallback |
| 38 | +- Streaming token-by-token responses |
| 39 | +- **EmbeddingGemma 300M** — on-device embeddings for semantic search (~75 MB, 768-dim) |
| 40 | +- **RAG pipeline** — embeds your query, searches an HNSW vector index of your lab results, injects the top-5 matches as context, and generates grounded responses with source citations `[1]`, `[2]` |
| 41 | +- Graceful degradation — keyword search works seamlessly when the embedding model isn't loaded |
| 42 | + |
| 43 | +### Dashboard & Trends |
| 44 | +- Health overview with tracked biomarker count, abnormal flags, and borderline detection (within 10% of reference boundaries) |
| 45 | +- "Attention Needed" panel for out-of-range results |
| 46 | +- Category-level trend indicators |
| 47 | +- Biomarker detail view with interactive `fl_chart` trend visualization, reference range gauge, and color-coded history |
| 48 | +- Borderline flag detection reflected across the entire app |
| 49 | + |
| 50 | +### Privacy & Export |
| 51 | +- All processing happens on-device — parsing, AI inference, embeddings, search |
| 52 | +- ObjectBox local database with no network dependency |
| 53 | +- FHIR R4 Bundle export for sharing with healthcare providers |
| 54 | +- Native share sheet integration via `share_plus` |
| 55 | + |
| 56 | +### Onboarding |
| 57 | +- Animated splash screen with branded fade/scale animation |
| 58 | +- 3-screen onboarding flow (Welcome, How it Works, Privacy) |
| 59 | +- Subsequent launches skip directly to home |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## Architecture |
| 64 | + |
| 65 | +| Layer | Technology | |
| 66 | +|-------|------------| |
| 67 | +| Frontend | Flutter (StatefulWidget) | |
| 68 | +| Local DB | ObjectBox | |
| 69 | +| PDF Extraction | syncfusion_flutter_pdf | |
| 70 | +| OCR Fallback | google_mlkit_text_recognition + pdfx | |
| 71 | +| Text Analysis | Custom multi-regex engine + string_similarity | |
| 72 | +| On-Device LLM | flutter_gemma (Gemma 3 1B IT, MediaPipe) | |
| 73 | +| Embeddings | EmbeddingGemma 300M (HNSW via SQLite VectorStore) | |
| 74 | +| Charts | fl_chart | |
| 75 | +| Export | FHIR R4 (fhir_r4 package) | |
| 76 | + |
| 77 | +### Project Structure |
39 | 78 |
|
40 | 79 | ```text |
41 | 80 | lib/ |
42 | | -├── models/ # ObjectBox Entities (Patient, LabReport, BiomarkerResult) |
43 | | -├── screens/ # UI Views (ReportDetails, Home, Dashboard) |
44 | | -├── services/ # Core Logic (PdfExtractor, LabParser, StoreOrchestrator) |
45 | | -└── main.dart # App Entry Point & Navigation |
46 | | -assets/ |
47 | | -└── data/ # Local JSON dictionaries mapping lab terminology |
48 | | -``` |
| 81 | +├── models/ # ObjectBox entities (Patient, LabReport, BiomarkerResult, ChatMessage) |
| 82 | +├── screens/ # UI (Dashboard, Reports, Chat, Settings, BiomarkerDetail, Onboarding) |
| 83 | +├── services/ # Core logic (PDF extraction, parsing, AI, embeddings, vector search, FHIR) |
| 84 | +├── widgets/ # Reusable components (trend chart, gauge, flag badge, chat bubble) |
| 85 | +└── main.dart # App entry, theme, navigation |
49 | 86 |
|
50 | | -## Current App Flow |
| 87 | +assets/data/ # Biomarker dictionary (63 definitions, 10 categories) |
| 88 | +``` |
51 | 89 |
|
52 | | -- `Dashboard`: shows the latest biomarker snapshot, out-of-range markers, and category-wise summaries. |
53 | | -- `Reports`: imports PDF lab reports, stores parsed results locally, and exports data as FHIR JSON. |
54 | | -- `Reports`: shows staged progress while importing, including OCR fallback warnings for scanned or mixed PDFs. |
55 | | -- `Biomarker Detail`: displays trend charts, reference ranges, and report history for a selected biomarker. |
56 | | -- `AI Chat`: currently a placeholder for future on-device health-data chat. |
| 90 | +--- |
57 | 91 |
|
58 | 92 | ## Getting Started |
59 | 93 |
|
60 | 94 | ### Prerequisites |
61 | | -* Flutter SDK (Compatible with Dart `>=3.9.2`) |
62 | | -* Code Editor (VS Code / Android Studio) |
| 95 | +- Flutter SDK (Dart >=3.9.2) |
| 96 | +- Android device or emulator (API 26+) |
63 | 97 |
|
64 | 98 | ### Installation |
65 | 99 |
|
66 | | -1. Clone the repository: |
67 | 100 | ```bash |
68 | 101 | git clone https://github.com/priyavratuniyal/koshika.git |
69 | 102 | cd koshika |
70 | | -``` |
71 | | - |
72 | | -2. Fetch Flutter packages: |
73 | | -```bash |
74 | 103 | flutter pub get |
75 | | -``` |
76 | | - |
77 | | -3. Generate the ObjectBox bindings: |
78 | | -```bash |
79 | 104 | dart run build_runner build --delete-conflicting-outputs |
80 | | -``` |
81 | | - |
82 | | -4. Run the app: |
83 | | -```bash |
84 | 105 | flutter run |
85 | 106 | ``` |
86 | 107 |
|
87 | | -### Optional: Auto-format on commit |
| 108 | +### Auto-format on commit (optional) |
88 | 109 |
|
89 | | -This repo includes a Husky `pre-commit` hook that auto-formats staged Dart files before the commit is created. |
| 110 | +This repo uses a Husky pre-commit hook that auto-formats staged Dart files: |
90 | 111 |
|
91 | | -If you're cloning fresh, install the hook setup with: |
92 | 112 | ```bash |
93 | 113 | npm install |
94 | 114 | ``` |
95 | 115 |
|
96 | | -## Current Limitations |
| 116 | +--- |
| 117 | + |
| 118 | +## Roadmap |
| 119 | + |
| 120 | +- [x] PDF parsing with multi-pattern regex engine |
| 121 | +- [x] OCR fallback for scanned reports |
| 122 | +- [x] Fuzzy biomarker matching (63 definitions, 10 categories) |
| 123 | +- [x] Dashboard with health overview and trend indicators |
| 124 | +- [x] Biomarker detail view with trend charts and reference gauges |
| 125 | +- [x] Borderline detection (10% margin flagging) |
| 126 | +- [x] FHIR R4 export |
| 127 | +- [x] On-device LLM (Gemma 3 1B IT via MediaPipe) |
| 128 | +- [x] Semantic search with EmbeddingGemma 300M |
| 129 | +- [x] RAG pipeline with citation-backed responses |
| 130 | +- [x] Animated splash screen and onboarding flow |
| 131 | +- [ ] Conversation memory (persistent chat sessions) |
| 132 | +- [ ] Health Connect wearable integration |
| 133 | +- [ ] Computed health risk scores (FIB-4, eGFR, APRI) |
| 134 | +- [ ] Biomarker anomaly detection (EWMA, personal baselines) |
| 135 | +- [ ] Nutritional and lifestyle recommendations |
| 136 | +- [ ] LLM-assisted PDF extraction fallback |
| 137 | +- [ ] Encrypted local storage with biometric lock |
| 138 | +- [ ] Multi-patient profile support |
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## Known Limitations |
97 | 143 |
|
98 | | -- OCR support for scanned or photographed PDF reports is experimental and still needs validation against more real-world lab formats. |
99 | | -- Web is planned, but the current implementation is focused on local mobile workflows. |
100 | | -- The chat assistant is not implemented yet. |
| 144 | +- OCR for scanned reports is experimental — accuracy varies across lab formats |
| 145 | +- Web platform is planned but not yet implemented |
| 146 | +- The on-device LLM (1B parameters) is best suited for simple explanations; complex medical reasoning has limits inherent to the model size |
| 147 | + |
| 148 | +--- |
101 | 149 |
|
102 | 150 | ## Contributing |
103 | | -Pull requests are welcome! If you find a lab report format that our parser fails to scrape, please consider opening an issue or contributing a regex fallback. |
104 | 151 |
|
105 | | -When contributing, please keep each commit focused on one logical change and use clear conventional commit messages such as `feat:`, `fix:`, `chore:`, or `refactor:`. |
| 152 | +Pull requests are welcome. |
| 153 | + |
| 154 | +If you find a lab report format that the parser fails to handle, please open an issue with a redacted sample (remove personal information) or contribute a regex pattern. |
| 155 | + |
| 156 | +When contributing, keep each commit focused on one logical change and use conventional commit messages (`feat:`, `fix:`, `chore:`, `refactor:`). |
| 157 | + |
| 158 | +--- |
106 | 159 |
|
107 | 160 | ## License |
108 | | -This project is open-source and available under standard open source provisions. |
| 161 | + |
| 162 | +[MIT](LICENSE) |
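
The added README line on borderline detection ("within 10% of reference boundaries") does not say what the 10% is measured against. Below is a minimal sketch of one possible reading, assuming the margin is 10% of the reference range width and applies only to values that are still inside the range. The function name `classify`, its parameters, and the flag labels are hypothetical, and Python is used purely for illustration; the app itself is written in Dart and may implement the rule differently.

```python
from typing import Literal

# Hypothetical flag labels; the app's real names are not shown in the diff.
Flag = Literal["low", "high", "borderline", "normal"]


def classify(value: float, ref_low: float, ref_high: float,
             margin: float = 0.10) -> Flag:
    """Flag a result against its reference range.

    Assumption: "borderline" means in range, but within `margin`
    (a fraction of the range width) of either boundary.
    """
    if ref_low >= ref_high:
        raise ValueError("ref_low must be smaller than ref_high")

    # Out-of-range values are flagged first.
    if value < ref_low:
        return "low"
    if value > ref_high:
        return "high"

    # Width of the borderline band next to each boundary.
    band = (ref_high - ref_low) * margin
    if value - ref_low <= band or ref_high - value <= band:
        return "borderline"
    return "normal"


if __name__ == "__main__":
    # Example with an illustrative 70-100 reference range.
    for v in (65, 71, 85, 99, 110):
        print(v, classify(v, 70, 100))
```

Under the other plausible reading, where the margin is 10% of the boundary value itself, the band would be `ref_low * margin` and `ref_high * margin` instead. That gives different results for narrow ranges with large absolute values, so the README line would benefit from stating which rule is used.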