Commit 5f40856

docs(readme): rewrite with AI features, roadmap, and architecture table
1 parent 0c51321 commit 5f40856

1 file changed

Lines changed: 116 additions & 62 deletions

README.md

@@ -1,108 +1,162 @@
11
# Koshika
22

33
![Status](https://img.shields.io/badge/Status-Active_Development-brightgreen)
4-
![Platform](https://img.shields.io/badge/Platform-Android%20|%20iOS%20|%20Web-blue)
4+
![Platform](https://img.shields.io/badge/Platform-Android%20|%20iOS-blue)
55
![Dart](https://img.shields.io/badge/Dart-%3E%3D3.9.2-0175C2)
6+
![License](https://img.shields.io/badge/License-MIT-green)
67

78
**Your health data lives in your cell, not the cloud.**
8-
Koshika is an offline-first, privacy-focused Flutter application designed to extract, parse, and trend biomarker data from unstructured PDF lab reports entirely on-device.
99

10-
## Features
10+
Koshika is an offline-first, privacy-focused health app that extracts biomarker data from PDF lab reports, tracks trends over time, and lets you discuss your results with an on-device AI — all without a single byte leaving your phone.
11+
12+
---
1113

12-
* **On-Device PDF Parsing:** Extracts raw text and structure from PDFs securely using `syncfusion_flutter_pdf` without sending your personal health data to the cloud.
13-
* **OCR Fallback for Scanned Reports:** When a PDF page has little or no selectable text, the app can render that page locally and run on-device OCR to recover lab values.
14-
* **Intelligent Regex Engine:** A multi-pattern matching fallback system specifically designed to handle unstructured formats typical of Indian pathology labs (Thyrocare, SRL, Dr. Lal, etc.).
15-
* **Fuzzy Term Matching:** Standardizes raw lab terminology into an internal dictionary schema so that variations (e.g., "FASTING SUGAR" vs "Glucose F") align perfectly for historical tracking.
16-
* **Private Database:** Data is persisted in lightning-fast `ObjectBox` document stores.
17-
* **Historical Trends & Detail Views:** Review biomarker history with charts, reference range gauges, and flag badges for abnormal values.
18-
* **FHIR R4 Export:** Export imported reports and biomarker observations as a shareable FHIR bundle.
19-
* **Beautiful Visualizations:** Dynamic charting with `fl_chart` to track your health trends over time.
14+
## Why Koshika?
2015

21-
## Platforms
16+
Indian pathology labs produce PDF reports in dozens of inconsistent formats. Most health apps either can't parse them, or require uploading to a cloud service to try.
2217

23-
- Android
24-
- iOS
25-
- Web (Planned)
18+
Koshika solves this differently:
2619

27-
## Architecture & Tech Stack
20+
- **Parses locally.** A multi-pattern regex engine + fuzzy matching handles the formatting chaos of Thyrocare, SRL, Dr. Lal PathLabs, and others — directly on your device.
21+
- **Understands your data.** Biomarkers are normalized to a standard dictionary, flagged against reference ranges, and tracked historically with trend charts.
22+
- **Runs AI on-device.** Gemma 3 1B runs inference locally via MediaPipe. Ask questions about your reports and get citation-backed answers grounded in your actual lab values.
23+
- **Never phones home.** No accounts, no telemetry, no cloud sync. Your health data stays in ObjectBox on your device.
2824

29-
This project uses standard Flutter `StatefulWidget` tree passing for local state, persisting to an **ObjectBox NoSQL database** for offline persistence.
25+
---
3026

31-
- **Frontend:** Flutter
32-
- **Local DB:** ObjectBox
33-
- **Extraction:** syncfusion_flutter_pdf
34-
- **OCR Fallback:** google_mlkit_text_recognition + pdfx
35-
- **Text Analysis:** custom Multi-Regex Engine + string_similarity matching
36-
- **Export:** FHIR R4 bundle generation
27+
## Features
3728

38-
## Project Structure
29+
### PDF Parsing
30+
- Extracts structured data from digital PDFs using `syncfusion_flutter_pdf`
31+
- OCR fallback for scanned/image-based pages using Google ML Kit
32+
- Staged import progress with clear error messaging for unsupported layouts
33+
- Fuzzy term matching normalizes lab-specific naming ("FASTING SUGAR" → "Glucose, Fasting") across 63 biomarker definitions in 10 medical categories
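Illustrative note on the fuzzy term matching bullet above: a minimal Python sketch of how raw lab labels could be normalized to canonical biomarker names. The README only names `string_similarity`; the bigram Dice score, the `ALIASES` table, the `normalize` helper, and the 0.6 threshold are all assumptions made for this example, not the app's actual dictionary or scoring.

```python
from collections import Counter

def bigrams(text):
    """Character bigrams of the lowercased text, ignoring spaces and punctuation."""
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def dice(a, b):
    """Dice coefficient over character bigrams, in the range 0.0 to 1.0."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2.0 * overlap / total

# Hypothetical alias table; the real dictionary lives in assets/data/.
ALIASES = {
    "Glucose, Fasting": ["fasting sugar", "glucose f", "fasting blood sugar"],
    "Hemoglobin": ["hemoglobin", "haemoglobin"],
    "TSH": ["tsh", "thyroid stimulating hormone"],
}

def normalize(raw, threshold=0.6):
    """Return the canonical name whose best alias clears the threshold, else None."""
    best_name, best_score = None, 0.0
    for name, aliases in ALIASES.items():
        for candidate in [name] + aliases:
            score = dice(raw, candidate)
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Returning `None` below the threshold keeps unrelated lines (patient name, address, page footers) from being forced onto a biomarker.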
34+
35+
### On-Device AI
36+
- **Gemma 3 1B IT** — instruction-tuned LLM running locally via `flutter_gemma` + MediaPipe
37+
- GPU-first inference with automatic CPU fallback
38+
- Streaming token-by-token responses
39+
- **EmbeddingGemma 300M** — on-device embeddings for semantic search (~75 MB, 768-dim)
40+
- **RAG pipeline** — embeds your query, searches an HNSW vector index of your lab results, injects the top-5 matches as context, and generates grounded responses with source citations `[1]`, `[2]`
41+
- Graceful degradation — keyword search works seamlessly when the embedding model isn't loaded
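Illustrative note on the RAG pipeline bullet above: a rough Python sketch of the retrieve-then-prompt step. It deliberately swaps the HNSW index for a brute-force cosine scan and uses hand-written 3-dimensional vectors instead of EmbeddingGemma output. The function names, the prompt wording, and the sample lab values are hypothetical and only show the shape of the flow.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=5):
    """Brute-force nearest neighbours; a stand-in for an HNSW lookup."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, passages):
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the lab results below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )

# Made-up results with toy embeddings, for illustration only.
INDEX = [
    ("TSH 6.2 mIU/L (high)", [0.9, 0.1, 0.0]),
    ("Hemoglobin 13.5 g/dL (normal)", [0.0, 1.0, 0.0]),
    ("Fasting glucose 104 mg/dL (borderline)", [0.1, 0.0, 0.9]),
]

hits = top_k([1.0, 0.0, 0.1], INDEX, k=2)
prompt = build_prompt("Is my thyroid okay?", hits)
```

With a real index, `k=5` would match the top-5 injection described in the bullet.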
42+
43+
### Dashboard & Trends
44+
- Health overview with tracked biomarker count, abnormal flags, and borderline detection (within 10% of reference boundaries)
45+
- "Attention Needed" panel for out-of-range results
46+
- Category-level trend indicators
47+
- Biomarker detail view with interactive `fl_chart` trend visualization, reference range gauge, and color-coded history
48+
- Borderline flag detection reflected across the entire app
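Illustrative note on the borderline detection bullets above: a small Python sketch of one way to flag a value against a reference range. The README says "within 10% of reference boundaries" without defining the base of the percentage; this sketch assumes the margin is 10% of the range width, measured inward from each bound. The `flag` function and its labels are hypothetical.

```python
def flag(value, low, high, margin=0.10):
    """Classify a result as 'low', 'high', 'borderline', or 'normal'.

    Assumption: the borderline band is `margin` times the range width,
    taken inside the range next to each boundary.
    """
    if low >= high:
        raise ValueError("reference range must satisfy low < high")
    if value < low:
        return "low"
    if value > high:
        return "high"
    band = (high - low) * margin
    if value <= low + band or value >= high - band:
        return "borderline"
    return "normal"
```

For a 70 to 100 range the band is 3 units wide, so 98 is reported as borderline while 85 stays normal. If the app instead takes 10% of the boundary value itself, only the `band` line would change.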
49+
50+
### Privacy & Export
51+
- All processing happens on-device — parsing, AI inference, embeddings, search
52+
- ObjectBox local database with no network dependency
53+
- FHIR R4 Bundle export for sharing with healthcare providers
54+
- Native share sheet integration via `share_plus`
55+
56+
### Onboarding
57+
- Animated splash screen with branded fade/scale animation
58+
- 3-screen onboarding flow (Welcome, How it Works, Privacy)
59+
- Subsequent launches skip directly to home
60+
61+
---
62+
63+
## Architecture
64+
65+
| Layer | Technology |
66+
|-------|------------|
67+
| Frontend | Flutter (StatefulWidget) |
68+
| Local DB | ObjectBox |
69+
| PDF Extraction | syncfusion_flutter_pdf |
70+
| OCR Fallback | google_mlkit_text_recognition + pdfx |
71+
| Text Analysis | Custom multi-regex engine + string_similarity |
72+
| On-Device LLM | flutter_gemma (Gemma 3 1B IT, MediaPipe) |
73+
| Embeddings | EmbeddingGemma 300M (HNSW via SQLite VectorStore) |
74+
| Charts | fl_chart |
75+
| Export | FHIR R4 (fhir_r4 package) |
76+
77+
### Project Structure
3978

4079
```text
4180
lib/
42-
├── models/ # ObjectBox Entities (Patient, LabReport, BiomarkerResult)
43-
├── screens/ # UI Views (ReportDetails, Home, Dashboard)
44-
├── services/ # Core Logic (PdfExtractor, LabParser, StoreOrchestrator)
45-
└── main.dart # App Entry Point & Navigation
46-
assets/
47-
└── data/ # Local JSON dictionaries mapping lab terminology
48-
```
81+
├── models/ # ObjectBox entities (Patient, LabReport, BiomarkerResult, ChatMessage)
82+
├── screens/ # UI (Dashboard, Reports, Chat, Settings, BiomarkerDetail, Onboarding)
83+
├── services/ # Core logic (PDF extraction, parsing, AI, embeddings, vector search, FHIR)
84+
├── widgets/ # Reusable components (trend chart, gauge, flag badge, chat bubble)
85+
└── main.dart # App entry, theme, navigation
4986
50-
## Current App Flow
87+
assets/data/ # Biomarker dictionary (63 definitions, 10 categories)
88+
```
5189

52-
- `Dashboard`: shows the latest biomarker snapshot, out-of-range markers, and category-wise summaries.
53-
- `Reports`: imports PDF lab reports, stores parsed results locally, and exports data as FHIR JSON.
54-
- `Reports`: shows staged progress while importing, including OCR fallback warnings for scanned or mixed PDFs.
55-
- `Biomarker Detail`: displays trend charts, reference ranges, and report history for a selected biomarker.
56-
- `AI Chat`: currently a placeholder for future on-device health-data chat.
90+
---
5791

5892
## Getting Started
5993

6094
### Prerequisites
61-
* Flutter SDK (Compatible with Dart `>=3.9.2`)
62-
* Code Editor (VS Code / Android Studio)
95+
- Flutter SDK (Dart >=3.9.2)
96+
- Android device or emulator (API 26+)
6397

6498
### Installation
6599

66-
1. Clone the repository:
67100
```bash
68101
git clone https://github.com/priyavratuniyal/koshika.git
69102
cd koshika
70-
```
71-
72-
2. Fetch Flutter packages:
73-
```bash
74103
flutter pub get
75-
```
76-
77-
3. Generate the ObjectBox bindings:
78-
```bash
79104
dart run build_runner build --delete-conflicting-outputs
80-
```
81-
82-
4. Run the app:
83-
```bash
84105
flutter run
85106
```
86107

87-
### Optional: Auto-format on commit
108+
### Auto-format on commit (optional)
88109

89-
This repo includes a Husky `pre-commit` hook that auto-formats staged Dart files before the commit is created.
110+
This repo uses a Husky pre-commit hook that auto-formats staged Dart files:
90111

91-
If you're cloning fresh, install the hook setup with:
92112
```bash
93113
npm install
94114
```
95115

96-
## Current Limitations
116+
---
117+
118+
## Roadmap
119+
120+
- [x] PDF parsing with multi-pattern regex engine
121+
- [x] OCR fallback for scanned reports
122+
- [x] Fuzzy biomarker matching (63 definitions, 10 categories)
123+
- [x] Dashboard with health overview and trend indicators
124+
- [x] Biomarker detail view with trend charts and reference gauges
125+
- [x] Borderline detection (10% margin flagging)
126+
- [x] FHIR R4 export
127+
- [x] On-device LLM (Gemma 3 1B IT via MediaPipe)
128+
- [x] Semantic search with EmbeddingGemma 300M
129+
- [x] RAG pipeline with citation-backed responses
130+
- [x] Animated splash screen and onboarding flow
131+
- [ ] Conversation memory (persistent chat sessions)
132+
- [ ] Health Connect wearable integration
133+
- [ ] Computed health risk scores (FIB-4, eGFR, APRI)
134+
- [ ] Biomarker anomaly detection (EWMA, personal baselines)
135+
- [ ] Nutritional and lifestyle recommendations
136+
- [ ] LLM-assisted PDF extraction fallback
137+
- [ ] Encrypted local storage with biometric lock
138+
- [ ] Multi-patient profile support
139+
140+
---
141+
142+
## Known Limitations
97143

98-
- OCR support for scanned or photographed PDF reports is experimental and still needs validation against more real-world lab formats.
99-
- Web is planned, but the current implementation is focused on local mobile workflows.
100-
- The chat assistant is not implemented yet.
144+
- OCR for scanned reports is experimental — accuracy varies across lab formats
145+
- Web platform is planned but not yet implemented
146+
- The on-device LLM (1B parameters) is best suited for simple explanations; complex medical reasoning has limits inherent to the model size
147+
148+
---
101149

102150
## Contributing
103-
Pull requests are welcome! If you find a lab report format that our parser fails to scrape, please consider opening an issue or contributing a regex fallback.
104151

105-
When contributing, please keep each commit focused on one logical change and use clear conventional commit messages such as `feat:`, `fix:`, `chore:`, or `refactor:`.
152+
Pull requests are welcome.
153+
154+
If you find a lab report format that the parser fails to handle, please open an issue with a redacted sample (remove personal information) or contribute a regex pattern.
155+
156+
When contributing, keep each commit focused on one logical change and use conventional commit messages (`feat:`, `fix:`, `chore:`, `refactor:`).
157+
158+
---
106159

107160
## License
108-
This project is open-source and available under standard open source provisions.
161+
162+
[MIT](LICENSE)
