ThandoSomacele/covercheck

🇿🇦 CoverCheck - Medical Aid Assistant

Your South African medical aid questions answered with accurate, up-to-date information.

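A modern AI-powered medical aid assistant that helps South Africans understand and compare medical aid plans from the top 3 providers: Discovery Health, Bonitas Medical Fund and Momentum Health. It is built on quality-validated data scraped from official provider documentation, semantic search, and RAG (Retrieval-Augmented Generation).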

🎯 Project Status

✅ Phase 3 Complete: Production-ready medical aid assistant with modern UI

  • ✅ Phase 1: High-quality scraping from top 3 providers (23/26 documents passed quality validation, 88%)
  • ✅ Phase 2: PostgreSQL + pgvector database, semantic search, cloud LLM integration
  • ✅ Phase 3: Public platform with provider selector, enhanced citations, responsive design
  • 🔄 Phase 4 (Current): Deployment preparation and optimization

The application is functional and ready for testing!

✨ Features

Current Features

  • πŸ₯ Top 3 SA Providers - Discovery Health, Bonitas Medical Fund, Momentum Health
  • πŸ’¬ AI-Powered Chat - Ask questions in natural language, get accurate answers
  • πŸ” Semantic Search - Query expansion and intent detection for better results
  • πŸ“š Source Citations - Every answer backed by official documentation with links
  • 🎨 Modern UI - Next.js/Vercel-inspired design with dark mode
  • πŸ“± Responsive - Optimized for desktop, tablet, and mobile
  • 🎯 Provider Filter - Search across all providers or focus on one
  • ⚑ Real-time Streaming - Answers stream in as they're generated
  • πŸ‡ΏπŸ‡¦ SA Context - Rands (R), SA English spelling, and medical aid terminology
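Real-time streaming means the client renders the answer while it is still being generated. A minimal sketch of the reading side follows. It assumes the chat endpoint returns a plain UTF-8 text body (not Server-Sent Events or JSON lines); `readStream` and `onChunk` are hypothetical names, not part of the project.

```typescript
// Hypothetical helper: reads a streamed text body chunk by chunk.
// Assumes the server writes plain UTF-8 text (not SSE or JSON lines).
async function readStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true buffers a multi-byte character split across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    full += text;
    if (text) onChunk(text);
  }
  // Flush any bytes still held by the decoder
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}
```

In a Svelte component, `response.body` from a `fetch` to the chat endpoint would be passed in, with `onChunk` appending text to the displayed answer. Decoding with `{ stream: true }` matters because a multi-byte character (an emoji, for instance) can be split across network chunks.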

Technical Features

  • πŸ—„οΈ Vector Database - PostgreSQL with pgvector for semantic search
  • πŸ€– RAG System - Retrieval-Augmented Generation with cloud LLMs
  • πŸ”„ Fallback Strategy - Multiple LLM models (Gemini, Llama, Mistral)
  • 🎭 Smart Re-ranking - Intent-based result boosting
  • πŸ“Š Query Expansion - Automatic synonym and related term expansion
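Query expansion and intent-based re-ranking can be sketched as below. This is an illustration, not CoverCheck's actual implementation: the synonym table, the intent keywords, the `boost` factor and the `SearchResult` shape (including its `docType` field) are all assumptions.

```typescript
// Hypothetical synonym table: maps a term to related medical aid terms.
const SYNONYMS: Record<string, string[]> = {
  chronic: ["chronic illness benefit", "cib"],
  hospital: ["in-hospital", "admission"],
  gp: ["general practitioner", "doctor"],
};

// Hypothetical intent keywords used to guess which kind of document helps most.
const INTENT_KEYWORDS: Record<string, string[]> = {
  claims: ["claim", "claims", "reimburse"],
  benefits: ["cover", "covered", "benefit", "benefits"],
};

interface SearchResult {
  id: string;
  docType: string; // assumed metadata field, e.g. "claims" or "benefits"
  score: number;   // similarity from vector search, higher is better
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
}

// Append related terms for every query word that has synonyms.
function expandQuery(query: string): string {
  const extra = tokenize(query).flatMap((w) => SYNONYMS[w] ?? []);
  return [query, ...Array.from(new Set(extra))].join(" ");
}

// Return the first intent whose keywords appear in the query, if any.
function detectIntent(query: string): string | undefined {
  const words = new Set(tokenize(query));
  for (const [intent, keywords] of Object.entries(INTENT_KEYWORDS)) {
    if (keywords.some((k) => words.has(k))) return intent;
  }
  return undefined;
}

// Multiply the score of results matching the detected intent, then re-sort.
function rerank(results: SearchResult[], intent: string | undefined, boost = 1.2): SearchResult[] {
  return results
    .map((r) => ({ ...r, score: intent && r.docType === intent ? r.score * boost : r.score }))
    .sort((a, b) => b.score - a.score);
}
```

Boosting by a multiplier rather than filtering keeps lower-ranked but still relevant chunks available as citations.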

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • npm or yarn
  • PostgreSQL with pgvector extension (for Phase 2+)
  • Ollama (for local embeddings)

Installation

# Clone the repository
git clone https://github.com/ThandoSomacele/covercheck.git
cd covercheck

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env
# Edit .env and add your API keys and database connection string

Environment Variables

Create a .env file in the project root with the following variables:

# Database Configuration
DB_CONNECTION_STRING=postgresql://user:password@host:port/database

# OpenRouter API Configuration
OPENROUTER_API_KEY=sk-or-v1-your_api_key_here

⚠️ SECURITY WARNING:

  • NEVER commit your .env file to git
  • NEVER hardcode API keys or secrets in source code
  • The .env file is already in .gitignore for your protection
  • Use .env.example as a template (it contains no real secrets)
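To fail fast when a variable is missing, the server can validate its configuration at startup. A minimal sketch, assuming only the two variables listed above; `loadConfig` and `AppConfig` are hypothetical names. It takes an env-like object so it can be fed `process.env` or SvelteKit's `env` from `$env/dynamic/private`.

```typescript
interface AppConfig {
  dbConnectionString: string;
  openRouterApiKey: string;
}

// Reads required settings from an env-like object and throws a single error
// listing everything that is missing or malformed.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const problems: string[] = [];
  const db = env.DB_CONNECTION_STRING?.trim() ?? "";
  const key = env.OPENROUTER_API_KEY?.trim() ?? "";

  if (!db) {
    problems.push("DB_CONNECTION_STRING is not set");
  } else if (!/^postgres(ql)?:\/\//.test(db)) {
    problems.push("DB_CONNECTION_STRING must start with postgresql:// or postgres://");
  }
  if (!key) problems.push("OPENROUTER_API_KEY is not set");

  if (problems.length > 0) {
    // Report names only; never echo secret values into logs.
    throw new Error("Invalid configuration:\n- " + problems.join("\n- "));
  }
  return { dbConnectionString: db, openRouterApiKey: key };
}
```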

Running the Application

# Start the development server
npm run dev

# Open http://localhost:5173 in your browser

Database Setup

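# Load .env into the current shell so psql can read DB_CONNECTION_STRING
set -a; source .env; set +a

# Set up the database schema
psql "$DB_CONNECTION_STRING" -f scripts/db-setup.sql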

# Load scraped documents into the database
npx tsx scripts/load-documents.ts

# Verify data loaded correctly
npx tsx scripts/check-db-stats.ts

# Optimize database performance
psql "$DB_CONNECTION_STRING" -f scripts/optimize-db.sql
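Once documents are loaded, semantic search with an optional provider filter comes down to a nearest-neighbour query. The sketch below only builds the SQL text and parameters, so it runs without a database. The table `document_chunks` and its columns are hypothetical (the real schema lives in scripts/db-setup.sql). `<=>` is pgvector's cosine distance operator, and a vector parameter can be passed as a text literal such as `[0.1,0.2,0.3]`.

```typescript
interface SearchQuery {
  text: string;
  values: unknown[];
}

// Serialise an embedding into pgvector's text input format, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding contains non-finite values");
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterised top-k similarity query against a hypothetical table
// document_chunks(id, provider, content, source_url, embedding vector(n)).
function buildSearchQuery(embedding: number[], k: number, provider?: string): SearchQuery {
  const values: unknown[] = [toVectorLiteral(embedding)];
  let where = "";
  if (provider) {
    values.push(provider);
    where = `WHERE provider = $${values.length}`;
  }
  values.push(k);
  const text = [
    "SELECT id, provider, content, source_url,",
    "       1 - (embedding <=> $1::vector) AS similarity", // cosine similarity
    "FROM document_chunks",
    where,
    "ORDER BY embedding <=> $1::vector", // smallest distance first
    `LIMIT $${values.length}`,
  ]
    .filter((line) => line.length > 0)
    .join("\n");
  return { text, values };
}
```

With the `pg` package, the result could be passed as `client.query(text, values)`; keeping values as parameters avoids interpolating user input into SQL.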

Scraping (Optional)

# Scrape all 3 providers
npm run scrape

# Scrape individual providers
npm run scrape:discovery
npm run scrape:bonitas
npm run scrape:momentum

πŸ“ Project Structure

covercheck/
├── src/
│   ├── lib/
│   │   ├── insurance/              # Static insurance data
│   │   │   ├── documents-sa.ts     # SA medical aid documents
│   │   │   └── insurance-*-glossary.ts
│   │   └── server/
│   │       ├── rag.ts              # RAG logic
│   │       └── scrapers/           # Web scraping system
│   │           ├── BaseScraper.ts
│   │           ├── DiscoveryHealthScraper.ts
│   │           ├── BonitasScraper.ts
│   │           ├── MomentumHealthScraper.ts
│   │           └── ScraperOrchestrator.ts
│   └── routes/
│       ├── +page.svelte            # Chat UI
│       └── api/chat/+server.ts     # Chat API endpoint
├── scripts/                         # Utility scripts
│   ├── scrape.ts                   # Main scraping CLI
│   ├── validate-content.ts         # Quality validation
│   ├── analyze-scraped-data.ts     # Data analysis
│   ├── db-setup.sql                # Database schema
│   ├── load-documents.ts           # Load scraped documents into the database
│   ├── check-db-stats.ts           # Verify loaded data
│   └── optimize-db.sql             # Database optimization
├── docs/                            # Complete documentation
│   ├── README.md                   # Documentation index
│   ├── SCRAPING.md                 # Scraping system guide
│   ├── VERIFIED_URLS.md            # Verified provider URLs
│   └── SCRAPER_FIX_PLAN.md         # Quality improvement process
├── scraped-data/                    # JSON output (gitignored)
├── legacy/                          # Old implementations
└── COVERCHECK_ROADMAP.md           # Development roadmap

📊 Data Quality

Current Scrape Results

Provider               Documents   Quality Rate   Avg. Content
Discovery Health       13/14       93%            10,000 chars
Momentum Health        9/10        90%            10,500 chars
Bonitas Medical Fund   1/2         50%            193,754 chars
Total                  23/26       88%            ~10,000 chars
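The quality rate in the table appears to be the share of scraped documents that passed validation (13/14 ≈ 93% for Discovery Health, 23/26 ≈ 88% overall). A hypothetical validation rule and the rate calculation might look like this; the minimum length, the keyword list and the `ScrapedDocument` shape are assumptions, not the criteria used by scripts/validate-content.ts.

```typescript
interface ScrapedDocument {
  url: string;
  content: string;
}

// Assumed thresholds; the project's validator may use different rules.
const MIN_CHARS = 500;
const REQUIRED_ANY = ["benefit", "plan", "cover", "claim", "contribution"];

function isQualityDocument(doc: ScrapedDocument): boolean {
  const text = doc.content.toLowerCase();
  if (text.length < MIN_CHARS) return false; // too short to be useful
  return REQUIRED_ANY.some((word) => text.includes(word)); // must mention medical aid terms
}

// Returns passed/total and the rounded percentage, as reported in the table.
function qualityRate(docs: ScrapedDocument[]): { passed: number; total: number; percent: number } {
  const passed = docs.filter(isQualityDocument).length;
  const total = docs.length;
  const percent = total === 0 ? 0 : Math.round((passed / total) * 100);
  return { passed, total, percent };
}
```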

Content Coverage

✅ Plan Information

  • All major plans from each provider
  • Plan benefits and exclusions
  • Coverage details

✅ Benefits Documentation

  • Hospital benefits
  • Day-to-day benefits
  • Chronic illness benefits

✅ Support Information

  • Claims processes
  • Comparison tools
  • Contact information

πŸ› οΈ Development Roadmap

✅ Phase 1: Data Scraping (Complete)

  • Research and verify provider URLs
  • Build scraping system with Playwright
  • Implement quality validation
  • Scrape top 3 SA providers
  • Achieve 20+ quality documents

✅ Phase 2: Database & RAG (Complete)

  • Set up PostgreSQL + pgvector
  • Design database schema
  • Process and chunk documents
  • Generate embeddings with Ollama
  • Implement semantic search with query expansion
  • Build RAG pipeline with cloud LLMs
  • Implement streaming responses
  • Add source citations
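Chunking documents before embedding is often done with fixed-size windows that overlap, so text cut at one boundary still appears whole in a neighbouring chunk. A minimal character-based sketch; the 1,000-character size and 200-character overlap are assumed defaults, not the project's settings, and `chunkText` is a hypothetical name.

```typescript
interface Chunk {
  index: number;
  start: number; // offset of the chunk window in the original text
  text: string;
}

// Split text into windows of `size` characters, each overlapping the previous
// one by `overlap` characters. A window ends at the last space in its second
// half when there is one, so words are not cut at the end of a chunk.
function chunkText(text: string, size = 1000, overlap = 200): Chunk[] {
  // Requiring overlap < size / 2 guarantees `start` always moves forward.
  if (overlap < 0 || overlap * 2 >= size) {
    throw new Error("overlap must be non-negative and less than half of size");
  }
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const space = text.lastIndexOf(" ", end);
      if (space > start + size / 2) end = space;
    }
    chunks.push({ index: chunks.length, start, text: text.slice(start, end).trim() });
    if (end === text.length) break;
    start = end - overlap;
  }
  return chunks;
}
```

Each chunk would then be embedded (with Ollama, per the roadmap) and stored alongside its source URL so citations can point back to the original document.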

✅ Phase 3: Public Platform (Complete)

  • Modern SvelteKit UI with dark mode
  • Provider selector component
  • Enhanced citation display with relevance scores
  • Responsive design for mobile/tablet
  • CoverCheck branding and logo

🔄 Phase 4: Deployment Preparation (Current)

  • Production environment configuration
  • Database optimization and indexes
  • Error handling improvements
  • Deployment documentation
  • Analytics tracking
  • Final testing and QA

📅 Phase 5: Production Launch (Next)

  • Deploy to Vercel + Railway/Supabase
  • Monitor performance and errors
  • Collect user feedback
  • Content update automation

See CLAUDE.md and DEPLOYMENT.md for detailed information.

📚 Documentation

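All documentation is in the docs/ directory, including:

  • docs/README.md - Documentation index
  • docs/SCRAPING.md - Scraping system guide
  • docs/VERIFIED_URLS.md - Verified provider URLs
  • docs/SCRAPER_FIX_PLAN.md - Quality improvement process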

🧪 Technology Stack

Current:

  • SvelteKit - Full-stack framework
  • TypeScript - Type safety
  • Playwright - Web scraping
  • Cheerio - HTML parsing
  • PostgreSQL + pgvector - Vector database
  • Ollama - Local embedding generation
  • OpenRouter - Cloud LLM API

Planned:

  • Svelte 5 - Reactive UI
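The fallback strategy across multiple cloud models can be sketched as trying each model in order until one returns a usable answer. This assumes a caller-supplied `complete` function (for example, one that calls OpenRouter's chat completions API); the model IDs below are placeholders, not the ones CoverCheck uses, and `completeWithFallback` is a hypothetical name.

```typescript
// Sends a prompt to one model and resolves with its answer.
type CompleteFn = (model: string, prompt: string) => Promise<string>;

// Placeholder model IDs; a real list would come from configuration.
const MODELS = ["example/gemini-model", "example/llama-model", "example/mistral-model"];

// Try each model in order and return the first non-empty answer together with
// the model that produced it. If every model fails, throw one error listing why.
async function completeWithFallback(
  complete: CompleteFn,
  prompt: string,
  models: string[] = MODELS,
): Promise<{ model: string; answer: string }> {
  const failures: string[] = [];
  for (const model of models) {
    try {
      const answer = await complete(model, prompt);
      if (answer.trim().length === 0) throw new Error("empty response");
      return { model, answer };
    } catch (err) {
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed:\n${failures.join("\n")}`);
}
```

Returning the model name alongside the answer makes it easy to log which fallback was used, which helps when monitoring rate limits in production.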

🔧 Adding New Providers

Want to add another medical aid provider? See the Adding New Scrapers guide.

Quick overview:

  1. Create a new scraper class extending BaseScraper
  2. Define target URLs and selectors
  3. Register in ScraperOrchestrator
  4. Test and validate quality
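The first two steps might look roughly like the sketch below. BaseScraper's real API is not documented in this README, so the sketch defines its own minimal stand-in; `ExampleHealthScraper`, `targetUrls`, `fetchPage` and `extract` are hypothetical names. The project's scrapers use Playwright to load pages and Cheerio to parse them; here HTML arrives through an injectable function and is reduced to text with a crude regex so the example stays dependency-free.

```typescript
interface ScrapedPage {
  provider: string;
  url: string;
  content: string;
}

// Minimal stand-in for the project's BaseScraper (its real API may differ).
abstract class BaseScraper {
  abstract readonly provider: string;
  abstract readonly targetUrls: string[];

  // fetchPage loads raw HTML for a URL; the real scrapers use Playwright.
  constructor(protected fetchPage: (url: string) => Promise<string>) {}

  // Turns raw HTML into plain text; each provider decides what to keep.
  protected abstract extract(html: string): string;

  async scrapeAll(): Promise<ScrapedPage[]> {
    const pages: ScrapedPage[] = [];
    for (const url of this.targetUrls) {
      const html = await this.fetchPage(url);
      pages.push({ provider: this.provider, url, content: this.extract(html) });
    }
    return pages;
  }
}

// Steps 1 and 2: a new provider with its own target URLs and extraction rule.
class ExampleHealthScraper extends BaseScraper {
  readonly provider = "Example Health";
  readonly targetUrls = ["https://www.example.com/plans"];

  protected extract(html: string): string {
    // Keep only the <main> element's text (a rough stand-in for a Cheerio selector).
    const main = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i)?.[1] ?? "";
    return main.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  }
}
```

Registration in ScraperOrchestrator (step 3) and quality checks (step 4) depend on the project's own code, so they are not shown here.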

🤝 Contributing

This is a learning project documenting the journey of building a production RAG system. Contributions, suggestions, and feedback are welcome!

πŸ“ License

See LICENSE file for details.

πŸ™ Acknowledgments

Built with:

Data sourced from:

  • Discovery Health
  • Bonitas Medical Fund
  • Momentum Health


Made with ❤️ for South Africans who deserve simple, accurate medical aid information.

Current Version: Phase 3 Complete (Production-Ready)
Last Updated: 2025-11-20
