🏥 ClaimPilot: AI-Powered Medical Insurance Claim Processor
ClaimPilot is a FastAPI-based backend that automates the processing of health insurance claim PDFs. It uses LLM-powered agents to classify document types (e.g., bill, discharge summary), extract structured data, check that the required documents are present, and return a claim approval or rejection decision.
🚀 Overview
This project was built as part of a multi-agent AI assignment targeting real-world challenges in healthcare claim automation. It accepts one or more unstructured medical PDFs and returns structured output via an API.
🧠 Architecture & Logic
- Framework: FastAPI (asynchronous backend server)
- PDF Text Extraction: PyMuPDF (fitz), used to extract the full document text
- LLM Inference: Groq API using the llama3-70b-8192 model
- Agents:
  - BillAgent: Extracts hospital_name, total_amount, date_of_service
  - DischargeAgent: Extracts patient_name, diagnosis, admission_date, discharge_date
Pipeline Flow:
1. Accept PDFs via the /process-claim endpoint
2. Extract raw text using PyMuPDF
3. Use the Groq LLM to detect which document types are present (bill, discharge_summary, id_card)
4. For each type found, invoke the corresponding agent for structured field extraction
5. Validate the presence of required documents and detect inconsistencies
6. Return the final claim decision (approve/reject) based on validation
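The flow above can be sketched as a single orchestration function. This is a minimal illustration, not the project's actual code: `process_claim`, the `Classifier` and `Agent` aliases, and the stub callables are hypothetical names, and the PDF text extraction and Groq calls are assumed to happen elsewhere and are passed in as plain functions.

```python
from typing import Callable, Dict, List

# Hypothetical aliases; the real project may structure these differently.
Classifier = Callable[[str], List[str]]
Agent = Callable[[str], Dict[str, object]]

REQUIRED_TYPES = ("bill", "discharge_summary", "id_card")


def process_claim(
    texts: List[str], classify: Classifier, agents: Dict[str, Agent]
) -> Dict[str, object]:
    """Classify each document text, run the matching agent, then decide."""
    processed = []
    seen_types = set()
    for text in texts:
        for doc_type in classify(text):
            seen_types.add(doc_type)
            agent = agents.get(doc_type)
            if agent is not None:  # some types may have no agent yet
                processed.append({"type": doc_type, "extracted": agent(text)})

    # Only the presence check is shown; cross-document checks are omitted here.
    missing = [t for t in REQUIRED_TYPES if t not in seen_types]
    approved = not missing
    reason = (
        "All required documents present"
        if approved
        else "Missing: " + ", ".join(missing)
    )
    return {
        "processed_documents": processed,
        "validation": {"missing_documents": missing, "discrepancies": []},
        "claim_decision": {
            "status": "approved" if approved else "rejected",
            "reason": reason,
        },
    }


# Usage with stub callables standing in for the LLM-backed classifier and agents.
result = process_claim(
    ["HOSPITAL BILL ...", "DISCHARGE SUMMARY ..."],
    classify=lambda t: ["bill"] if "BILL" in t else ["discharge_summary"],
    agents={"bill": lambda t: {"total_amount": 12500}},
)
print(result["claim_decision"])
```

Passing the classifier and agents in as callables keeps the control flow testable without network access.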
🤖 AI Tool Usage
Groq LLaMA 3 (llama3-70b-8192):
- Classification of document types using prompt engineering
- Extraction of structured fields from long unstructured text
- Handling of JSON-style responses through crafted prompts and error-tolerant parsing
ChatGPT (used during development):
- Helped design the prompt structure for bill/discharge extraction
- Assisted in refining validation strategies and LLM output handling
📝 Prompt Examples
📄 Document Type Detection
Given the following medical document text, identify which types are present:
- bill
- discharge_summary
- id_card
Return a JSON array like: ["bill", "discharge_summary"]
Text: {doc_text[:8000]}
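Model replies to this prompt often wrap the array in extra prose or a code fence, so the parsing step has to be tolerant. The following is a minimal sketch of one way to do that; `parse_doc_types` and `ALLOWED_TYPES` are hypothetical names, not taken from the project.

```python
import json
import re
from typing import List

ALLOWED_TYPES = {"bill", "discharge_summary", "id_card"}


def parse_doc_types(raw: str) -> List[str]:
    """Pull the first JSON array out of an LLM reply and keep only known labels."""
    match = re.search(r"\[.*?\]", raw, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # malformed array: treat as "nothing detected"
    result: List[str] = []
    for item in items:
        if isinstance(item, str):
            label = item.strip().lower()
            if label in ALLOWED_TYPES and label not in result:
                result.append(label)
    return result


reply = 'Here you go:\n["bill", "Discharge_Summary", "receipt"]\nLet me know!'
print(parse_doc_types(reply))
```

Unknown labels such as "receipt" are dropped rather than raising, so a noisy reply degrades to a shorter list instead of failing the request.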
💰 Bill Information Extraction
You are a medical billing assistant. From the following text, extract:
- hospital_name
- total_amount (as number)
- date_of_service (YYYY-MM-DD)
Respond ONLY with JSON like: { "hospital_name": "...", "total_amount": 12345, "date_of_service": "2024-04-01" }
Text: {text[:2500]}
🏥 Discharge Summary Extraction
You are a medical assistant. From the discharge summary below, extract:
- patient_name
- diagnosis
- admission_date
- discharge_date
Return: { "type": "discharge_summary", "patient_name": "...", "diagnosis": "...", "admission_date": "YYYY-MM-DD", "discharge_date": "YYYY-MM-DD" }
Text: {text[:2500]}
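Both extraction prompts ask for a JSON object with YYYY-MM-DD dates, but the reply may still carry surrounding text or an invalid date. A hedged sketch of how such a reply could be checked follows; `extract_json_object` and `is_iso_date` are illustrative helpers, not functions from the project.

```python
import json
import re
from datetime import date
from typing import Any, Dict, Optional


def extract_json_object(raw: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object found in the reply, or None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = raw.find("{", start + 1)  # try the next opening brace
    return None


def is_iso_date(value: Any) -> bool:
    """True only for YYYY-MM-DD strings that name a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


reply = (
    'Sure! {"type": "discharge_summary", "patient_name": "John Doe", '
    '"diagnosis": "Fracture", "admission_date": "2024-04-01", '
    '"discharge_date": "2024-04-10"} Hope this helps.'
)
fields = extract_json_object(reply)
print(fields["patient_name"], is_iso_date(fields["admission_date"]))
```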
🔍 Validation Logic
- Ensures all 3 types (bill, discharge_summary, id_card) are present
- Checks consistency of patient names across documents
- Flags missing admission/discharge dates or fields
- Rejects incomplete or mismatched claims
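These rules can be sketched roughly as follows. The function names, the name-normalization rule (lowercase, collapsed whitespace), and the message strings are assumptions made for illustration; the input shape mirrors the `processed_documents` entries in the output example.

```python
from typing import Dict, List

REQUIRED_TYPES = ("bill", "discharge_summary", "id_card")


def validate_claim(documents: List[Dict]) -> Dict[str, List[str]]:
    """Report missing document types and simple cross-document discrepancies."""
    present = {doc["type"] for doc in documents}
    missing = [t for t in REQUIRED_TYPES if t not in present]

    discrepancies: List[str] = []
    names = set()
    for doc in documents:
        fields = doc.get("extracted", {})
        name = fields.get("patient_name")
        if name:
            # Assumed normalization: ignore case and repeated whitespace.
            names.add(" ".join(name.lower().split()))
        if doc["type"] == "discharge_summary":
            for key in ("admission_date", "discharge_date"):
                if not fields.get(key):
                    discrepancies.append(f"discharge_summary is missing {key}")
    if len(names) > 1:
        discrepancies.append("patient_name differs across documents")
    return {"missing_documents": missing, "discrepancies": discrepancies}


def decide(validation: Dict[str, List[str]]) -> str:
    """Approve only when nothing is missing and nothing is inconsistent."""
    clean = not validation["missing_documents"] and not validation["discrepancies"]
    return "approved" if clean else "rejected"


docs = [
    {"type": "bill", "extracted": {"hospital_name": "ABC Hospital"}},
    {
        "type": "discharge_summary",
        "extracted": {"patient_name": "John Doe", "admission_date": "2024-04-01"},
    },
    {"type": "id_card", "extracted": {"patient_name": "Jon Doe"}},
]
report = validate_claim(docs)
print(report, decide(report))
```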
⚙️ Setup Instructions
git clone
pip install -r requirements.txt
echo "GROQ_API_KEY=your-key-here" > .env
uvicorn main:app --reload
Sample cURL:
curl -X POST "http://localhost:8000/process-claim" \
  -F "files=@path/to/document.pdf"
📦 Output Example
{
  "processed_documents": [
    {
      "type": "bill",
      "extracted": {
        "hospital_name": "ABC Hospital",
        "total_amount": 12500,
        "date_of_service": "2024-04-10"
      }
    },
    {
      "type": "discharge_summary",
      "extracted": {
        "patient_name": "John Doe",
        "diagnosis": "Fracture",
        "admission_date": "2024-04-01",
        "discharge_date": "2024-04-10"
      }
    }
  ],
  "validation": {
    "missing_documents": [],
    "discrepancies": []
  },
  "claim_decision": {
    "status": "approved",
    "reason": "All required documents present and data is consistent"
  }
}
- LLM limitations: Long PDFs might get truncated before the discharge summary is processed.
- Token limits: Extraction agents process only the first ~2500 characters due to Groq API constraints.
- No image/OCR parsing: Assumes PDFs are text-based and not scanned image PDFs.
- ID card extraction: Not yet implemented.
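One way to work around the truncation limits above is to split long text into overlapping windows and run an agent on each window. The sketch below is plain character-based chunking, not the LangGraph or CrewAI approach; `chunk_text` and its default sizes are assumptions chosen to match the ~2500-character limit.

```python
from typing import List


def chunk_text(text: str, size: int = 2500, overlap: int = 200) -> List[str]:
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


pieces = chunk_text("a" * 6000)
print([len(p) for p in pieces])
```

The overlap reduces the chance that a field such as a date is cut in half at a window boundary; merging per-window results is left out of this sketch.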
🔮 Future Improvements
- Add multi-page chunking with LangGraph or CrewAI
- Integrate OCR for scanned medical documents
- Enhance ID card parsing
- Add a PDF preview and UI layer for hospital staff
🧠 Technologies Used
FastAPI, Groq API (LLaMA3-70b), PyMuPDF, Regex, Async/Await, LLM Prompt Engineering, JSON Extraction, Modular Agent Design