
# 🏥 ClaimPilot: AI-Powered Medical Insurance Claim Processor

ClaimPilot is a FastAPI-based backend system that automates the processing of health insurance claim PDFs. It uses LLM-powered agents to classify document types (e.g., bill, discharge summary), extract structured data, validate document presence, and return a claim approval/rejection decision.

## 🚀 Overview

This project was built as part of a multi-agent AI assignment targeting real-world challenges in healthcare claim automation. It accepts one or more unstructured medical PDFs and returns structured output via an API.

## 🧠 Architecture & Logic

- **Framework:** FastAPI (asynchronous backend server)
- **PDF text extraction:** PyMuPDF (`fitz`), used to extract the full document text
- **LLM inference:** Groq API with the `llama3-70b-8192` model
- **Agents:**
  - `BillAgent`: extracts `hospital_name`, `total_amount`, `date_of_service`
  - `DischargeAgent`: extracts `patient_name`, `diagnosis`, `admission_date`, `discharge_date`

**Pipeline flow:**

1. Accept PDFs via the `/process-claim` endpoint
2. Extract raw text using PyMuPDF
3. Use the Groq LLM to detect which document types are present (`bill`, `discharge_summary`, `id_card`)
4. For each type found, invoke the corresponding agent for structured field extraction
5. Validate the presence of required documents and detect inconsistencies
6. Return the final claim decision (approve/reject) based on the validation
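The flow above can be sketched as a plain orchestration function. This is an illustrative sketch, not the repo's actual code: the Groq-backed classifier and the per-type agents are injected as callables so the skeleton stays testable.

```python
# Document types a complete claim must include (per the validation rules).
REQUIRED_DOCS = {"bill", "discharge_summary", "id_card"}

def run_pipeline(texts, classify, agents):
    """texts: raw text of each uploaded PDF (already extracted via PyMuPDF);
    classify(text) -> iterable of detected document types;
    agents: {doc_type: extractor_fn} returning structured fields."""
    processed, found = [], set()
    for text in texts:
        for doc_type in classify(text):
            found.add(doc_type)
            if doc_type in agents:
                processed.append({"type": doc_type,
                                  "extracted": agents[doc_type](text)})
    missing = sorted(REQUIRED_DOCS - found)
    decision = "approved" if not missing else "rejected"
    return {
        "processed_documents": processed,
        "validation": {"missing_documents": missing},
        "claim_decision": {"status": decision},
    }
```

With real Groq-backed callables plugged in, the return value maps directly onto the JSON shape shown in the output example below.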

## 🤖 AI Tool Usage

**Groq LLaMA 3 (`llama3-70b-8192`):**

- Classification of document types via prompt engineering
- Extraction of structured fields from long, unstructured text
- JSON-style responses handled through crafted prompts with error-tolerant parsing
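One way to implement the error-tolerant parsing mentioned above (a sketch, not necessarily the repo's exact approach): pull the first JSON-looking span out of the model's reply, since LLMs often wrap JSON in prose, and fall back to `None` when nothing parses.

```python
import json
import re

def parse_llm_json(raw: str):
    """Extract a JSON object or array from an LLM reply that may contain
    surrounding prose; return None if no valid JSON is found."""
    match = re.search(r"(\{.*\}|\[.*\])", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```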

**ChatGPT (used during development):**

- Helped design the prompt structure for bill/discharge extraction
- Assisted in refining validation strategies and LLM output handling

## 📝 Prompt Examples

### 📄 Document Type Detection

```
Given the following medical document text, identify which types are present:
- bill
- discharge_summary
- id_card

Return a JSON array like: ["bill", "discharge_summary"]

Text: {doc_text[:8000]}
```

### 💰 Bill Information Extraction

```
You are a medical billing assistant. From the following text, extract:
- hospital_name
- total_amount (as number)
- date_of_service (YYYY-MM-DD)

Respond ONLY with JSON like:
{ "hospital_name": "...", "total_amount": 12345, "date_of_service": "2024-04-01" }

Text: {text[:2500]}
```
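A BillAgent built around this prompt might look like the following sketch. The `complete` callable stands in for the actual Groq chat-completion call, and the names are illustrative rather than taken from the repo.

```python
import json

BILL_PROMPT = """You are a medical billing assistant. From the following text, extract:
- hospital_name
- total_amount (as number)
- date_of_service (YYYY-MM-DD)

Respond ONLY with JSON like:
{{ "hospital_name": "...", "total_amount": 12345, "date_of_service": "2024-04-01" }}

Text: {text}"""

def bill_agent(text, complete):
    """complete(prompt) -> model reply string. The input text is truncated
    to 2500 characters, mirroring {text[:2500]} in the prompt above."""
    reply = complete(BILL_PROMPT.format(text=text[:2500]))
    return json.loads(reply)
```

Injecting the LLM call this way keeps the agent unit-testable with a stubbed `complete`.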

### 🏥 Discharge Summary Extraction

```
You are a medical assistant. From the discharge summary below, extract:
- patient_name
- diagnosis
- admission_date
- discharge_date

Return:
{ "type": "discharge_summary", "patient_name": "...", "diagnosis": "...", "admission_date": "YYYY-MM-DD", "discharge_date": "YYYY-MM-DD" }

Text: {text[:2500]}
```

## 🔍 Validation Logic

- Ensures all three document types (`bill`, `discharge_summary`, `id_card`) are present
- Checks consistency of patient names across documents
- Flags missing admission/discharge dates or other required fields
- Rejects incomplete or mismatched claims
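Sketched in code, the checks above might look like this (illustrative, not the repo's exact implementation):

```python
REQUIRED = {"bill", "discharge_summary", "id_card"}

def validate(processed):
    """processed: list of {"type": ..., "extracted": {...}} dicts,
    as produced by the extraction agents."""
    present = {doc["type"] for doc in processed}
    missing = sorted(REQUIRED - present)
    discrepancies = []
    # Patient names must agree wherever they appear.
    names = {doc["extracted"].get("patient_name")
             for doc in processed if doc["extracted"].get("patient_name")}
    if len(names) > 1:
        discrepancies.append("patient_name mismatch across documents")
    # Discharge summaries must carry both dates.
    for doc in processed:
        if doc["type"] == "discharge_summary":
            for field in ("admission_date", "discharge_date"):
                if not doc["extracted"].get(field):
                    discrepancies.append(f"missing {field}")
    return {"missing_documents": missing, "discrepancies": discrepancies}
```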

## ⚙️ Setup Instructions

1. Clone the repo:

   ```bash
   git clone
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Add your Groq API key (via `.env` or an environment variable):

   ```bash
   echo "GROQ_API_KEY=your-key-here" > .env
   ```

4. Run the server:

   ```bash
   uvicorn main:app --reload
   ```

5. Test via cURL or Postman. Sample cURL:

   ```bash
   curl -X POST "http://localhost:8000/process-claim" \
     -F "files=@path/to/document.pdf"
   ```

## 📦 Output Example

```json
{
  "processed_documents": [
    {
      "type": "bill",
      "extracted": {
        "hospital_name": "ABC Hospital",
        "total_amount": 12500,
        "date_of_service": "2024-04-10"
      }
    },
    {
      "type": "discharge_summary",
      "extracted": {
        "patient_name": "John Doe",
        "diagnosis": "Fracture",
        "admission_date": "2024-04-01",
        "discharge_date": "2024-04-10"
      }
    }
  ],
  "validation": {
    "missing_documents": [],
    "discrepancies": []
  },
  "claim_decision": {
    "status": "approved",
    "reason": "All required documents present and data is consistent"
  }
}
```

## ⚠️ Limitations

- **LLM truncation:** long PDFs may be cut off before the discharge summary is processed.
- **Token limits:** extraction agents see only the first ~2,500 characters of each document due to Groq API constraints.
- **No image/OCR parsing:** assumes PDFs are text-based, not scanned images.
- **ID card extraction:** not yet implemented.

## 🔮 Future Improvements

- Add multi-page chunking with LangGraph or CrewAI
- Integrate OCR for scanned medical documents
- Enhance ID card parsing
- Add a PDF preview and UI layer for hospital staff
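As a first step toward the chunking improvement, a naive fixed-size splitter with overlap would already get past the current first-2,500-characters truncation. This is a sketch under the assumption that chunks are processed independently; the real version would likely live behind a LangGraph or CrewAI node.

```python
def chunk_text(text, size=2500, overlap=200):
    """Split text into overlapping windows so content beyond the first
    `size` characters is still seen by the extraction agents."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The overlap guards against a field (e.g. a date) being split across a chunk boundary.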

## 🧠 Technologies Used

FastAPI, Groq API (LLaMA 3 70B), PyMuPDF, regex, async/await, LLM prompt engineering, JSON extraction, modular agent design
