🏥 ClaimPilot: AI-Powered Medical Insurance Claim Processor

ClaimPilot is a FastAPI-based backend system that automates the processing of health insurance claim PDFs. It uses LLM-powered agents to classify document types (e.g., bill, discharge summary), extract structured data, validate document presence, and return a claim approval/rejection decision.

🚀 Overview

This project was built as part of a multi-agent AI assignment, targeting real-world challenges in healthcare claim automation. It accepts one or more unstructured medical PDFs and returns structured output via an API.

🧠 Architecture & Logic

Framework: FastAPI (asynchronous backend server)
PDF Text Extraction: PyMuPDF (fitz) for extracting full document text
LLM Inference: Groq API using the llama3-70b-8192 model
Agents:
  BillAgent: Extracts hospital_name, total_amount, date_of_service
  DischargeAgent: Extracts patient_name, diagnosis, admission_date, discharge_date

Pipeline Flow:

1. Accept PDFs via the /process-claim endpoint
2. Extract raw text using PyMuPDF
3. Use the Groq LLM to detect which document types are present (bill, discharge_summary, id_card)
4. For each type found, invoke the corresponding agent for structured field extraction
5. Validate the presence of required docs and detect inconsistencies
6. Return the final claim decision (approve/reject) based on validation
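The flow above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: `detect_types`, `bill_agent`, `discharge_agent`, and `process_claim` are hypothetical names, and the keyword matching stands in for the Groq LLM calls so the example runs without an API key.

```python
from typing import Callable, Dict, List

REQUIRED_TYPES = ("bill", "discharge_summary", "id_card")


def detect_types(text: str) -> List[str]:
    """Toy classifier: keyword matching in place of the LLM call (assumption)."""
    lowered = text.lower()
    found = []
    if "total amount" in lowered:
        found.append("bill")
    if "discharge" in lowered:
        found.append("discharge_summary")
    return found


def bill_agent(text: str) -> Dict:
    # Placeholder: the real agent would prompt the LLM for these fields.
    fields = {"hospital_name": None, "total_amount": None, "date_of_service": None}
    return {"type": "bill", "extracted": fields}


def discharge_agent(text: str) -> Dict:
    # Placeholder: the real agent would prompt the LLM for these fields.
    fields = {"patient_name": None, "diagnosis": None,
              "admission_date": None, "discharge_date": None}
    return {"type": "discharge_summary", "extracted": fields}


# Map each detected type to the agent that handles it.
AGENTS: Dict[str, Callable[[str], Dict]] = {
    "bill": bill_agent,
    "discharge_summary": discharge_agent,
}


def process_claim(texts: List[str]) -> Dict:
    """Classify each text, run the matching agents, then decide."""
    processed, seen = [], set()
    for text in texts:
        for doc_type in detect_types(text):
            seen.add(doc_type)
            agent = AGENTS.get(doc_type)
            if agent is not None:
                processed.append(agent(text))
    missing = [t for t in REQUIRED_TYPES if t not in seen]
    status = "rejected" if missing else "approved"
    return {
        "processed_documents": processed,
        "validation": {"missing_documents": missing, "discrepancies": []},
        "claim_decision": {"status": status},
    }
```

Keeping the agents in a dictionary keyed by document type means a new document type only needs one new entry rather than another branch in the loop.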

🤖 AI Tool Usage

Groq LLaMA 3 (llama3-70b-8192):

Classification of document types using prompt engineering
Extraction of structured fields from long unstructured text
Handling of JSON-style responses through crafted prompts and error-tolerant parsing
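As an illustration of what error-tolerant parsing can look like, the sketch below pulls the first valid JSON value out of a model reply that may wrap it in extra prose. The helper name `parse_llm_json` is hypothetical, and this is one possible approach rather than the parsing code used in the repository.

```python
import json
import re
from typing import Any, Optional


def parse_llm_json(raw: str) -> Optional[Any]:
    """Return the first JSON value found in raw, or None if there is none."""
    # Fast path: the reply is already clean JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Slow path: try decoding from every '{' or '[' until one succeeds.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[\[{]", raw):
        try:
            value, _end = decoder.raw_decode(raw, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None
```

Returning None instead of raising lets the caller treat an unparseable reply as "nothing extracted" and carry on with validation.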

ChatGPT (used during development):

Helped design the prompt structure for bill/discharge extraction
Assisted in refining validation strategies and LLM output handling

📝 Prompt Examples

📄 Document Type Detection

Given the following medical document text, identify which types are present:

bill
discharge_summary
id_card

Return a JSON array like: ["bill", "discharge_summary"]

Text: {doc_text[:8000]}
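A small sketch of how this prompt might be assembled and how the returned labels could be filtered. The function names `build_detection_prompt` and `keep_known_types` are illustrative assumptions; only the prompt wording and the 8000-character slice come from the example above.

```python
from typing import Iterable, List

KNOWN_TYPES = ("bill", "discharge_summary", "id_card")
MAX_CHARS = 8000  # mirrors the doc_text[:8000] slice in the prompt


def build_detection_prompt(doc_text: str) -> str:
    """Fill the detection prompt, truncating the document text."""
    type_lines = "\n".join(KNOWN_TYPES)
    return (
        "Given the following medical document text, "
        "identify which types are present:\n\n"
        f"{type_lines}\n\n"
        'Return a JSON array like: ["bill", "discharge_summary"]\n\n'
        f"Text: {doc_text[:MAX_CHARS]}"
    )


def keep_known_types(labels: Iterable[object]) -> List[str]:
    """Drop labels that are not one of the three known types, keeping order."""
    result: List[str] = []
    for label in labels:
        if isinstance(label, str):
            cleaned = label.strip().lower()
            if cleaned in KNOWN_TYPES and cleaned not in result:
                result.append(cleaned)
    return result
```

Filtering the model's answer against a fixed list guards against invented labels before any agent is dispatched.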

💰 Bill Information Extraction

You are a medical billing assistant. From the following text, extract:

hospital_name
total_amount (as number)
date_of_service (YYYY-MM-DD)

Respond ONLY with JSON like: { "hospital_name": "...", "total_amount": 12345, "date_of_service": "2024-04-01" }

Text: {text[:2500]}
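Even with a strict prompt, a model can return the amount as a formatted string or the date in another layout. The sketch below shows one way to normalize the parsed bill fields; `to_amount`, `to_iso_date`, and `normalize_bill` are hypothetical helpers, not functions from the repository.

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional


def to_amount(value: Any) -> Optional[float]:
    """Coerce a number or a string such as '12,500.00' to a float."""
    if isinstance(value, bool):  # bool is an int subclass; reject it
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
        if match:
            return float(match.group(0).replace(",", ""))
    return None


def to_iso_date(value: Any) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it does not parse."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date().isoformat()
    except ValueError:
        return None


def normalize_bill(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the three expected fields, with cleaned values."""
    return {
        "hospital_name": raw.get("hospital_name") or None,
        "total_amount": to_amount(raw.get("total_amount")),
        "date_of_service": to_iso_date(raw.get("date_of_service")),
    }
```

Fields that cannot be cleaned become None, which gives the validation step a single, unambiguous signal for "missing".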

🏥 Discharge Summary Extraction

You are a medical assistant. From the discharge summary below, extract:

patient_name
diagnosis
admission_date
discharge_date

Return: { "type": "discharge_summary", "patient_name": "...", "diagnosis": "...", "admission_date": "YYYY-MM-DD", "discharge_date": "YYYY-MM-DD" }

Text: {text[:2500]}

🔍 Validation Logic

Ensures all 3 types (bill, discharge_summary, id_card) are present
Checks consistency of patient names across documents
Flags missing admission/discharge dates or fields
Rejects incomplete or mismatched claims
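These rules could be expressed roughly as follows. This is a sketch under assumptions: `validate_claim` is a hypothetical name, the input is assumed to use the same "type"/"extracted" layout as the API output, and the rejection reason text is invented for illustration.

```python
from typing import Dict, List

REQUIRED_TYPES = ("bill", "discharge_summary", "id_card")


def _normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so 'John  Doe' equals 'john doe'."""
    return " ".join(name.lower().split())


def validate_claim(documents: List[Dict]) -> Dict:
    present = {doc.get("type") for doc in documents}
    missing = [t for t in REQUIRED_TYPES if t not in present]
    discrepancies: List[str] = []

    # Collect every distinct patient name found across documents.
    names = set()
    for doc in documents:
        name = (doc.get("extracted") or {}).get("patient_name")
        if isinstance(name, str) and name.strip():
            names.add(_normalize_name(name))
    if len(names) > 1:
        discrepancies.append("patient_name differs across documents")

    # A discharge summary must carry both dates.
    for doc in documents:
        if doc.get("type") == "discharge_summary":
            extracted = doc.get("extracted") or {}
            for field in ("admission_date", "discharge_date"):
                if not extracted.get(field):
                    discrepancies.append(f"discharge_summary is missing {field}")

    approved = not missing and not discrepancies
    reason = (
        "All required documents present and data is consistent"
        if approved
        else "Missing documents or inconsistent data"  # illustrative wording
    )
    return {
        "validation": {"missing_documents": missing, "discrepancies": discrepancies},
        "claim_decision": {"status": "approved" if approved else "rejected",
                           "reason": reason},
    }
```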

⚙️ Setup Instructions

1. Clone the repo

git clone

2. Install dependencies

pip install -r requirements.txt

3. Add your Groq API key (via .env or environment variable)

echo "GROQ_API_KEY=your-key-here" > .env

4. Run the server

uvicorn main:app --reload

5. Test via cURL or Postman

Sample cURL:

curl -X POST "http://localhost:8000/process-claim" \
  -F "files=@path/to/document.pdf"
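For scripted testing, the same request can be sent from Python using only the standard library. This is a rough sketch: `build_multipart` and `post_claim` are hypothetical helpers, and the form field name `files` and the URL are taken from the cURL sample, so adjust them if the endpoint differs.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import List, Tuple


def build_multipart(files: List[Tuple[str, bytes]],
                    field: str = "files") -> Tuple[bytes, str]:
    """Encode (filename, content) pairs as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_claim(paths: List[str],
               url: str = "http://localhost:8000/process-claim") -> bytes:
    """Upload the given PDFs and return the raw response body."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = build_multipart(files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `post_claim(["path/to/document.pdf"])` would return the JSON response as bytes.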

📦 Output Example

{
  "processed_documents": [
    {
      "type": "bill",
      "extracted": {
        "hospital_name": "ABC Hospital",
        "total_amount": 12500,
        "date_of_service": "2024-04-10"
      }
    },
    {
      "type": "discharge_summary",
      "extracted": {
        "patient_name": "John Doe",
        "diagnosis": "Fracture",
        "admission_date": "2024-04-01",
        "discharge_date": "2024-04-10"
      }
    }
  ],
  "validation": {
    "missing_documents": [],
    "discrepancies": []
  },
  "claim_decision": {
    "status": "approved",
    "reason": "All required documents present and data is consistent"
  }
}

⚠️ Limitations

LLM limitations: Long PDFs might get truncated before the discharge summary is processed.
Token limits: Extraction agents process only the first ~2500 characters due to Groq API constraints.
No image/OCR parsing: Assumes PDFs are text-based and not scanned image PDFs.
ID card extraction: Not yet implemented.

🔮 Future Improvements

Add multi-page chunking with LangGraph or CrewAI
Integrate OCR for scanned medical documents
Enhance ID card parsing
Add PDF preview and UI layer for hospital staff
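As a starting point for the chunking item, independent of LangGraph or CrewAI, long text can be split into overlapping windows so that each extraction call sees a bounded slice. The function `chunk_text` and its default sizes are assumptions chosen to match the ~2500-character limit mentioned under Limitations.

```python
from typing import List


def chunk_text(text: str, size: int = 2500, overlap: int = 200) -> List[str]:
    """Split text into windows of at most `size` chars, sharing `overlap` chars."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in the range [0, size)")

    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

The overlap reduces the chance that a field such as a date or patient name is cut in half at a chunk boundary; results from individual chunks would still need to be merged afterwards.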

🧠 Technologies Used

FastAPI, Groq API (LLaMA3-70b), PyMuPDF, Regex, Async/Await, LLM Prompt Engineering, JSON Extraction, Modular Agent Design
