
NG12 Cancer Risk Assessor & Chatbot

A Clinical Decision Support System and Conversational Agent powered by Google Vertex AI (Gemini 2.0 Flash) with Retrieval-Augmented Generation (RAG) over the NICE NG12 Guidelines.

Features

  • Risk Assessment: Evaluates patient symptoms against NG12 guidelines to determine referral urgency.
  • Evidence-Based: Uses a RAG pipeline to retrieve and cite specific sections of the NG12 PDF.
  • Conversational Interface: Chat with the guidelines to ask follow-up questions.
  • Modular Architecture: FastAPI backend, ChromaDB vector store, and a clean HTML/JS frontend.

Prerequisites

  • Python 3.11+
  • Google Cloud Project with Vertex AI enabled.
  • Valid GOOGLE_APPLICATION_CREDENTIALS (or gcloud auth application-default login).

Setup

  1. Environment Setup (Windows): A virtual environment is included in the repository. Activate it, or invoke tools through the venv path directly:

    .\venv\Scripts\activate
  2. Google Cloud Auth: Authenticate with your specific Google Cloud project.

    gcloud auth application-default login --project <YOUR_PROJECT_ID>
  3. Data Ingestion: Run the ingestion script with the virtual environment's Python:

    .\venv\Scripts\python -m app.services.ingestion_service

    Note: Ensure your project has the Vertex AI API enabled.

  4. Run the Application:

    .\venv\Scripts\uvicorn app.main:app --reload
  5. Access the UI: Open http://localhost:8000 in your browser.

Docker Build

docker build -t ng12-assessor .
docker run -p 8080:8080 -v /path/to/creds.json:/creds/creds.json:ro -e GOOGLE_APPLICATION_CREDENTIALS=/creds/creds.json ng12-assessor

Note: Mount the service-account key into the container so GOOGLE_APPLICATION_CREDENTIALS points at a path that exists inside it.

Project Structure

  • app/api: FastAPI routes.
  • app/services: Business logic (Agent, RAG, Patient data).
  • app/data: Local storage for PDF and Vector DB.
  • app/static: Frontend HTML/JS.

Architectural Decisions & Tradeoffs

1. Model: Gemini 2.0 Flash (Experimental)

  • Choice: Switched from Gemini 1.5 Pro to Gemini 2.0 Flash.
  • Reason: 2.0 Flash offers extremely low latency and a massive context window (1M+ tokens), making it ideal for interactive chat and processing large guidelines.
  • Tradeoff: Slightly less "deep reasoning" capability than Ultra/Opus-class models, but for guideline retrieval, speed and context capacity matter more (a minimal initialization sketch follows).
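
As a minimal sketch (the project ID, location, and exact model string here are assumptions, not taken from this repository), calling the model through the Vertex AI Python SDK looks roughly like this:

    # Sketch only: project, location, and model name are assumptions.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project="your-gcp-project", location="us-central1")

    model = GenerativeModel("gemini-2.0-flash-exp")
    response = model.generate_content(
        "Summarise the NG12 referral criteria for suspected lung cancer."
    )
    print(response.text)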

2. Backend: FastAPI (Async)

  • Choice: Built on FastAPI with uvicorn.
  • Reason: LLM and RAG operations are I/O bound. FastAPI's native async/await support allows handling multiple concurrent chat requests without blocking, unlike Flask.
  • Tradeoff: Slightly more boilerplate than Flask, but essential for scalable AI apps.
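
For illustration only (the route path, request model, and service function are hypothetical, not the repo's actual API), an async chat endpoint might look like this:

    # Hypothetical async chat route; names are illustrative, not the repo's.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ChatRequest(BaseModel):
        message: str
        history: list[str] = []

    async def answer_with_rag(message: str, history: list[str]) -> str:
        # Stand-in for the real RAG + Gemini call, which is I/O bound and awaited
        # so the event loop can serve other clients in the meantime.
        return f"(stub) you asked: {message}"

    @app.post("/api/chat")
    async def chat(req: ChatRequest) -> dict:
        answer = await answer_with_rag(req.message, req.history)
        return {"answer": answer}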

3. Vector Store: ChromaDB (Local)

  • Choice: Used ChromaDB with local file persistence.
  • Reason: "Batteries-included" solution that requires no external infrastructure or API keys (unlike Pinecone), making the project easy to clone and run.
  • Tradeoff: Not suitable for production scaling to millions of documents. For production, we would migrate to Vertex AI Vector Search.
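
A rough sketch of local, file-persisted ChromaDB usage (the path and collection name are assumptions):

    # Sketch: persistent local ChromaDB; path and collection name are assumptions.
    import chromadb

    client = chromadb.PersistentClient(path="app/data/chroma")
    collection = client.get_or_create_collection("ng12_guidelines")

    # Ingestion time: store chunk texts alongside precomputed embeddings.
    collection.add(
        ids=["chunk-0001"],
        documents=["Refer adults using a suspected cancer pathway if..."],
        embeddings=[[0.01] * 768],  # placeholder 768-dim vector
    )

    # Query time: fetch the nearest chunks for a query embedding.
    results = collection.query(query_embeddings=[[0.01] * 768], n_results=3)
    print(results["documents"])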

4. RAG Implementation: Query Rewriting

  • Choice: Implemented a "Condense Question" step where the LLM rewrites user queries based on history (e.g., "And for lung?" -> "What are the referral criteria for lung cancer?").
  • Reason: Essential for multi-turn chat. Without it, RAG fails on follow-up questions that lack explicit keywords.
  • Tradeoff: Adds a small latency overhead (one extra LLM call per turn), but drastically improves answer quality.
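
A sketch of the condense-question step (the prompt wording and model string are assumptions, and vertexai.init is assumed to have been called already):

    # Sketch of query rewriting before retrieval; prompt text is an assumption.
    from vertexai.generative_models import GenerativeModel

    CONDENSE_PROMPT = (
        "Given the chat history and a follow-up question, rewrite the follow-up "
        "as a single standalone question.\n\n"
        "History:\n{history}\n\nFollow-up: {question}\n\nStandalone question:"
    )

    def condense_question(history: list[str], question: str) -> str:
        model = GenerativeModel("gemini-2.0-flash-exp")
        prompt = CONDENSE_PROMPT.format(history="\n".join(history), question=question)
        return model.generate_content(prompt).text.strip()

    # e.g. "And for lung?" plus prior context becomes
    # "What are the NG12 referral criteria for lung cancer?" before retrieval.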

5. Ingestion: Batched Processing

  • Choice: Implemented manual batching (100 items/batch).
  • Reason: Vertex AI Embedding API has a hard limit of 250 instances per request.
  • Tradeoff: Code complexity vs. API reliability.
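
A sketch of the batching logic (the helper names are illustrative; the 100-item batch size mirrors the text above):

    # Sketch: embed chunks in batches of 100 to stay under the per-request limit.
    from typing import Callable, Iterator

    BATCH_SIZE = 100  # comfortably below the stated 250-instances-per-request cap

    def batched(items: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
        for start in range(0, len(items), size):
            yield items[start:start + size]

    def embed_all(chunks: list[str],
                  embed_fn: Callable[[list[str]], list[list[float]]]) -> list[list[float]]:
        vectors: list[list[float]] = []
        for batch in batched(chunks):
            vectors.extend(embed_fn(batch))  # one embedding API call per batch
        return vectors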

6. Embedding Model: text-embedding-004

  • Choice: Used text-embedding-004.
  • Reason: Latest stable embedding model offering improved semantic representations compared to older gecko models.
  • Tradeoff: Specific regional availability (us-central1), requiring explicit location configuration.
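
A sketch of the embedding call with the explicit us-central1 location (the project ID is a placeholder):

    # Sketch: text-embedding-004 called with an explicit region; project is a placeholder.
    import vertexai
    from vertexai.language_models import TextEmbeddingModel

    vertexai.init(project="your-gcp-project", location="us-central1")

    model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    embeddings = model.get_embeddings(["Refer adults aged over 40 with haemoptysis."])
    print(len(embeddings[0].values))  # 768-dimensional vector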

7. Chunk Size: 500 Characters

  • Choice: Split PDF into 500 character chunks (with 200 overlap).
  • Reason: Smaller chunks provide more precise context retrieval for specific medical criteria, reducing noise in the LLM prompt.
  • Tradeoff: Risk of splitting a long sentence or list across chunks, handled partially by the 200-character overlap.
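
A sketch of the chunking arithmetic in plain Python (the repository may use a library splitter instead):

    # Sketch: 500-character windows with 200-character overlap (300-character stride).
    def chunk_text(text: str, size: int = 500, overlap: int = 200) -> list[str]:
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    chunks = chunk_text("NG12 guideline text " * 200)
    print(len(chunks), len(chunks[0]))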
