Medical-Report-Copilot

Leveraging a vision-language LLM to help users understand their medical reports, offer advice on lifestyle changes, and more.

Project Motivation

  1. Many seniors struggle to understand their medical reports due to complex terminology. As a result, they often turn to Google, ChatGPT, or their children for help, frequently receiving inconsistent or unclear information. This confusion can lead to skepticism toward doctors' diagnoses and treatment plans.

     This project aims to build a GenAI-powered chatbot that helps seniors interpret their medical reports in clear, simple language. By offering trustworthy, conversational explanations, the tool empowers them to better understand their health and make informed decisions with greater confidence.

  2. This project also serves as a playground for testing various extraction techniques and tools.

Proposed Solution Architecture

Architecture Diagram

This app takes in a PDF of a medical report, which may include medical images, and first checks whether it is an image problem or purely a text problem. If it is an image problem, the app converts the PDF into one image per page and uses a multimodal LLM to 'read' the contents and provide interpretations and recommendations. If it is purely a text problem, it uses a PDF parser to extract the text before passing it to an LLM for interpretations and recommendations.
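A minimal sketch of this routing logic, assuming pdfplumber and pdf2image (with poppler installed) as the parsing and rasterization tools; the function name and the image-vs-text heuristic are illustrative rather than the project's exact implementation.

```python
import pdfplumber
from pdf2image import convert_from_path  # requires poppler on the system


def route_report(pdf_path: str):
    """Decide whether a report is an 'image problem' or a 'text problem'."""
    with pdfplumber.open(pdf_path) as pdf:
        has_images = any(page.images for page in pdf.pages)
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)

    if has_images or not text.strip():
        # Image problem: rasterize one image per page for the multimodal LLM.
        return "image", convert_from_path(pdf_path, dpi=300)
    # Text problem: the parsed text goes straight to the text-only LLM.
    return "text", text
```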

Features

  1. Deployed MedGemma 4b (multimodal) via Google Vertex AI with vLLM. More about MedGemma
  2. Configurable between MedGemma, OpenAI, and AzureOpenAI
  3. Stores text, images, and interpretations in a database and converts them to embeddings for future retrieval on request (see the storage sketch after this list)
  4. Abstracted interpretation as "Memory" (to develop)
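A minimal sketch of the storage step in feature 3, assuming PostgreSQL with the pgvector extension, the psycopg and pgvector Python packages, and OpenAI embeddings; the table name, schema, connection string, and embedding model are illustrative assumptions, not the project's actual setup.

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def store_interpretation(report_text: str, interpretation: str) -> None:
    """Persist the parsed text and LLM interpretation, plus an embedding."""
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=interpretation
    ).data[0].embedding

    with psycopg.connect("dbname=medcopilot") as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        register_vector(conn)  # teach psycopg about the vector type
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS interpretations (
                id serial PRIMARY KEY,
                report_text text,
                interpretation text,
                embedding vector(1536)
            )
            """
        )
        conn.execute(
            "INSERT INTO interpretations (report_text, interpretation, embedding) "
            "VALUES (%s, %s, %s)",
            (report_text, interpretation, np.array(emb)),
        )
```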

Tech Stack

  • Agentic framework: Pydantic AI (previously used LiteLLM but found Pydantic AI simpler and more flexible); a minimal agent sketch follows this list
  • LLM models: AzureOpenAI, OpenAI, MedGemma
  • Database and vector database: PostgreSQL, PGVector
  • Observability for agentic calls: TraceLoop
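A minimal sketch of the agent setup, assuming the pydantic-ai and traceloop-sdk packages; the model string, system prompt, and app name are illustrative, and AzureOpenAI or an OpenAI-compatible MedGemma/vLLM endpoint would slot in by changing only the model configuration.

```python
from pydantic_ai import Agent
from traceloop.sdk import Traceloop

Traceloop.init(app_name="medical-report-copilot")  # tracing for agentic LLM calls

interpreter = Agent(
    "openai:gpt-4o",  # swap for an Azure or OpenAI-compatible vLLM (MedGemma) model
    system_prompt=(
        "Explain the medical report in plain, senior-friendly language and "
        "suggest lifestyle changes where appropriate."
    ),
)

result = interpreter.run_sync("Report excerpt: HbA1c 8.2%, fasting glucose 9.1 mmol/L")
print(result.output)  # older pydantic-ai versions expose this as `result.data`
```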

Documenting progress and findings

  • 16June2025: Tested the pipeline using an eye medical report. It appears to be a scanned PDF, which pdfplumber isn't able to handle. Need to use OCR or the LLM's image URL input instead.
  • 17June2025: Only managed to test Tesseract (OCR). MinerU and Docling couldn't work, possibly due to my local machine.
  • 18June2025:
    • Realized that this project might not need multi-layer memory. Decided to fall back to PostgreSQL for simpler memory storage.
    • Came across Nanonets OCR, which uses Qwen2.5-VL-3B. Tested it on Hugging Face and it works well with my existing data. Might consider replacing Tesseract with it; need to test its latency further.
    • When passing the whole PDF page as an image to the LLM, it tends to 'read' the text part of the PDF even though it was instructed to analyze only the medical image. I therefore added a step to crop out the medical image before analyzing/interpreting it, which proved effective.
  • 19June2025:
    • Tested Nanonets OCR via Hugging Face and its table extraction was good. Having trouble running it locally and on Colab, though.
    • Image extraction works OK, though some parts of the text/table are still included in the cropped image. Tweaked prompts to get better extraction (see the crop sketch after this log).
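A minimal sketch of the crop step mentioned in the 18-19 June entries, assuming pdf2image for rasterization and that the multimodal LLM has already been prompted to return a pixel bounding box for the medical image on the page; the file name and box coordinates are purely illustrative.

```python
from pdf2image import convert_from_path  # requires poppler on the system

pages = convert_from_path("eye_report.pdf", dpi=300)  # one PIL image per page

# Hypothetical bounding box (x0, y0, x1, y1) returned by the multimodal LLM
# after being asked to locate only the medical image on the page.
bbox = (120, 340, 1450, 1980)

medical_image = pages[0].crop(bbox)       # crop out just the medical image
medical_image.save("medical_image.png")   # this crop is what gets interpreted
```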

Reference & Thoughts
