A Fine-Tuning + RAG Pipeline for Medical QA using the MedQA Dataset
Figure: The Medical QA pipelines. A: The SLM fine-tuning pipeline. The SLM is fine-tuned first, then the doctor prompts the fine-tuned model with a medical question. B: The document-based RAG pipeline. The doctor prompts the non-fine-tuned LLM while the information relevant to the given question is retrieved from a knowledge base and appended to the doctor's prompt.
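As a rough illustration of the two flows in the figure, here is a minimal Python sketch; every function and variable name in it is hypothetical and only stands in for whatever fine-tune.py and RAG.py actually do:

```python
# Hypothetical sketch contrasting the two pipelines (all names are illustrative).

def fine_tuned_qa(question: str, ft_model) -> str:
    # Pipeline A: the doctor's question goes straight to the fine-tuned model.
    return ft_model.generate(question)

def rag_qa(question: str, base_model, retriever) -> str:
    # Pipeline B: retrieve passages relevant to the question, then prepend
    # them to the prompt sent to the non-fine-tuned model.
    passages = retriever.search(question, k=3)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
    return base_model.generate(prompt)
```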
- Read the detailed report here.
```bash
conda create --name MedQA -c conda-forge python=3.11
conda activate MedQA
pip install -r requirements.txt
```
- Train your model using the command:
```bash
python fine-tune.py --model-name=unsloth/Qwen3-0.6B
```
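The internals of fine-tune.py are not shown in this README; as a hedged sketch, an Unsloth LoRA setup for a Qwen3 base model typically looks like the following (the hyperparameters and target modules are illustrative assumptions, not the repo's actual values):

```python
# Sketch of an Unsloth LoRA setup; hyperparameters are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit base weights keep memory usage small
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Training on MedQA (e.g. with trl's SFTTrainer) and GGUF export for
# llama-cpp would follow here.
```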
- Serve the fine-tuned model using the llama-cpp server:
```bash
llama-server -m <your_model.gguf> --host <machine_ip> --port <machine_port> -ngl <num_layers_to_offload_into_gpu>
```
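Once the server is running, it exposes an OpenAI-compatible HTTP API, so a quick smoke test from Python can look like this (the host, port, and question are placeholders):

```python
# Smoke-test the llama-cpp server via its OpenAI-compatible chat endpoint.
import requests

resp = requests.post(
    "http://<machine_ip>:<machine_port>/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is the first-line treatment for type 2 diabetes?"}
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```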
- Build the FAISS index using the command (a rough sketch of this step follows below):
```bash
python RAG.py --build_index=true
```
- Host your model in an LLM inference engine such as ollama or the llama-cpp server.
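RAG.py's index-building logic is not reproduced here; a rough sketch of what --build_index=true presumably does, assuming data/docs_emb_qwen3.pkl holds an (N, d) array of document embeddings (the file name comes from this README; the index type and output path are illustrative):

```python
# Hypothetical index build: load precomputed document embeddings and
# store them in a FAISS inner-product index.
import pickle

import faiss
import numpy as np

with open("data/docs_emb_qwen3.pkl", "rb") as f:
    embeddings = np.asarray(pickle.load(f), dtype="float32")

faiss.normalize_L2(embeddings)           # unit vectors: inner product == cosine
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "medqa.faiss")  # hypothetical output path
```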
- A list of Qwen3 models can be found here; other model options include medllama3-v20 and medgemma.
- A list of fine-tuned models can be found here.
- Test the fine-tuned model QA:
```bash
python MedQA.py --model_name=Qwen3-8B --inference_api=<llama-server-api-url> --test_size=100
```
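How MedQA.py scores the answers is not documented here; a toy sketch of one plausible scoring step for multiple-choice MedQA questions (entirely illustrative):

```python
# Toy accuracy computation, assuming the model is asked to answer with a
# single option letter (A-E) somewhere in its reply.
import re

def option_letter(reply: str) -> str | None:
    match = re.search(r"\b([A-E])\b", reply.upper())
    return match.group(1) if match else None

def accuracy(replies: list[str], gold: list[str]) -> float:
    hits = sum(option_letter(r) == g for r, g in zip(replies, gold))
    return hits / len(gold)

print(accuracy(["The answer is B.", "I would choose C."], ["B", "A"]))  # 0.5
```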
- Test the RAG QA:
- Extract the zip files [data/docs_emb_qwen3.pkl.zip, data/MedQA_en_documents.pkl.zip] if you did not build the FAISS DB:
```bash
python RAG.py --build_index=false --model_name=Qwen3-8B --inference_api=<llama-server-api-url> --test_size=100
```
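On the retrieval side, the query flow behind --build_index=false plausibly mirrors the build step sketched above; a hedged sketch, assuming data/MedQA_en_documents.pkl holds the passage texts and that the question is embedded with the same model that produced docs_emb_qwen3.pkl (file names from this README; everything else is illustrative):

```python
# Hypothetical retrieval step: look up the top-k passages for an embedded
# question and build an augmented prompt for the hosted model.
import pickle

import faiss
import numpy as np

index = faiss.read_index("medqa.faiss")  # built in the sketch above
with open("data/MedQA_en_documents.pkl", "rb") as f:
    documents = pickle.load(f)           # assumed: list of passage strings

def augmented_prompt(question: str, question_emb: np.ndarray, k: int = 3) -> str:
    query = np.asarray(question_emb, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(query)            # match the normalization used at build time
    _, ids = index.search(query, k)
    context = "\n".join(documents[i] for i in ids[0])
    return f"Context:\n{context}\n\nQuestion: {question}"
```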