This project focuses on fine-tuning large language models (LLMs) for medical O1 reasoning using Supervised Fine-Tuning (SFT). The goal is to make the models answer medical reasoning questions more accurately and reliably.
We have fine-tuned the following models:
- DeepSeek-R1-Distill-Llama-8B
- Llama-3-8B-bnb-4bit
- Mistral-7B-Instruct-v0.2-bnb-4bit
These models are optimized for handling complex medical reasoning tasks with improved domain-specific accuracy.
Fine-tuning is the process of further training a pre-trained LLM on a specific dataset to improve its performance on a targeted task. It involves:
- Supervised Fine-Tuning (SFT): Training the model using labeled medical reasoning datasets.
- Low-Rank Adaptation (LoRA) / QLoRA: Parameter-efficient fine-tuning methods that let large models adapt without extensive computational resources (a minimal sketch follows this list).
- Evaluation: Measuring perplexity, accuracy, and reasoning coherence.
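As a minimal sketch of the LoRA setup, assuming Hugging Face `transformers` and `peft` (the base model ID, rank, and target modules below are illustrative values, not necessarily the exact ones used in this project):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Base model ID is illustrative; swap in any of the models listed above.
base_model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA injects small trainable low-rank matrices into the attention
# projections, so only a fraction of the parameters are updated during SFT.
lora_config = LoraConfig(
    r=16,                       # rank of the low-rank update (assumed value)
    lora_alpha=32,              # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights are trainable
```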
While general-purpose LLMs are powerful, they often lack:
- Domain-Specific Knowledge: General LLMs may lack a deep understanding of medical concepts.
- Reasoning Accuracy: Without fine-tuning, responses can be vague or incorrect.
- Terminology Alignment: Medical jargon and precise wording require adaptation.
Fine-tuning ensures that the models perform well in clinical reasoning, diagnosis suggestions, and evidence-based medical responses.
**DeepSeek-R1-Distill-Llama-8B**
- A distilled version of DeepSeek-R1, designed for efficiency.
- Optimized for reasoning tasks with reduced compute requirements.
- Supports multi-turn conversations and structured medical queries (see the sketch below).
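For illustration, a multi-turn medical exchange can be rendered with the tokenizer's chat template; the two-turn conversation below is hypothetical:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# Hypothetical two-turn exchange illustrating multi-turn structure.
messages = [
    {"role": "user", "content": "A 55-year-old presents with crushing chest pain. First steps?"},
    {"role": "assistant", "content": "Obtain an ECG and troponin levels immediately."},
    {"role": "user", "content": "The ECG shows ST elevation in leads II, III, and aVF. Diagnosis?"},
]

# apply_chat_template renders the turns in the model's expected format
# and appends the generation prompt for the next assistant reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```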
**Llama-3-8B-bnb-4bit**
- Quantized 4-bit version of Llama 3 (8B) using `bitsandbytes` for efficient inference (see the loading sketch below).
- Provides strong reasoning at lower latency and memory cost.
- Fine-tuned for medical decision support.
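A minimal loading sketch, assuming the un-quantized `unsloth/llama-3-8b` checkpoint is quantized to 4-bit on load via `bitsandbytes` (the Hub ID and NF4 settings are assumptions, not confirmed project settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 quantization with bf16 compute; these are common defaults, not
# necessarily the exact configuration behind the checkpoint above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "unsloth/llama-3-8b"  # assumed Hub ID for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # weights are quantized to 4-bit at load time
    device_map="auto",
)
```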
**Mistral-7B-Instruct-v0.2-bnb-4bit**
- Instruction-tuned variant of Mistral 7B.
- Optimized for instruction-following and multi-step reasoning.
- Performs well at explanatory medical answers and logical deductions (see the example below).
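As an end-to-end sketch, assuming the pre-quantized `unsloth/mistral-7b-instruct-v0.2-bnb-4bit` checkpoint on the Hub (the ID is an assumption), the chat template wraps the user turn in Mistral's `[INST] ... [/INST]` format before generation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID for the pre-quantized checkpoint; its saved 4-bit
# quantization config is applied automatically on load.
model_id = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral's chat template wraps the user turn in [INST] ... [/INST] tags.
messages = [
    {"role": "user", "content": "Explain, step by step, how to rule out pulmonary embolism."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```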
- Dataset Used: FreedomIntelligence/medical-o1-reasoning-SFT, a curated medical reasoning dataset that includes case studies and diagnostic Q&A (loading snippet after this list).
- Training Strategy: Supervised fine-tuning using LoRA for efficiency.
- Evaluation Metrics: Perplexity, domain-specific accuracy, and human validation.
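For reference, a loading and formatting sketch; the `en` config and the `Question` / `Complex_CoT` / `Response` column names are assumptions about the dataset schema, and the prompt layout is one possible SFT format:

```python
from datasets import load_dataset

# Config name and field names below are assumptions about the dataset schema.
ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en")

def to_sft_text(example):
    # One possible SFT layout: question, chain-of-thought, final answer.
    return {
        "text": (
            f"### Question:\n{example['Question']}\n\n"
            f"### Reasoning:\n{example['Complex_CoT']}\n\n"
            f"### Answer:\n{example['Response']}"
        )
    }

train_data = ds["train"].map(to_sft_text)
print(train_data[0]["text"][:500])
```

Perplexity can then be computed as the exponential of the mean token-level cross-entropy loss on a held-out split.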
- Further fine-tuning with larger and more diverse datasets.
- Incorporating Reinforcement Learning from Human Feedback (RLHF).
- Testing performance in real-world clinical decision support.
Fine-tuning LLMs for medical reasoning improves their domain knowledge, logical coherence, and diagnostic accuracy. By adapting these models to medical O1 reasoning, this project makes them more useful for healthcare applications.