This project focuses on fine-tuning large language models (LLMs) for medical O1 reasoning using Supervised Fine-Tuning (SFT). The goal is to make the models answer medical reasoning questions more accurately and reliably.
We have fine-tuned the following models:
- DeepSeek-R1-Distill-Llama-8B
- Llama-3-8B-bnb-4bit
- Mistral-7B-Instruct-v0.2-bnb-4bit
These models are optimized for handling complex medical reasoning tasks with improved domain-specific accuracy.
Fine-tuning is the process of further training a pre-trained LLM on a specific dataset to improve its performance on a targeted task. It involves:
- Supervised Fine-Tuning (SFT): Training the model using labeled medical reasoning datasets.
- Low-Rank Adaptation (LoRA) / QLoRA: Parameter-efficient fine-tuning methods that let large models adapt without extensive computational resources (a minimal sketch follows this list).
- Evaluation: Measuring perplexity, accuracy, and reasoning coherence.
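As a minimal sketch of the LoRA setup, assuming Hugging Face `transformers` and `peft` (the base model ID, rank, and target modules below are illustrative values, not necessarily the exact ones used in this project):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Base model ID is illustrative; swap in any of the models listed above.
base_model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA injects small trainable low-rank matrices into the attention
# projections, so only a fraction of the parameters are updated during SFT.
lora_config = LoraConfig(
    r=16,                       # rank of the low-rank update (assumed value)
    lora_alpha=32,              # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights are trainable
```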
While general-purpose LLMs are powerful, they often lack:
- Domain-Specific Knowledge: General LLMs may lack a deep understanding of medical concepts.
- Reasoning Accuracy: Without fine-tuning, responses can be vague or incorrect.
- Terminology Alignment: Medical jargon and precise wording require adaptation.
Fine-tuning ensures that the models perform well in clinical reasoning, diagnosis suggestions, and evidence-based medical responses.
**DeepSeek-R1-Distill-Llama-8B**
- A distilled version of DeepSeek-R1, designed for efficiency.
- Optimized for reasoning tasks with reduced compute requirements.
- Supports multi-turn conversations and structured medical queries (see the sketch below).
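For illustration, a multi-turn medical exchange can be rendered with the tokenizer's chat template; the two-turn conversation below is hypothetical:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# Hypothetical two-turn exchange illustrating multi-turn structure.
messages = [
    {"role": "user", "content": "A 55-year-old presents with crushing chest pain. First steps?"},
    {"role": "assistant", "content": "Obtain an ECG and troponin levels immediately."},
    {"role": "user", "content": "The ECG shows ST elevation in leads II, III, and aVF. Diagnosis?"},
]

# apply_chat_template renders the turns in the model's expected format
# and appends the generation prompt for the next assistant reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```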
**Llama-3-8B-bnb-4bit**
- Quantized 4-bit version of Llama 3 (8B) using `bitsandbytes` for efficient inference (see the loading sketch below).
- Provides strong reasoning at lower latency and memory cost.
- Fine-tuned for medical decision support.
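A minimal loading sketch, assuming the un-quantized `unsloth/llama-3-8b` checkpoint is quantized to 4-bit on load via `bitsandbytes` (the Hub ID and NF4 settings are assumptions, not confirmed project settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 quantization with bf16 compute; these are common defaults, not
# necessarily the exact configuration behind the checkpoint above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "unsloth/llama-3-8b"  # assumed Hub ID for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # weights are quantized to 4-bit at load time
    device_map="auto",
)
```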
**Mistral-7B-Instruct-v0.2-bnb-4bit**
- Instruction-tuned variant of Mistral 7B.
- Optimized for instruction-following and multi-step reasoning.
- Performs well at explanatory medical answers and logical deductions (see the example below).
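As an end-to-end sketch, assuming the pre-quantized `unsloth/mistral-7b-instruct-v0.2-bnb-4bit` checkpoint on the Hub (the ID is an assumption), the chat template wraps the user turn in Mistral's `[INST] ... [/INST]` format before generation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID for the pre-quantized checkpoint; its saved 4-bit
# quantization config is applied automatically on load.
model_id = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral's chat template wraps the user turn in [INST] ... [/INST] tags.
messages = [
    {"role": "user", "content": "Explain, step by step, how to rule out pulmonary embolism."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```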
- Dataset Used: FreedomIntelligence/medical-o1-reasoning-SFT, a curated medical reasoning dataset that includes case studies and diagnostic Q&A (loading snippet after this list).
- Training Strategy: Supervised fine-tuning using LoRA for efficiency.
- Evaluation Metrics: Perplexity, domain-specific accuracy, and human validation.
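For reference, a loading and formatting sketch; the `en` config and the `Question` / `Complex_CoT` / `Response` column names are assumptions about the dataset schema, and the prompt layout is one possible SFT format:

```python
from datasets import load_dataset

# Config name and field names below are assumptions about the dataset schema.
ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en")

def to_sft_text(example):
    # One possible SFT layout: question, chain-of-thought, final answer.
    return {
        "text": (
            f"### Question:\n{example['Question']}\n\n"
            f"### Reasoning:\n{example['Complex_CoT']}\n\n"
            f"### Answer:\n{example['Response']}"
        )
    }

train_data = ds["train"].map(to_sft_text)
print(train_data[0]["text"][:500])
```

Perplexity can then be computed as the exponential of the mean token-level cross-entropy loss on a held-out split.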
- Further fine-tuning with larger and more diverse datasets.
- Incorporating Reinforcement Learning from Human Feedback (RLHF).
- Testing performance in real-world clinical decision support.
Fine-tuning LLMs for medical reasoning improves their domain knowledge, logical coherence, and diagnostic accuracy. By adapting these models to medical O1 reasoning, this project makes them more useful for healthcare applications.