This repository provides a complete pipeline for fine-tuning the LLaMA-3 8B model with LoRA,
combined with retrieval-augmented generation (RAG) and evaluation tools.
The fine-tuned model derived from this workflow is referred to as WoodLLaMA.
- Fine-tuning: LoRA fine-tuning on LLaMA-3 8B with 4-bit quantization
- RAG: FAISS-based dense retrieval with re-ranking and context construction
- Evaluation:
  - Semantic similarity (cosine similarity, BERTScore)
  - Keyword coverage
  - Perplexity
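The keyword-coverage metric is not defined in this README; as a minimal sketch, one common formulation (the fraction of reference keywords that appear in the generated answer) might look like the following. The function name and matching rule are assumptions, not necessarily what `src/eval_text.py` implements:

```python
def keyword_coverage(response: str, keywords: list[str]) -> float:
    """Fraction of reference keywords found in the response (case-insensitive).

    Hypothetical formulation: the repository's eval_text.py may match
    differently (e.g. with stemming or fuzzy matching).
    """
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)
```

A simple substring match like this rewards answers that mention the expected domain terms without requiring exact phrasing.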
Clone the repository and install the required dependencies:
```shell
git clone https://github.com/[your-username]/llama3-finetune-rag.git
cd llama3-finetune-rag
pip install -r requirements.txt
```

Note:
Fine-tuning requires Ubuntu/Linux.
RAG and evaluation can be run on Windows or Linux.
```
llama3-finetune-rag/
├─ notebooks/
│  ├─ finetune_llama3_lora.ipynb   # Fine-tuning workflow (LoRA, 4-bit)
│  └─ rag_and_evaluation.ipynb     # RAG pipeline and evaluation
│
├─ src/
│  ├─ config.py
│  ├─ data.py
│  ├─ gpu_utils.py
│  ├─ lora_setup.py
│  ├─ retriever.py
│  ├─ generator.py
│  ├─ eval_text.py
│  ├─ io_utils.py
│  └─ __init__.py
│
├─ requirements.txt
├─ LICENSE
└─ README.md
```
- Prepare your dataset in JSONL format
- Open `notebooks/finetune_llama3_lora.ipynb`
- Adjust dataset paths and hyperparameters in `src/config.py`
- Run the notebook cells sequentially to fine-tune the model
- The resulting fine-tuned model is referred to as WoodLLaMA
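The README does not specify the JSONL schema, so as an illustration only, a common instruction-tuning layout (the `instruction`/`response` field names here are assumptions, not the repository's actual schema) can be written and read back with the standard library:

```python
import json

# Hypothetical record layout for instruction tuning; the actual field names
# expected by src/data.py may differ.
records = [
    {"instruction": "What affects wood density?",
     "response": "Species, moisture content, and growth rate."},
]

# JSONL = one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Reading the file back, line by line.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

One object per line keeps the file streamable, which is why Hugging Face `datasets` and most fine-tuning pipelines prefer JSONL over a single large JSON array.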
- Place your reference documents (CSV with Title, Abstract, Keywords, Authors, Year, DOI) in the project directory
- Open `notebooks/rag_and_evaluation.ipynb`
- Run the notebook to:
  - Build the FAISS index
  - Perform retrieval and generation
  - Evaluate responses with multiple metrics
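The FAISS index construction itself lives in the notebook; as an illustration of the underlying dense-retrieval step, the sketch below performs the same cosine-similarity top-k search with NumPy alone. FAISS's `IndexFlatIP` over L2-normalized vectors computes identical scores, just faster at scale; the toy vectors here are made up:

```python
import numpy as np

def top_k(query: np.ndarray, docs: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k documents most similar to the query.

    L2-normalizing both sides turns the inner product into cosine
    similarity, mirroring a FAISS IndexFlatIP over normalized embeddings.
    """
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k].tolist()

# Toy 4-dimensional "embeddings" (illustrative only; real embeddings would
# come from a sentence-transformers encoder).
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
```

The retrieved abstracts would then be concatenated into the prompt context before generation, with re-ranking applied on top of this first-stage search.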
- Python 3.10+
- PyTorch with CUDA (GPU required for fine-tuning and evaluation)
- Hugging Face Transformers, PEFT, Datasets
- FAISS, Sentence-Transformers, BERTScore
- Other dependencies are listed in `requirements.txt`
This project is licensed under the MIT License; see the LICENSE file for details.
This repository accompanies ongoing research on domain-specific language modeling in wood science.
A corresponding paper is in preparation for submission.
Please check back for citation details once it is published.