Self-contained utilities for preparing NLP/VLM datasets, fine-tuning with LoRA or QLoRA, and running everything from a Typer CLI. Code lives in `scripts/`, datasets are stored under `data/`, and models/adapters are saved under `models/`.
- `scripts/`: all Python modules (cli.py, config, data prep, training, etc.). Run via `python -m scripts.cli ...` from the repo root.
- `data/`: prepared/tokenized datasets (`data/processed/<task>` by default).
- `models/`: model artifacts. Set caches here to keep base downloads local (see `.env.example`); fine-tuned adapters default to `models/lora` and `models/qlora`.
- `requirements.txt`: install dependencies into your venv.
- `.env.example`: optional env vars for caching/GPU selection.
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

(Optional) Create `.env` from `.env.example` to pin caches (the venv is named `.venv` so it does not collide with the `.env` file):
```bash
HF_HOME=./models/hf_cache
HF_DATASETS_CACHE=./data/hf_datasets
CUDA_VISIBLE_DEVICES=0
```

- Prepare data (defaults store processed data in `data/processed/<task>`):
```bash
python -m scripts.cli prepare-data \
  --task question_answering \
  --dataset-name squad \
  --model-type llm \
  --max-length 384
```

For local files: `--data-files '{"train": "train.json", "validation": "valid.json"}'`
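To sanity-check the output, you can load the processed dataset back in Python. This assumes `prepare-data` saves a Hugging Face `DatasetDict` via `save_to_disk` (an assumption about the script's on-disk format, not confirmed here):

```python
# Inspect what prepare-data wrote; assumes an HF DatasetDict on disk.
from datasets import load_from_disk

ds = load_from_disk("data/processed/question_answering")
print(ds)                     # splits and row counts
print(ds["train"][0].keys())  # tokenized columns, e.g. input_ids / labels
```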
- LoRA fine-tune (adapters saved to `models/lora` unless overridden):
```bash
python -m scripts.cli finetune-lora \
  --task question_answering \
  --model-type llm \
  --train-data-dir data/processed/question_answering \
  --output-dir models/lora/qa-finetune \
  --num-epochs 3 \
  --batch-size 4 \
  --learning-rate 2e-4 \
  --gradient-checkpointing  # optional: saves memory, sets use_cache=False and uses non-reentrant checkpointing
```
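Under the hood this follows the standard PEFT recipe (see the implementation notes below). A minimal sketch of the equivalent setup; the base checkpoint and the `LoraConfig` values are illustrative assumptions, not the script's actual defaults:

```python
# Minimal LoRA sketch for a Seq2Seq LLM; hyperparameters are assumed, not the repo's.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # assumed base model
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # assumed rank
    lora_alpha=32,              # assumed scaling
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention projections; model-dependent
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only adapter weights are trainable
```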
- QLoRA fine-tune (4-bit base, adapters saved to `models/qlora`):
```bash
python -m scripts.cli finetune-qlora \
  --task question_answering \
  --model-type llm \
  --train-data-dir data/processed/question_answering \
  --output-dir models/qlora/qa-finetune \
  --num-epochs 3 \
  --batch-size 4 \
  --learning-rate 2e-4 \
  --gradient-checkpointing  # optional
```
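QLoRA differs mainly in how the base model is loaded: a 4-bit quantized base plus k-bit preparation before the adapters are attached. A sketch under typical QLoRA settings (the quantization values and base checkpoint are assumptions, not necessarily this repo's):

```python
# QLoRA sketch: 4-bit base + k-bit preparation, then LoRA adapters on top.
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # common QLoRA choice (assumed)
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-base",                  # assumed base model
    quantization_config=bnb_cfg,
    device_map="auto",                      # bitsandbytes requires a GPU
)
base = prepare_model_for_kbit_training(base)
model = get_peft_model(base, LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=16, lora_alpha=32))
```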
- Default VLM: `Salesforce/blip2-flan-t5-xl`.
- Expects an `image` column (PIL-compatible) and a text prompt via `text`/`question`/`prompt`; labels via `answer`/`caption`/`output`.
- Use `--model-type vlm`; QLoRA loads the base in 4-bit automatically. Set caches to `models/hf_cache` in `.env` to keep downloads inside `models/`.
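For orientation, loading the default VLM pairs `AutoModelForVision2Seq` with `AutoProcessor`; a minimal sketch (the image path and prompt are placeholders, not repo defaults):

```python
# Load the README's default VLM and run one prompt; file/prompt are placeholders.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "Salesforce/blip2-flan-t5-xl"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, text="Question: what is shown? Answer:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```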
- Models: Seq2Seq LLMs via `AutoModelForSeq2SeqLM`; VLMs via `AutoModelForVision2Seq` + `AutoProcessor`; optional 4-bit with `BitsAndBytesConfig`.
- Data prep: task-specific tokenization; automatic train/validation split if one is missing; saved to `data/processed/<task>`.
- Training: `Seq2SeqTrainer` + `DataCollatorForSeq2Seq`; LoRA via `get_peft_model`; QLoRA runs `prepare_model_for_kbit_training` first. Adapters/tokenizers are saved in `models/` under the chosen output dir.
- Reproducibility: `set_seed` seeds Python/NumPy/Torch; the CLI loads `.env` automatically.
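Put together, the training wiring described above looks roughly like the sketch below; the base checkpoint is an assumption, while the argument values mirror the example commands:

```python
# Training-loop sketch from the components listed above; base model is assumed.
from dotenv import load_dotenv
from datasets import load_from_disk
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments, set_seed)

load_dotenv()  # the CLI does this automatically; done by hand here
set_seed(42)   # seeds Python/NumPy/Torch

name = "google/flan-t5-base"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = get_peft_model(AutoModelForSeq2SeqLM.from_pretrained(name),
                       LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM))

ds = load_from_disk("data/processed/question_answering")
trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="models/lora/qa-finetune",  # values mirror the CLI example
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
    train_dataset=ds["train"],
    eval_dataset=ds.get("validation"),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model()  # writes the adapter and config under output_dir
```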
- For CPU-only use, avoid `--model-type vlm` or the `finetune-qlora` command unless you have the RAM; bitsandbytes needs a GPU.
- On CUDA OOM, lower `--batch-size`, raise gradient accumulation, or shorten `--max-length` during `prepare-data`.
- Keep datasets and caches tidy: `data/` for datasets, `models/` for all model weights/adapters and the HF cache.