A Brazilian Portuguese Legal Preference Dataset and RLHF Fine-Tuning Pipeline
LexPref-PTBR is an end-to-end RLHF pipeline focused on Brazilian consumer law reasoning. It covers synthetic preference pair generation, reward model training, and DPO fine-tuning on a small open-source LLM, with full experiment tracking via Weights & Biases.
Brazilian Portuguese is underrepresented in legal AI benchmarks. This project addresses that gap by combining domain-specific annotation expertise with a reproducible fine-tuning pipeline.
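For readers new to DPO, the objective behind the fine-tuning step can be sketched for a single preference pair in plain Python. This is an illustration of the standard DPO loss, not this project's training code. The function name `dpo_loss`, the argument names, and the value `beta=0.1` are assumptions chosen for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given summed log-probabilities.

    All names and the default beta are illustrative assumptions.
    """
    # How much more the policy favours each response than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability mass toward the chosen response: the loss drops.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. `beta` controls how strongly deviations from the reference are rewarded or penalized.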
- Phase 1: Environment setup and library familiarization (datasets, transformers, peft, trl)
- Phase 2: PT-BR legal preference dataset construction with IAA simulation
- Phase 3: Reward model training + DPO fine-tuning with W&B logging
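As a rough illustration of what the IAA simulation in Phase 2 could look like, the sketch below simulates two annotators labelling which response in each preference pair is better, then scores their agreement with Cohen's kappa. Everything here is an assumption made for the example: the helper names, the "A"/"B" label scheme, the per-annotator accuracy values, and the choice of Cohen's kappa as the agreement metric.

```python
import random

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)

def simulate_annotator(gold, accuracy, rng):
    """Hypothetical annotator: matches the gold preference with probability `accuracy`."""
    flip = {"A": "B", "B": "A"}
    return [g if rng.random() < accuracy else flip[g] for g in gold]

rng = random.Random(42)  # fixed seed so the simulation is repeatable
gold = [rng.choice("AB") for _ in range(200)]  # "A" or "B" = preferred response
annotator_1 = simulate_annotator(gold, accuracy=0.90, rng=rng)
annotator_2 = simulate_annotator(gold, accuracy=0.85, rng=rng)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Simulated Cohen's kappa: {kappa:.3f}")
```

Lowering the `accuracy` values pulls kappa toward zero, which gives a quick way to see how annotator noise would affect the reliability of the preference labels.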
🔨 Active development — Phase 1 in progress
Fabio De Pinho | LLM Training Data Specialist