LexPref-PTBR

A Brazilian Portuguese Legal Preference Dataset and RLHF Fine-Tuning Pipeline

Overview

LexPref-PTBR is an end-to-end RLHF pipeline focused on Brazilian consumer law reasoning. It covers synthetic preference pair generation, reward model training, and DPO fine-tuning on a small open-source LLM, with full experiment tracking via Weights & Biases.
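As a rough illustration of what a preference pair and the DPO objective look like, the sketch below defines a hypothetical `PreferencePair` record and computes the standard per-pair DPO loss from sequence log-probabilities. The `prompt` / `chosen` / `rejected` field names, the example text, and `beta=0.1` are assumptions made for illustration, not the project's actual schema or settings. In practice the loss would be handled by a training library rather than written by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record layout: one prompt with a preferred and a dispreferred answer."""
    prompt: str
    chosen: str
    rejected: str


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    margin is how much more the policy favors the chosen answer over the
    rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative example only (not taken from the dataset).
pair = PreferencePair(
    prompt="Comprei um produto pela internet. Posso desistir da compra?",
    chosen="Sim. O art. 49 do CDC garante o direito de arrependimento em 7 dias.",
    rejected="Não, compras online não podem ser canceladas.",
)

# Made-up log-probabilities: the policy favors the chosen answer more than the reference does.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen answer.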

Motivation

Brazilian Portuguese is underrepresented in legal AI benchmarks. This project addresses that gap by combining domain-specific annotation expertise with a reproducible fine-tuning pipeline.

Pipeline Stages

  • Phase 1: Environment setup and library familiarization (datasets, transformers, peft, trl)
  • Phase 2: PT-BR legal preference dataset construction with simulated inter-annotator agreement (IAA)
  • Phase 3: Reward model training + DPO fine-tuning with W&B logging
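
The IAA simulation in Phase 2 could be sketched roughly as follows. This is a minimal example under stated assumptions: the README does not say which agreement statistic is used, so Cohen's kappa is chosen here for illustration, and the helper names (`cohen_kappa`, `simulate_annotator`), the "A"/"B" label scheme, and the error rates are all hypothetical.

```python
import random
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def simulate_annotator(gold, error_rate, rng):
    """Copy the gold preference labels, flipping each with probability error_rate."""
    flip = {"A": "B", "B": "A"}
    return [flip[g] if rng.random() < error_rate else g for g in gold]


# Hypothetical setup: 200 preference pairs, "A" or "B" marks the preferred response.
rng = random.Random(0)
gold = [rng.choice("AB") for _ in range(200)]
annotator_1 = simulate_annotator(gold, error_rate=0.10, rng=rng)
annotator_2 = simulate_annotator(gold, error_rate=0.15, rng=rng)
print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.3f}")
```

Kappa corrects raw agreement for chance, which matters for binary preference labels where two annotators guessing at random would still agree about half the time.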

Status

🔨 Active development: Phase 1 in progress

Author

Fabio De Pinho | LLM Training Data Specialist