Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna

s-smits/grpo-optuna

GRPO Optuna

Experimental GRPO fine-tuning driven by Optuna hyperparameter search, using weighted reward functions.
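The weighted-reward idea can be sketched as a weighted sum of per-completion reward signals. The reward functions and weights below are hypothetical illustrations, not the ones used in this repo; in practice the weights would be among the hyperparameters Optuna searches over:

```python
# Hypothetical sketch: combining weighted reward functions for GRPO.
# These rewards and weights are illustrative, not this repo's actual ones.

def length_penalty(completion: str) -> float:
    """Toy reward: prefer completions near 100 characters."""
    return -abs(len(completion) - 100) / 100.0

def keyword_bonus(completion: str) -> float:
    """Toy reward: 1.0 if the completion contains a target keyword."""
    return 1.0 if "answer" in completion.lower() else 0.0

# (reward_fn, weight) pairs; weights would be tuned, e.g. by Optuna.
WEIGHTED_REWARDS = [
    (length_penalty, 0.5),
    (keyword_bonus, 1.0),
]

def total_reward(completion: str) -> float:
    """Weighted sum of the individual reward signals for one completion."""
    return sum(w * fn(completion) for fn, w in WEIGHTED_REWARDS)
```

GRPO then normalizes these scalar rewards within each group of sampled completions to form advantages, so only the relative weighting of the signals matters.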

Setup

uv venv .venv
uv sync

Tests

uv run -m pytest

CPU sanity check

Run a tiny-model sweep to verify the pipeline without GPUs:

uv run python main.py \
  --model-name hf-internal-testing/tiny-random-gpt2 \
  --output-dir outputs/tiny \
  --run-name tiny \
  --fast-dev-run \
  --report-to none \
  --trials 1 \
  --no-initial
