Experimental GRPO fine-tuning driven by Optuna hyperparameter search.
uv venv .venv
uv sync
uv run -m pytest

Run a tiny-model sweep to verify the pipeline without GPUs:
uv run python main.py \
--model-name hf-internal-testing/tiny-random-gpt2 \
--output-dir outputs/tiny \
--run-name tiny \
--fast-dev-run \
--report-to none \
--trials 1 \
--no-initial
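
For readers wondering how these flags might be consumed, here is a minimal sketch using the standard-library argparse module. It is not the project's actual main.py: the parser, the defaults, and the treatment of --fast-dev-run and --no-initial as boolean switches are assumptions inferred only from how the flags appear in the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the smoke-test command."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--model-name", required=True)
    # Defaults below are illustrative assumptions, not the project's values.
    parser.add_argument("--output-dir", default="outputs")
    parser.add_argument("--run-name", default=None)
    parser.add_argument("--report-to", default="none")
    parser.add_argument("--trials", type=int, default=1)
    # Assumed to be on/off switches because they take no value in the command.
    parser.add_argument("--fast-dev-run", action="store_true")
    parser.add_argument("--no-initial", action="store_true")
    return parser


# The same arguments as the tiny-model sweep, passed as a list.
SMOKE_ARGS = [
    "--model-name", "hf-internal-testing/tiny-random-gpt2",
    "--output-dir", "outputs/tiny",
    "--run-name", "tiny",
    "--fast-dev-run",
    "--report-to", "none",
    "--trials", "1",
    "--no-initial",
]

args = build_parser().parse_args(SMOKE_ARGS)
print(args.model_name, args.output_dir, args.trials)
```

argparse turns dashes into underscores, so --fast-dev-run is read as args.fast_dev_run and --no-initial as args.no_initial.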