feat: add FFT trainer worker class #113

Merged: ShubyM merged 3 commits into gke-labs:main from ShubyM:feat/fft-worker on Jun 5, 2026
@ShubyM (Collaborator) commented Jun 5, 2026

This PR splits the existing trainer into the shape we need for full fine-tuning. The old trainer was effectively a LoRA worker: one process owns a base model and serves multiple jobs by creating and switching adapters. Full fine-tuning has a different ownership model, where one worker process owns one trainable model for one job. To make that distinction explicit, this introduces a shared BaseTrainerWorker, keeps LoRA-specific adapter management in LoraTrainingWorker, and adds an FFTTrainingWorker for the single-model full fine-tuning path.
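For readers skimming without the diff, the ownership split described above might be sketched roughly as follows. Only the three class names come from this PR; every method name here (`trainable_target`, `train_step`, `create_adapter`) and the string stand-ins for models and adapters are hypothetical, chosen to keep the sketch dependency-free. It is not the merged implementation.

```python
from abc import ABC, abstractmethod


class BaseTrainerWorker(ABC):
    """Shared training logic; subclasses decide which weights a job trains."""

    @abstractmethod
    def trainable_target(self, job_id: str) -> str:
        """Return a label for the weights that job_id trains (hypothetical hook)."""

    def train_step(self, job_id: str) -> str:
        # Stand-in for the shared forward/backward path in the base worker.
        return f"step on {self.trainable_target(job_id)}"


class LoraTrainingWorker(BaseTrainerWorker):
    """One process owns one base model and serves many jobs via adapters."""

    def __init__(self) -> None:
        self.adapters: dict[str, str] = {}  # job_id -> adapter label

    def create_adapter(self, job_id: str) -> None:
        self.adapters[job_id] = f"adapter:{job_id}"

    def trainable_target(self, job_id: str) -> str:
        # Switching adapters amounts to a per-job lookup in this sketch.
        return self.adapters[job_id]


class FFTTrainingWorker(BaseTrainerWorker):
    """One process owns one trainable model for exactly one job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id

    def trainable_target(self, job_id: str) -> str:
        if job_id != self.job_id:
            raise ValueError(f"worker is bound to job {self.job_id!r}")
        return f"model:{self.job_id}"
```

In this sketch a LoRA worker can serve any job it has an adapter for, while an FFT worker rejects every job except the one it was created with, which is the ownership difference the description calls out.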

Most of the actual training math is shared between the two modes, so this also moves common forward/backward, padding, logprob, batching, and generation code into the base worker. The loss functions are factored into pure tensor operations in losses.py, which makes them easier to test directly and keeps the worker classes focused on model lifecycle and orchestration.
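To illustrate why pure loss functions are easier to test, here is a minimal sketch of a masked negative log-likelihood written as a plain function with no worker or model state. The name `masked_nll_loss` and its signature are assumptions, not the contents of losses.py, and nested Python lists stand in for tensors so the example runs without a tensor library.

```python
from typing import Sequence


def masked_nll_loss(
    logprobs: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> float:
    """Mean negative log-probability over positions where mask is nonzero.

    Rows are sequences in a batch; masked-out positions (for example padding)
    contribute nothing to the loss. Returns 0.0 if every position is masked.
    """
    total = 0.0
    count = 0
    for row_logprobs, row_mask in zip(logprobs, mask):
        for logprob, keep in zip(row_logprobs, row_mask):
            if keep:
                total -= logprob
                count += 1
    return total / count if count else 0.0


# Two padded sequences; only three positions are real tokens.
batch_logprobs = [[-1.0, -2.0, -9.0], [-3.0, 0.0, 0.0]]
batch_mask = [[1, 1, 0], [1, 0, 0]]
loss = masked_nll_loss(batch_logprobs, batch_mask)  # (1 + 2 + 3) / 3
```

Because the function depends only on its arguments, a unit test can call it with hand-built inputs and never has to construct a worker or load a model.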

This PR does not wire full fine-tuning into the API server yet. It just adds the worker split and shared math needed for that follow-up.

@ShubyM ShubyM requested a review from droot June 5, 2026 17:29

@droot (Collaborator) left a comment

This is a great refactor! Thanks!

@ShubyM ShubyM merged commit fed9fd7 into gke-labs:main Jun 5, 2026
11 checks passed
2 participants