Commit 6ce5dd2

pretrain docstring
1 parent 86f75c7 commit 6ce5dd2

1 file changed

Lines changed: 4 additions & 4 deletions

pretrain.py

@@ -1,9 +1,9 @@
 """
-Continues MLM pretraining of a base encoder (default: ModernBERT-large) on Sentry-grouping LLM analyses
-(per-row `prompt` + `thinking_output` + `response_output`, joined with the tokenizer's sep_token).
+Continues MLM pretraining of a base encoder on Sentry-grouping LLM prompts and completions:
+`prompt[SEP]thinking_output[SEP]response_output`
 
-Logs to wandb. Writes checkpoints + the final model to GCS. Unlike `train.py`, there's no async eval — the MLM loss
-in wandb is the only training-time signal.
+Logs to wandb. Writes checkpoints + the final model to GCS. Unlike `train.py`, there's no async eval. Just MLM loss on a
+subsample of val data run sync.
 """
 
 import logging
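
The docstring describes each training example as `prompt[SEP]thinking_output[SEP]response_output`. A minimal sketch of that join is below. It is not taken from `pretrain.py`: the helper name `join_row`, the `FIELDS` constant, and the literal `"[SEP]"` default are assumptions for illustration, and in the real script the separator would presumably come from the tokenizer's `sep_token`.

```python
from typing import Mapping

# Field names are taken from the docstring; the order follows the
# `prompt[SEP]thinking_output[SEP]response_output` template.
FIELDS = ("prompt", "thinking_output", "response_output")


def join_row(row: Mapping[str, str], sep_token: str = "[SEP]") -> str:
    """Hypothetical helper: join the three per-row fields with the separator token."""
    return sep_token.join(row[field] for field in FIELDS)


# Made-up row, only to show the shape of the joined text.
example_row = {
    "prompt": "Are these two stack traces the same issue?",
    "thinking_output": "Both fail in the same frame.",
    "response_output": "Yes",
}
joined = join_row(example_row)
```

Passing the separator as a parameter keeps the sketch independent of any particular tokenizer; a caller could supply `tokenizer.sep_token` instead of the default.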
