[levanter] Add labeled eval spans#5723
Conversation
```python
Pos = model_pos.resize(batch.tokens.shape[0])
return named_lm_example_from_labeled(batch, Pos=Pos)
if batch.tokens.ndim == 2:
    Pos = model_pos.resize(batch.tokens.shape[1])
```
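The hunk above dispatches on the rank of `batch.tokens` to pick which dimension becomes the position axis. A minimal sketch of that logic, using a hypothetical `Axis` stand-in for the named axis type in the diff (levanter's real axis type lives in haliax; the names here are illustrative only):

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class Axis:
    """Hypothetical stand-in for a named axis with a size."""

    name: str
    size: int

    def resize(self, size: int) -> "Axis":
        # Same name, new length -- mirrors the `.resize(...)` calls in the diff.
        return replace(self, size=size)


def resolve_pos(model_pos: Axis, tokens: np.ndarray) -> Axis:
    """Pick the position axis length from the token array's shape.

    Batched input has shape (batch, seq), so the position dim is axis 1;
    unbatched input has shape (seq,), so it is axis 0.
    """
    if tokens.ndim == 2:
        return model_pos.resize(tokens.shape[1])
    return model_pos.resize(tokens.shape[0])
```

The reviewer's suggestion below ("pass axis names instead") would replace this rank-based dispatch with an explicit axis-name lookup, which avoids guessing which dimension is which.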
might be easiest to just pass axis names instead?
```python
    eval_current: bool = True,
    eval_ema: bool = True,
) -> Callable[[StepInfo], None]:
    """Build a callback that logs labeled eval metrics for current and/or eval model."""
```
ema model is different from eval model
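The reviewer's point is that the docstring conflates two things: the EMA model is an exponentially-smoothed copy of the current weights, not a separate "eval model". A minimal sketch of a callback builder matching the signature in the hunk, with `StepInfo`, `evaluate`, and `log` as hypothetical stand-ins (not levanter's actual types):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepInfo:
    """Hypothetical stand-in for the trainer's per-step state."""

    step: int
    model: object
    ema_model: Optional[object] = None


def labeled_eval_callback(
    evaluate: Callable[[object], dict],  # hypothetical: runs labeled eval, returns metrics
    log: Callable[[dict, int], None],    # hypothetical: logs a metric dict at a step
    eval_current: bool = True,
    eval_ema: bool = True,
) -> Callable[[StepInfo], None]:
    """Build a callback that logs labeled eval metrics for the current weights
    and/or their EMA copy, under distinct metric prefixes."""

    def callback(info: StepInfo) -> None:
        if eval_current:
            log({f"eval/{k}": v for k, v in evaluate(info.model).items()}, info.step)
        if eval_ema and info.ema_model is not None:
            log({f"eval_ema/{k}": v for k, v in evaluate(info.ema_model).items()}, info.step)

    return callback
```

Keeping separate `eval/` and `eval_ema/` prefixes makes the distinction the reviewer asks for explicit in the logged metrics.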
```python
for i in range(vocab_size):
    bytes[i] = byte_length_of_token(tokenizer, i)
```

```python
class LabeledEvaluator(Generic[Ex, M]):
```

```python
self.accum_for_batch = self._make_accum_for_batch()
```
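The precomputed per-token byte table above is what makes bits-per-byte (BPB) cheap at eval time: per-token losses in nats are summed and divided by the total UTF-8 byte length of the scored tokens, converting nats to bits with a factor of ln 2. A hedged sketch, where `byte_length_of_token` is a stand-in with an assumed definition (the real helper may handle byte-fallback tokens differently):

```python
import math


def byte_length_of_token(tokenizer, token_id: int) -> int:
    """Assumed behavior: UTF-8 byte length of the token's decoded string."""
    return len(tokenizer.decode([token_id]).encode("utf-8"))


def bits_per_byte(loss_nats_per_token, token_ids, byte_table) -> float:
    """BPB = (total loss in bits) / (total bytes of the scored tokens).

    loss_nats_per_token: per-token cross-entropy in nats
    byte_table: token id -> byte length, precomputed as in the diff above
    """
    total_nats = sum(loss_nats_per_token)
    total_bytes = sum(byte_table[t] for t in token_ids)
    return total_nats / (math.log(2) * total_bytes)
```

Precomputing the table once per vocabulary avoids calling the tokenizer inside the eval loop.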
```python
@classmethod
def from_labeled_lm(
```
function name weird. there is no labeled lm right?
Claude finished @dlwh's task in 8m 52s

Code review: No high-signal issues found. Checked the diff for bugs and AGENTS.md/CLAUDE.md compliance.
Reviewer comments already on the PR

Helw150 left a comment:

Thank you! This is very clean
Add an exclusive integer label contract for per-token LM evaluation, plus a LabeledEvaluator that aggregates losses and BPB over named label groups. This separates training loss weights from evaluation annotations and gives future trace-style evals a generic span-label target.
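Under the exclusive integer label contract described above, each evaluated token carries exactly one integer label indexing into a list of group names (with a negative value for unlabeled tokens), and the evaluator aggregates loss and BPB per group. A minimal NumPy sketch of that aggregation; the function and metric names are illustrative, not levanter's actual `LabeledEvaluator` API:

```python
import math

import numpy as np


def aggregate_labeled_metrics(losses, token_bytes, labels, label_names):
    """Aggregate mean loss and bits-per-byte per named label group.

    losses: per-token loss in nats
    token_bytes: per-token UTF-8 byte length
    labels: exclusive integer label per token; negative = unlabeled,
            so such tokens fall outside every group
    label_names: group names, indexed by label value
    """
    losses = np.asarray(losses, dtype=float)
    token_bytes = np.asarray(token_bytes, dtype=float)
    labels = np.asarray(labels)

    metrics = {}
    for idx, name in enumerate(label_names):
        mask = labels == idx
        n = int(mask.sum())
        if n == 0:
            continue  # group absent from this eval set
        total_nats = float(losses[mask].sum())
        total_bytes = float(token_bytes[mask].sum())
        metrics[f"{name}/loss"] = total_nats / n
        metrics[f"{name}/bpb"] = total_nats / (math.log(2) * total_bytes)
    return metrics
```

Because labels are exclusive integers rather than per-token float weights, each token contributes to exactly one group, which is what separates these eval annotations from training loss weights.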