Describe the bug
At least one use of tokenizers in eval_harness.py requires MarinTokenizer methods e.g.:
|
combined_encodings = {"input_ids": tokenizer.encode_batch(combined_batch)} |
That encode_batch method is not supported by HF tokenizers while some other functions in that module are relying on mutation:
|
self.tokenizer.pad_token_id = self.tokenizer.eos_token_id |
or callable tokenizers:
|
encoding: Union[List[List[int]], List[int]] = self.tokenizer( |
AFAICT these should be all be the same tokenizer provided to _LmEvalHarnessWorker, which has no type annotation making it hard to infer intent on.
Should everything in eval_harness.py be ported to work through the MarinTokenizer interface? This came up first for me as an error in trying to run lm_eval on HF checkpoints in #4677.
I'm unclear on what the right way to fix this is, especially since test_iterate_tokenized_requests in test_eval_harness.py is creating an HF tokenizer and explicitly passing it to _iterate_tokenized_requests, which should fail. These tests are being skipped in CI now (see the PR).
Describe the bug
At least one use of tokenizers in
eval_harness.pyrequires MarinTokenizer methods e.g.:marin/lib/levanter/src/levanter/eval_harness.py
Line 1668 in f8d4889
That
encode_batchmethod is not supported by HF tokenizers while some other functions in that module are relying on mutation:marin/lib/levanter/src/levanter/eval_harness.py
Line 587 in f8d4889
or callable tokenizers:
marin/lib/levanter/src/levanter/eval_harness.py
Line 694 in f8d4889
AFAICT these should be all be the same tokenizer provided to
_LmEvalHarnessWorker, which has no type annotation making it hard to infer intent on.Should everything in
eval_harness.pybe ported to work through theMarinTokenizerinterface? This came up first for me as an error in trying to run lm_eval on HF checkpoints in #4677.I'm unclear on what the right way to fix this is, especially since test_iterate_tokenized_requests in
test_eval_harness.pyis creating an HF tokenizer and explicitly passing it to_iterate_tokenized_requests, which should fail. These tests are being skipped in CI now (see the PR).