[levanter] eval_harness.py mixes use of Marin and HF tokenizers #4678

@eric-czech

Description

Describe the bug

At least one tokenizer call in eval_harness.py requires MarinTokenizer methods, e.g.:

combined_encodings = {"input_ids": tokenizer.encode_batch(combined_batch)}

That encode_batch method is not supported by HF tokenizers, while other functions in the same module rely on mutating the tokenizer:

self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

or on the tokenizer being callable:

encoding: Union[List[List[int]], List[int]] = self.tokenizer(

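A minimal sketch of one possible direction (not existing levanter code; `HFStubTokenizer` and `EncodeBatchAdapter` are hypothetical names for illustration) is a thin adapter that layers the `encode_batch` method on top of a callable HF-style tokenizer, so all three usage patterns above work against one object:

```python
from typing import List


class HFStubTokenizer:
    """Hypothetical stand-in for an HF tokenizer: callable, mutable
    attributes, but no encode_batch method."""

    eos_token_id = 0
    pad_token_id = None

    def __call__(self, texts):
        # Toy whitespace "tokenization" for illustration only.
        if isinstance(texts, str):
            texts = [texts]
        vocab: dict = {}
        ids = [[vocab.setdefault(w, len(vocab) + 1) for w in t.split()] for t in texts]
        return {"input_ids": ids}


class EncodeBatchAdapter:
    """Wraps a callable HF-style tokenizer and adds the encode_batch
    method that eval_harness.py expects from MarinTokenizer."""

    def __init__(self, tokenizer):
        self._tok = tokenizer

    def __getattr__(self, name):
        # Delegate attribute reads (eos_token_id, etc.) to the wrapped tokenizer.
        return getattr(self._tok, name)

    def __call__(self, *args, **kwargs):
        return self._tok(*args, **kwargs)

    def encode_batch(self, texts: List[str]) -> List[List[int]]:
        return self._tok(texts)["input_ids"]


tokenizer = EncodeBatchAdapter(HFStubTokenizer())
combined_encodings = {"input_ids": tokenizer.encode_batch(["a b", "a c d"])}
```

Note that attribute mutation (`tokenizer.pad_token_id = ...`) would set the attribute on the adapter rather than the wrapped tokenizer, which is fine for later reads through the adapter but is the kind of subtlety a real fix would need to handle deliberately.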
AFAICT these should all be the same tokenizer provided to _LmEvalHarnessWorker, which has no type annotation, making the intended interface hard to infer.
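One way to make that intent explicit (a hypothetical sketch, not existing levanter code; the member names are assumptions drawn from the snippets above) would be a typing.Protocol that the worker could use to annotate its tokenizer argument:

```python
from typing import Any, List, Optional, Protocol, Union, runtime_checkable


@runtime_checkable
class EvalTokenizer(Protocol):
    """Hypothetical protocol listing the tokenizer operations
    eval_harness.py actually relies on."""

    pad_token_id: Optional[int]
    eos_token_id: Optional[int]

    def encode_batch(self, texts: List[str]) -> List[List[int]]:
        ...

    def __call__(self, text: Union[str, List[str]], **kwargs: Any) -> Any:
        ...
```

Anything passed to _LmEvalHarnessWorker (Marin or HF) would then have to satisfy every member, which would have surfaced this mismatch at type-check time rather than at runtime.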

Should everything in eval_harness.py be ported to work through the MarinTokenizer interface? This came up first for me as an error in trying to run lm_eval on HF checkpoints in #4677.

I'm unclear on what the right way to fix this is, especially since test_iterate_tokenized_requests in test_eval_harness.py creates an HF tokenizer and explicitly passes it to _iterate_tokenized_requests, which should fail. These tests are currently being skipped in CI (see the PR).

Labels: bug (Something isn't working)