@smarter commented on Jan 22, 2026

Previously, token_batch_size was capped by max_position_embeddings, but that value is the maximum length of a single sequence, and a batch can contain multiple sequences. As far as I can tell, the real limit is the tokenizer's default model_max_length (some models have model_max_length == max_position_embeddings, which invites this confusion).
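
For concreteness, here is a minimal sketch of the changed bound, assuming a Hugging Face-style tokenizer and config; the helper name `check_token_batch_size` is hypothetical, not bergson's actual API:

```python
from transformers import PretrainedConfig, PreTrainedTokenizerBase


def check_token_batch_size(
    token_batch_size: int,
    tokenizer: PreTrainedTokenizerBase,
    config: PretrainedConfig,
) -> None:
    """Hypothetical helper illustrating the bound this PR changes."""
    # Old, too-strict bound: max_position_embeddings caps a *single*
    # sequence, but a batch packs several sequences together.
    #     assert token_batch_size <= config.max_position_embeddings
    # New bound: the tokenizer's default model_max_length is, as far as
    # we can tell, the real limit on the batch.
    if token_batch_size > tokenizer.model_max_length:
        raise ValueError(
            f"token_batch_size ({token_batch_size}) exceeds the "
            f"tokenizer's model_max_length ({tokenizer.model_max_length})"
        )
```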

We also make the truncation logic more explicit by passing a max_length parameter instead of mutating model_max_length.
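
A sketch of the corresponding truncation change, with illustrative names (`tokenize_truncated`, `texts`); only the tokenizer call pattern is the point:

```python
from transformers import BatchEncoding, PreTrainedTokenizerBase


def tokenize_truncated(
    tokenizer: PreTrainedTokenizerBase,
    texts: list[str],
    max_length: int,
) -> BatchEncoding:
    # Before this PR (implicit): mutate the tokenizer, then rely on its
    # internal default when truncating.
    #     tokenizer.model_max_length = max_length
    #     return tokenizer(texts, truncation=True)
    # After (explicit): pass the cap at the call site and leave the
    # tokenizer object untouched.
    return tokenizer(texts, truncation=True, max_length=max_length)
```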

@smarter force-pushed the tbs branch 4 times, most recently from 8942c43 to f677706 on January 22, 2026 at 19:04
With a recent PyTorch we get:

bergson/process_preconditioners.py:59:68 - error: Argument of type "ProcessGroup | Unknown | int" cannot be assigned to parameter "group" of type "ProcessGroup | None"
         Type "ProcessGroup | Unknown | int" is not assignable to type "ProcessGroup | None"
           Type "int" is not assignable to type "ProcessGroup | None"
             "int" is not assignable to "ProcessGroup"
             "int" is not assignable to "None" (reportArgumentType)

The non-ProcessGroup case should only happen when the `ranks` argument is passed
explicitly to `new_group`.
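
A minimal sketch of the kind of narrowing that satisfies pyright here, assuming the group is created without an explicit `ranks` list; the surrounding function is illustrative, not the actual bergson/process_preconditioners.py code:

```python
import torch
import torch.distributed as dist


def all_reduce_mean(tensor: torch.Tensor) -> torch.Tensor:
    # new_group is annotated as returning ProcessGroup | int, but the
    # int case only arises when `ranks` is passed explicitly and the
    # current rank is excluded. Assert the invariant so the checker
    # narrows the type to ProcessGroup.
    group = dist.new_group()
    assert isinstance(group, dist.ProcessGroup)
    dist.all_reduce(tensor, group=group)
    tensor /= dist.get_world_size(group=group)
    return tensor
```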

Also upgrade pyright in the CI, although the error above is unrelated to the version bump.