Rename tokens to tokenIds for [Int] parameters #14

Merged
DePasqualeOrg merged 1 commit into main from rename-tokens-to-tokenids
Mar 5, 2026

@DePasqualeOrg
Owner

tokens has been renamed to tokenIds in all parameters and local bindings where the type is [Int] (integer IDs), aligning with the naming convention in Python's transformers.

Python's transformers consistently distinguishes "tokens" (string representations) from "token_ids" (integer representations), and its decode method takes token_ids as the parameter name. This package previously used decode(tokens: [Int]), which labeled integer IDs as tokens and so conflicted with that convention.

Changes

Protocol requirement (Tokenizer):

  • decode(tokens:skipSpecialTokens:) -> decode(tokenIds:skipSpecialTokens:)

Protocol extension (Tokenizer):

  • decode(tokens:) -> decode(tokenIds:)

PreTrainedTokenizer:

  • decode(tokens:skipSpecialTokens:) -> decode(tokenIds:skipSpecialTokens:)

BertTokenizer:

  • decode(tokens:) -> decode(tokenIds:)
  • unTokenize(tokens:) -> unTokenize(tokenIds:) (internal)

Not changed: Decoder.decode(tokens: [String]) and related code, where "tokens" correctly refers to string tokens.
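The renamed surface can be sketched as follows. This is a minimal, self-contained illustration rather than the package's code: the Tokenizer protocol here is a hypothetical stand-in that copies only the method labels listed above, and ToyTokenizer, its vocabulary, and the false default for skipSpecialTokens are assumptions made for the example.

```swift
// Hypothetical stand-in for the package's Tokenizer protocol.
// Only the method labels come from this PR; the rest is illustrative.
protocol Tokenizer {
    func decode(tokenIds: [Int], skipSpecialTokens: Bool) -> String
}

extension Tokenizer {
    // Convenience overload; the `false` default is an assumption.
    func decode(tokenIds: [Int]) -> String {
        decode(tokenIds: tokenIds, skipSpecialTokens: false)
    }
}

// Toy conformer with a made-up four-entry vocabulary.
struct ToyTokenizer: Tokenizer {
    let vocabulary = ["[CLS]", "hello", "world", "[SEP]"]
    let specialTokens: Set<String> = ["[CLS]", "[SEP]"]

    func decode(tokenIds: [Int], skipSpecialTokens: Bool) -> String {
        // Integer IDs are mapped to string tokens first...
        var tokens = tokenIds.map { vocabulary[$0] }
        if skipSpecialTokens {
            tokens = tokens.filter { !specialTokens.contains($0) }
        }
        // ...and the string tokens are then joined into text.
        return tokens.joined(separator: " ")
    }
}

let tokenizer = ToyTokenizer()
let tokenIds = [0, 1, 2, 3]
print(tokenizer.decode(tokenIds: tokenIds))
// prints "[CLS] hello world [SEP]"
print(tokenizer.decode(tokenIds: tokenIds, skipSpecialTokens: true))
// prints "hello world"
```

Inside the toy decode, the local named tokens holds [String] while the parameter named tokenIds holds [Int], which is the same distinction the rename enforces at the API level.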

Breaking change

This is a source-breaking change for any code calling decode(tokens:) or decode(tokens:skipSpecialTokens:). The fix is a straightforward rename at call sites.
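For codebases that cannot update every call site at once, one option is a temporary deprecated shim in downstream code. The sketch below is an assumption and not part of this PR: the Tokenizer protocol is a hypothetical stand-in carrying only the new label, and EchoTokenizer is a made-up conformer. It relies on Swift's standard @available(*, deprecated, renamed:) attribute, which makes the compiler warn at old call sites and point to the new name.

```swift
// Hypothetical stand-in for the package's protocol after this PR.
protocol Tokenizer {
    func decode(tokenIds: [Int], skipSpecialTokens: Bool) -> String
}

// Assumed downstream shim, not part of this PR: it keeps old call sites
// compiling while the compiler warns and points to the new label.
extension Tokenizer {
    @available(*, deprecated, renamed: "decode(tokenIds:skipSpecialTokens:)")
    func decode(tokens: [Int], skipSpecialTokens: Bool) -> String {
        decode(tokenIds: tokens, skipSpecialTokens: skipSpecialTokens)
    }
}

// Made-up conformer that simply prints the IDs separated by spaces.
struct EchoTokenizer: Tokenizer {
    func decode(tokenIds: [Int], skipSpecialTokens: Bool) -> String {
        tokenIds.map { String($0) }.joined(separator: " ")
    }
}

let tokenizer = EchoTokenizer()

// Before: old label, still compiles through the shim with a deprecation warning.
let old = tokenizer.decode(tokens: [1, 2, 3], skipSpecialTokens: false)

// After: the renamed label from this PR.
let new = tokenizer.decode(tokenIds: [1, 2, 3], skipSpecialTokens: false)

print(old == new)
// prints "true"
```

Once all call sites use tokenIds, the shim can be deleted without further changes.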

@DePasqualeOrg DePasqualeOrg merged commit 08eb813 into main Mar 5, 2026
3 checks passed