Atero-ai/fastokens
⚡ fastokens

fastokens is a fast BPE tokenizer for use with popular open-weight LLMs, built on top of a high-performance Rust backend.
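BPE (byte-pair encoding) works by repeatedly replacing the highest-priority adjacent symbol pair with a merged symbol, according to a merge table learned at training time. A minimal pure-Python sketch of the encode loop, for illustration only (this is not fastokens' actual implementation, and the toy merge table below is made up):

```python
def bpe_encode(word, merges):
    """Encode a word by greedily applying BPE merges in priority order.

    `merges` maps a symbol pair to its rank (lower rank = merged earlier),
    as learned during tokenizer training.
    """
    symbols = list(word)
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        ranked = [(merges.get((a, b), float("inf")), i)
                  for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(ranked)
        if rank == float("inf"):
            break  # no applicable merges remain
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Toy merge table: "l"+"l" was learned first, then "ll"+"o", then "h"+"e".
merges = {("l", "l"): 0, ("ll", "o"): 1, ("h", "e"): 2}
print(bpe_encode("hello", merges))  # ['he', 'llo']
```

A real tokenizer additionally maps the resulting symbols to integer ids through a vocabulary, and fast implementations avoid this quadratic rescanning of pairs.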

fastokens can be installed from source:

```shell
git clone https://github.com/atero-ai/fast-tokens
uv pip install fast-tokens/python
```

The Python API lives in the python directory. To use fastokens as a drop-in replacement with transformers, or with NVIDIA Dynamo, see the usage examples below.

Performance

fastokens achieves 10x+ faster tokenization on average than the tokenizers library. The gap widens as prompt sizes grow, as shown in the graphs below.

[Figure: OSS speedup on various processors]

[Figure: average speedup]

Faster tokenization directly impacts live workloads. Measured with SGLang's benchmark suite, fastokens reduces time-to-first-token (TTFT) across prompt sizes:

[Figure: TTFT P50 comparison]
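TTFT is the wall-clock time from request arrival to the first generated token, and P50 is its median across requests. A small sketch of how such percentiles are computed from raw latencies, using Python's statistics module (the sample values below are made up, not benchmark data):

```python
import statistics

# Hypothetical per-request TTFT samples in milliseconds (illustrative only).
ttft_ms = [112.0, 98.5, 130.2, 101.7, 95.3, 120.9, 104.4]

p50 = statistics.median(ttft_ms)
# quantiles(n=100) yields the 1st..99th percentiles; index 94 is P95.
p95 = statistics.quantiles(ttft_ms, n=100)[94]
print(f"P50 = {p50:.1f} ms, P95 = {p95:.1f} ms")
```

Tail percentiles such as P95 matter as much as P50 in serving, since a faster tokenizer shortens the synchronous work done before the first forward pass on every request.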

Note that fastokens is focused on inference and does not support all features of tokenizers. In particular, additional encoding outputs and some normalizers/pretokenizers are not available.

Tested models

The following models have been tested; beyond these, fastokens should generally work with most BPE tokenizers supported by the transformers library:

  • nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
  • openai/gpt-oss-120b
  • deepseek-ai/DeepSeek-V3.2
  • deepseek-ai/DeepSeek-V3
  • deepseek-ai/DeepSeek-R1
  • Qwen/Qwen3-Next-80B-A3B-Thinking
  • Qwen/Qwen3-Next-80B-A3B-Instruct
  • Qwen/Qwen3-235B-A22B-Instruct-2507
  • Qwen/Qwen3.5-397B-A17B
  • MiniMaxAI/MiniMax-M2.1
  • MiniMaxAI/MiniMax-M2.5
  • mistralai/Devstral-Small-2-24B-Instruct-2512
  • zai-org/GLM-4.7
  • zai-org/GLM-5

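Whether a given model's tokenizer is BPE-based can be read from its tokenizer.json: the "model" section carries a "type" field in the Hugging Face tokenizers serialization format. A hedged sketch of such a check, run against an in-memory stand-in rather than a real downloaded file:

```python
import json

def is_bpe(tokenizer_json: str) -> bool:
    """Return True if a serialized tokenizer.json describes a BPE model."""
    spec = json.loads(tokenizer_json)
    return spec.get("model", {}).get("type") == "BPE"

# Minimal in-memory stand-in for a downloaded tokenizer.json.
example = json.dumps({
    "version": "1.0",
    "model": {"type": "BPE", "vocab": {"hello": 0}, "merges": []},
})
print(is_bpe(example))  # True
```

For a real model, the same check would be run on the tokenizer.json shipped in the model repository.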
Usage

Using with transformers

Note that fastokens currently works with transformers 4.57.1 (the version used by current SGLang).

```python
import fastokens
fastokens.patch_transformers()

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
tokens = tokenizer("Hello, world!")
assert tokens["input_ids"] == [22177, 1044, 4304, 1033]
```

Standalone usage

```python
from fastokens._native import Tokenizer
tokenizer = Tokenizer.from_model("deepseek-ai/DeepSeek-V3.2")
tokens = tokenizer.encode("A very long prompt that is now lightning fast.")
```

Dynamo usage

fastokens is integrated with NVIDIA Dynamo's frontend and can be enabled by passing the flag --tokenizer fastokens to the latest version (either build from source, or wait for the official release, coming in the next few days).

Acknowledgements

This library builds on the well-known and widely used Hugging Face tokenizers library and reuses code written for HF tokenizers in several code paths.
