Releases: jaco-bro/tokenizer
BPE tokenizer in pure Zig, with a Python API.
pip install tokenizerz==0.0.1
A minimal BPE tokenizer written in Zig, now available as a Python package via pip install.
Key Features:
- Zig-native implementation with PCRE2 regex support (compiled during pip install using ziglang).
- Python API: simple encode()/decode() interface and CLI tool (bpe --encode/decode).
- Supports both tiktoken and Hugging Face models.
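For readers new to byte-pair encoding, the idea behind an encode()/decode() pair can be sketched in a few lines of Python. This is a toy illustration only, not the tokenizerz implementation: the names train_bpe, apply_merge, toy_encode, and toy_decode are hypothetical, it learns its merges from a sample string instead of loading a tiktoken or Hugging Face vocabulary, it skips the regex pre-tokenization step, and it returns byte strings where a real tokenizer would return integer token IDs.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` with one fused token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn merge rules by repeatedly fusing the most frequent adjacent pair."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges


def toy_encode(text, merges):
    """Split text into single bytes, then apply the merges in learned order."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


def toy_decode(tokens):
    """Concatenate the byte tokens and decode them back to a string."""
    return b"".join(tokens).decode("utf-8")


if __name__ == "__main__":
    merges = train_bpe("low lower lowest", num_merges=3)
    tokens = toy_encode("lowest low", merges)
    print(tokens)
    print(toy_decode(tokens))
```

Because decoding is plain byte concatenation, toy_decode(toy_encode(s, merges)) returns s for any string and any merge list; the merges only change how many tokens the text is split into.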
Install:
pip install tokenizerz==0.0.1
Usage:
import tokenizerz
tokens = tokenizerz.encode("Hello, world!") # [9707, 11, 1879, 0]
text = tokenizerz.decode(tokens) # "Hello, world!"
BPE tokenizer in Zig
Works.