Releases: jaco-bro/tokenizer

BPE tokenizer in pure Zig, with a Python API.

09 May 21:29
e37bcf1

pip install tokenizerz==0.0.1

20 Apr 06:48
a6758d4

A minimal BPE tokenizer written in Zig, now available as a Python package via pip install.

Key Features:

  • Zig-native implementation with PCRE2 regex support (compiled during pip install using the ziglang package).
  • Python API: a simple encode()/decode() interface, plus a CLI tool (bpe --encode/decode).
  • Supports both tiktoken and Hugging Face models.

Install:

pip install tokenizerz==0.0.1

Usage:

import tokenizerz
tokens = tokenizerz.encode("Hello, world!")  # [9707, 11, 1879, 0]
text = tokenizerz.decode(tokens)             # "Hello, world!"
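
For readers new to BPE, the merge step behind encode() can be sketched in a few lines of plain Python. This is an illustrative toy, not the Zig implementation shipped in this package: the bpe_encode function and the TOY_RANKS merge table are hypothetical names invented for the example, and a real tokenizer would first split text with a regex and then map the merged pieces to integer ids.

```python
def bpe_encode(word, ranks):
    """Toy BPE: repeatedly merge the adjacent pair with the lowest rank."""
    parts = list(word)  # start from single characters
    while len(parts) > 1:
        # Index of the adjacent pair with the lowest (best) merge rank.
        best = min(
            range(len(parts) - 1),
            key=lambda i: ranks.get((parts[i], parts[i + 1]), float("inf")),
        )
        if (parts[best], parts[best + 1]) not in ranks:
            break  # no known merge applies to any remaining pair
        # Replace the two pieces with their concatenation.
        parts[best:best + 2] = [parts[best] + parts[best + 1]]
    return parts


# Hypothetical merge table (lower rank = merged earlier); not from any real model.
TOY_RANKS = {("l", "l"): 0, ("h", "e"): 1, ("he", "ll"): 2, ("hell", "o"): 3}

pieces = bpe_encode("hello", TOY_RANKS)
```

Joining the returned pieces always reproduces the input word, which is the property that lets decode() invert encode().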

BPE tokenizer in Zig.

19 Apr 04:38
385ca72
