friedhar/veloxbpe


Faster than OpenAI's Tiktoken

benchmark bar chart

veloxbpe is a low-latency, high-throughput Byte-Pair Encoding (BPE) tokenizer providing exceptional performance and a streamlined interface.

Built-In Supported Encodings

  • o200k_base - used by o3, o1, and gpt-4o.
  • cl100k_base - used by gpt-4, gpt-3.5-turbo, gpt-3.5, and most OpenAI text embedding endpoints.
  • r50k_base - mostly deprecated.
  • gpt-2 - used by GPT-2; open source.
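To give a sense of what these encodings do, here is a minimal sketch of one byte-pair merge step, the core operation BPE tokenizers repeat over a training corpus. This illustrates the general algorithm only; the function names and the toy merge table are illustrative, not veloxbpe's internal API.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the adjacent token pair that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list(b"abababc")               # raw bytes as the initial tokens
pair = most_frequent_pair(tokens)       # (97, 98), i.e. the bytes of "ab"
tokens = merge_pair(tokens, pair, 256)  # first learned merge gets id 256
print(tokens)                           # [256, 256, 256, 99]
```

A trained encoding like cl100k_base is the result of applying this merge step roughly 100k times, which is why its vocabulary IDs run well past the 256 raw byte values.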

Install

pip install veloxbpe

Build & Install From Source

git clone https://github.com/friedhar/veloxbpe.git
cd veloxbpe
maturin develop

Benchmark

All benchmarks can be run locally. After building from source, run:

uv run bench/benchmark_bandwidth_0.py

TODO - Possible Roadmap

  • Add support for custom BPE training.
