bpe-tokenizer

Here are 60 public repositories matching this topic...

sefineh-ai / Amharic-Tokenizer

Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.

nlp machine-learning deep-learning tokenizer python3 text-processing amharic african-languages language-processing ethiopic low-resource-languages bpe llm bpe-tokenizer amharic-tokenizer amharictokenizer amhtokenizer

Updated Nov 17, 2025
Python

gweidart / rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

python rust openai pypi-package bpe byte-pair-encoding huggingface tokenizers llm tiktoken bpe-tokenizer byte-pair-tokenizer

Updated Mar 19, 2025
Python

extremecoder-rgb / MyGPT

Implemented GPT from scratch

python cuda pytorch lora masking peft cross-entropy-loss multi-head-attention gelu adamw-optimizer temperature-scaling bpe-tokenizer standford-alpaca

Updated Oct 4, 2025
Jupyter Notebook

RahulDey12 / tiktoken-php

A PHP implementation of OpenAI's BPE tokenizer tiktoken.

tiktoken bpe-tokenizer tiktoken-php special-tokens

Updated Jan 25, 2025
PHP

xmarva / transformer-architectures

Teaching transformer-based architectures

nlp transformer attention-mechanism weights-and-biases positional-encoding bpe-tokenizer

Updated May 4, 2025
Jupyter Notebook

neuron-core / tokenizer

High-Performance Tokenizer implementation in PHP.

php ai tokenizer ai-framework ai-agents llm bpe-tokenizer agentic-framework agentic-workflow

Updated Oct 21, 2025
PHP

jaco-bro / tokenizer

BPE tokenizer for LLMs in Pure Zig

zig regex tokenizer pcre2 zig-package bpe-tokenizer

Updated Dec 29, 2025
Zig

jmaczan / bpe-tokenizer

Byte-Pair Encoding tokenizer for training large language models on huge datasets

python machine-learning deep-learning tokenizer chunking from-scratch bpe byte-pair-encoding large-language-models llm bpe-tokenizer

Updated Jun 4, 2024
Python

U4RASD / r-bpe

R-BPE: Improving BPE-Tokenizers with Token Reuse

low-resource-nlp large-language-models bpe-tokenizer vocabulary-adaptation

Updated Nov 26, 2025
Python

gianndev / Tok

Tok: my own Tokenizer

tokenizer bpe bpe-tokenizer

Updated Sep 7, 2025
Jupyter Notebook

mrinalxdev / bpe-cpp

Implementation of Byte-Pair Encoding (BPE) for subword tokenization, written entirely in C++. The tokenizer learns merges from raw text and supports encoding and decoding of UTF-8 text.

machine-learning bpe-tokenizer

Updated Aug 21, 2025
C++

Lizhecheng02 / Kaggle-Automated_Essay_Scoring_2.0

(1) Train large language models to help people with automatic essay scoring. (2) Extract essay features and train a new tokenizer to build tree models for score prediction.

regression transformer classification vectorizer awp pooling huggingface treemodel kfold llm deberta-v3-large bpe-tokenizer

Updated Jul 3, 2024
Python

neluca / tinybpe

🐍 A fast, lightweight, and clean CPython extension for the Byte Pair Encoding (BPE) algorithm, which is commonly used in LLM tokenization and NLP tasks.

tokenizer cpython-extensions bpe llm bpe-tokenizer

Updated Apr 20, 2025
C

yuniko-software / qwen3-tokenizer-dotnet

Multi-language BPE tokenizer implementation for Qwen3 models. Lightweight byte-pair encoding for C#/.NET

machine-learning csharp dotnet inference embedding-models onnx huggingface vector-database llm qwen bpe-tokenizer

Updated Dec 23, 2025
C#

Demon-Sheriff / tiny-BPE

A parallel, minimal implementation of Byte Pair Encoding (BPE) from scratch in fewer than 200 lines of Python.

python multiprocessing tokenization bpe-tokenizer

Updated Aug 30, 2025
Jupyter Notebook

willxxy / superbpe

[Rust] Unofficial implementation of "SuperBPE: Space Travel for Language Models"

rust rust-lang bpe bytepairencoding bpe-tokenizer

Updated Apr 14, 2025
Rust

jmaczan / bpe.c

High performance Byte-Pair Encoding tokenizer for large language models

c tokenizer clang bpe llm bpe-tokenizer

Updated Jun 23, 2024
C

franciszekparma / GBPET

GPT-style language model with Byte Pair Encoding tokenizer, built from scratch in PyTorch.

python nlp machine-learning deep-learning pytorch transformer gpt language-model from-scratch bytepairencoding bpe-tokenizer

Updated Jan 20, 2026
Python

SauravP97 / hf-tokenizer-visualizer

Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process

python tokenizer artificial-intelligence huggingface bpe-tokenizer

Updated Feb 28, 2026
Python

estnafinema0 / russian-jokes-generator

Transformer models for humorous text generation, fine-tuned on a Russian jokes dataset with ALiBi, RoPE, GQA, and SwiGLU, plus a custom byte-level BPE tokenizer.

nlp pytorch alibi transformer-models rotary-position-embedding grouped-query-attention swiglu bpe-tokenizer

Updated Mar 10, 2025
Jupyter Notebook
