CharDLM

Simple character-level diffusion language model implemented in JAX.

Block decoding using NVIDIA's Fast-dLLM algorithm (Wu et al., 2025).

Demo

This is what Fast-dLLM decoding with a block size of 4 looks like. Here I used a 10.8M-parameter CharDLM model trained with a context length of 256 characters.
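As a rough illustration of what block-wise decoding does, here is a minimal sketch under stated assumptions: a toy stand-in, `toy_predict`, replaces the real model, the 0.9 confidence threshold and all function names are hypothetical, and the KV caching that Fast-dLLM also describes is left out. Within each block of 4 positions, every masked position whose top prediction clears the threshold is committed in parallel; if none clears it, the single most confident position is committed so that decoding always makes progress.

```python
import random

MASK = None  # placeholder for a masked (not yet decoded) position


def toy_predict(seq, rng):
    """Hypothetical stand-in for the model: one (char, confidence) pair per position."""
    alphabet = "abcdefgh "
    return [(rng.choice(alphabet), rng.random()) for _ in seq]


def block_decode(length, block_size=4, threshold=0.9, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    # Decode blocks left to right; later blocks stay masked until they are reached.
    for start in range(0, length, block_size):
        block = range(start, min(start + block_size, length))
        while any(seq[i] is MASK for i in block):
            preds = toy_predict(seq, rng)
            masked = [i for i in block if seq[i] is MASK]
            # Commit every masked position whose confidence clears the threshold.
            chosen = [i for i in masked if preds[i][1] >= threshold]
            if not chosen:
                # Fallback: commit the single most confident masked position.
                chosen = [max(masked, key=lambda i: preds[i][1])]
            for i in chosen:
                seq[i] = preds[i][0]
            steps += 1
    return "".join(seq), steps


text, steps = block_decode(16)
print(f"decoded {len(text)} characters in {steps} model calls")
```

Because each model call commits at least one character, the number of calls never exceeds the sequence length, and it drops below that whenever several positions in a block clear the threshold at once.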

Sample evaluation:

Pre-training

The demo model is basically the chardlm-big implementation detailed in chardlm/model.py.

Model Specifications

Context length: 256 characters
Embedding dimension: 384
Number of heads: 6
Number of layers: 6
Dropout rate: 0.2
Total parameters: ~10.8M (~44 MB on disk)
Diffusion steps: 100
Noise schedule: Linear
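
The ~10.8M figure can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes a standard NanoGPT-style block (fused QKV projection, 4x MLP expansion, biases everywhere, two LayerNorms per block), an output head tied to the token embedding, and a vocabulary of 66 symbols (65 Tiny Shakespeare characters plus one mask token). None of these details are confirmed by the specifications above, so treat the result as an estimate rather than the exact count.

```python
def estimate_params(d_model=384, n_layers=6, context=256, vocab=66):
    """Estimate parameters for an assumed NanoGPT-style transformer."""
    attn = 3 * d_model * d_model + 3 * d_model   # fused QKV weights + biases
    attn += d_model * d_model + d_model          # attention output projection
    mlp = 4 * d_model * d_model + 4 * d_model    # MLP up projection
    mlp += 4 * d_model * d_model + d_model       # MLP down projection
    norms = 2 * 2 * d_model                      # two LayerNorms (scale + bias)
    per_block = attn + mlp + norms
    # Token table plus learned absolute position table.
    embeddings = vocab * d_model + context * d_model
    final_norm = 2 * d_model
    return n_layers * per_block + embeddings + final_norm


total = estimate_params()
print(f"{total / 1e6:.2f}M parameters")    # 10.77M parameters
print(f"{total * 4 / 1e6:.1f} MB as float32")  # 43.1 MB as float32
```

Under these assumptions the estimate lands at about 10.77M parameters, or roughly 43 MB in float32, which is in line with the figures listed above.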

Training ran for 20k steps on a single A100 GPU and took about 30 minutes overall.

The model had not fully converged when training finished, so there is still a lot of room for improvement, but I would like to save my wallet for other papers for now.
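
For readers unfamiliar with how a diffusion language model is trained on text, the following is a minimal sketch of one common formulation: masked (absorbing-state) diffusion with a linear schedule. It assumes that at step t of T = 100 each character is independently replaced by a mask symbol with probability t / T. The function names and the mask symbol are hypothetical, and the actual corruption process in train.py may differ.

```python
import random

MASK = "_"  # hypothetical mask symbol, assumed absent from the training text


def mask_prob(t, num_steps=100):
    """Linear schedule: probability that a character is masked at step t."""
    return t / num_steps


def corrupt(text, t, num_steps=100, rng=None):
    """Independently replace each character with MASK with probability t / num_steps."""
    rng = rng or random.Random(0)
    p = mask_prob(t, num_steps)
    return "".join(MASK if rng.random() < p else ch for ch in text)


clean = "To be, or not to be"
for t in (0, 50, 100):
    print(t, corrupt(clean, t))
```

In this formulation the model would be trained to recover the original characters at the masked positions. At t = 0 nothing is masked, and at t = 100 every character is.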

Setup

Dataset

The Tiny Shakespeare dataset is packaged with the repo, but if you want to download it yourself:

mkdir -p dataset
curl -o dataset/tiny_shakespeare.txt https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
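
Once the file is in place, a character-level model needs a mapping between characters and integer ids. The following is a minimal sketch of that step, not the repo's actual data pipeline: `build_vocab`, `encode`, and `decode` are hypothetical names, and a short inline string stands in for the contents of dataset/tiny_shakespeare.txt.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)


# Stand-in sample; in practice, read dataset/tiny_shakespeare.txt instead.
sample = "First Citizen:\nBefore we proceed any further, hear me speak."
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
assert decode(ids, itos) == sample  # the round trip is lossless
```

A diffusion model would typically reserve one extra id for the mask token on top of the characters found in the text.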

Installation

pip install -e .

Usage

Training

python train.py

Generation

python generate.py

Post-amble

This project was inspired by the announcement of Google DeepMind's Gemini Diffusion and was built on top of Andrej Karpathy's NanoGPT (not nanochat GPT!). This means the model is a bit dated; for example, it uses absolute positional embeddings rather than RoPE. This choice was intentional, and for me the beauty is in its simplicity. If anything, this project shows that you can build a functional diffusion language model using absolute positional embeddings.

I was also heavily inspired by Nathan Barry's tiny-diffusion project, which was built on top of nanochat GPT.
