FwPKM: Fast-weight Product Key Memory

📜Paper | GitHub

Updates

2026-02-22: Initial release.

Get started

Install dependencies as follows:

git clone https://github.com/SakanaAI/fast-weight-product-key-memory.git
cd fast-weight-product-key-memory
# Preferably use a virtual environment
# Tested with Python 3.12.11
bash install.sh

Prepare pre-training and evaluation data as follows:

bash scripts/data/fineweb.sh
bash scripts/data/lc64_nanogpt.sh
bash scripts/data/lambada.sh
bash scripts/data/pile_domain.sh

Training

Example command for training GDN | PKM@6 + FwPKM@2,10 on 4 GPUs:

SEED=1; \
CFG=main/l12-gdn-pkm6-fwpkm2_10; \
MASTER_PORT=$(shuf -i 12300-65535 -n 1); \
deepspeed --include localhost:0,1,2,3 --master_port $MASTER_PORT --module src.pretrain \
    -c \
        cfgs/ike_config/fineweb_lc64/train.cfg \
    --model_type qwen3_next_mem \
    --pretrained_config_path cfgs/model_config/$CFG.json \
    --peak_lr 0.0003 \
    --min_lr 0.00003 \
    --seed $SEED \
    --override_attn_implementation flash_attention_2 \
    --micro_batch_size 8 --micro_valid_batch_size 32 \
    --log_grad_norms \
    --log_weight_norms \
    --save_log --save_model \
    --save_wandb \
    --wandb_project fwpkm_train \
    --wandb_run_name $CFG

See more examples in the scripts/train directory:

scripts/train/train_main.sh - commands for training models in the main experiments.
scripts/train/train_baseline.sh - commands for training baselines.
scripts/train/train_ablation.sh - commands for training models in the ablation studies.

Evaluation

Individual evaluation commands are provided in the scripts/eval directory. Results will be logged to wandb.

scripts/eval/ppl_stream.sh - PPL stream evaluation.
scripts/eval/longbench_ppl.sh - LongBench PPL evaluation.
scripts/eval/gate.sh - Gate analysis.
scripts/eval/niah.sh - NIAH evaluation.
scripts/eval/pile_domain.sh - Pile domain evaluation.

To run an individual evaluation script, pass WANDB_RUN_NAME, MODEL_CFG, CHECKPOINT_PATH, and MODEL_TYPE as positional arguments, in that order. For example, for Pile domain evaluation:

WANDB_RUN_NAME=main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9
MODEL_CFG=cfgs/model_config/main/l12-fa-pkm6-fwpkm2_10.json
CHECKPOINT_PATH=./experiments/lm/main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9_2026-02-20-07-05-31/checkpoint/best
MODEL_TYPE=qwen3_next_mem
bash scripts/eval/pile_domain.sh \
    $WANDB_RUN_NAME \
    $MODEL_CFG \
    $CHECKPOINT_PATH \
    $MODEL_TYPE
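
Because the evaluation scripts are described as taking the same four arguments, several of them can be run for one checkpoint with a small loop. The sketch below is not part of the repository: `run_evals` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and it assumes every script listed above accepts the arguments in the same order as pile_domain.sh. With `DRY_RUN=1` (the default in this sketch) it only prints the commands, so they can be checked before anything is launched.

```shell
# Hypothetical helper (not in the repository): run the listed evaluation
# scripts for one checkpoint, passing the same four positional arguments.
run_evals() {
    # $1..$4: WANDB_RUN_NAME, MODEL_CFG, CHECKPOINT_PATH, MODEL_TYPE
    if [ "$#" -ne 4 ]; then
        echo "usage: run_evals WANDB_RUN_NAME MODEL_CFG CHECKPOINT_PATH MODEL_TYPE" >&2
        return 1
    fi
    for name in ppl_stream longbench_ppl gate niah pile_domain; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo bash "scripts/eval/$name.sh" "$1" "$2" "$3" "$4"
        else
            # Stop at the first evaluation that fails.
            bash "scripts/eval/$name.sh" "$1" "$2" "$3" "$4" || return 1
        fi
    done
}
```

After defining the function, `run_evals "$WANDB_RUN_NAME" "$MODEL_CFG" "$CHECKPOINT_PATH" "$MODEL_TYPE"` prints the five commands. Set `DRY_RUN=0` to execute them from the repository root.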

Alternatively, you can run all evaluations for multiple checkpoints in one go on Slurm using the eval_wrapper.sh script. Provide a checkpoint list TSV file in which each line corresponds to one checkpoint and contains the following tab-separated fields: <WANDB_RUN_NAME> <MODEL_CFG_PATH> <CHECKPOINT_PATH> <MODEL_TYPE>. For example:

CHECKPOINT_LIST_FILE=ckpt_lists/example.tsv
bash scripts/eval/eval_wrapper.sh $CHECKPOINT_LIST_FILE
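
A checkpoint list in this format can be written with `printf`, which makes the tab separators explicit. The sketch below is only an illustration: it reuses the values from the Pile domain example, writes them to the ckpt_lists/example.tsv path used above, and then checks that every line has exactly four tab-separated fields. It assumes the wrapper expects data lines only; whether header lines, comments, or blank lines are tolerated is not stated here.

```shell
# Build a checkpoint list with one tab-separated line per checkpoint.
# Field order: <WANDB_RUN_NAME> <MODEL_CFG_PATH> <CHECKPOINT_PATH> <MODEL_TYPE>
CHECKPOINT_LIST_FILE=ckpt_lists/example.tsv
mkdir -p "$(dirname "$CHECKPOINT_LIST_FILE")"

printf '%s\t%s\t%s\t%s\n' \
    "main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9" \
    "cfgs/model_config/main/l12-fa-pkm6-fwpkm2_10.json" \
    "./experiments/lm/main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9_2026-02-20-07-05-31/checkpoint/best" \
    "qwen3_next_mem" \
    > "$CHECKPOINT_LIST_FILE"

# Sanity check: collect the numbers of lines that do not have four fields.
BAD_LINES=$(awk -F'\t' 'NF != 4 { print NR }' "$CHECKPOINT_LIST_FILE")
if [ -n "$BAD_LINES" ]; then
    echo "Malformed lines: $BAD_LINES" >&2
fi
```

To add more checkpoints, repeat the `printf` call with `>>` instead of `>` so that each checkpoint is appended as its own line.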

Citation

If you find our work useful in your research, please consider citing:

@article{zhao2026fwpkm,
    title={Fast-weight Product Key Memory}, 
    author={Tianyu Zhao and Llion Jones},
    year={2026},
    eprint={2601.00671},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2601.00671}, 
}
