- 2026-02-22: Initial release.
Install dependencies as follows:
git clone https://github.com/SakanaAI/fast-weight-product-key-memory.git
cd fast-weight-product-key-memory
# Preferably use a virtual environment
# Tested with Python 3.12.11
bash install.shPrepare pre-training and evaluation data as follows:
bash scripts/data/fineweb.sh
bash scripts/data/lc64_nanogpt.sh
bash scripts/data/lambada.sh
bash scripts/data/pile_domain.shExample command for training GDN | PKM@6 + FwPKM@2,10 on 4 GPUs:
SEED=1; \
CFG=main/l12-gdn-pkm6-fwpkm2_10; \
MASTER_PORT=$(shuf -i 12300-65535 -n 1); \
deepspeed --include localhost:0,1,2,3 --master_port $MASTER_PORT --module src.pretrain \
-c \
cfgs/ike_config/fineweb_lc64/train.cfg \
--model_type qwen3_next_mem \
--pretrained_config_path cfgs/model_config/$CFG.json \
--peak_lr 0.0003 \
--min_lr 0.00003 \
--seed $SEED \
--override_attn_implementation flash_attention_2 \
--micro_batch_size 8 --micro_valid_batch_size 32 \
--log_grad_norms \
--log_weight_norms \
--save_log --save_model \
--save_wandb \
--wandb_project fwpkm_train \
--wandb_run_name $CFGSee more examples in the scripts/train directory:
scripts/train/train_main.sh- commands for training models in the main experiments.scripts/train/train_baseline.sh- commands for training baselines.scripts/train/train_ablation.sh- commands for training models in ablated studies.
Individual evaluation commands are provided in the scripts/eval directory. Results will be logged to wandb.
scripts/eval/ppl_stream.sh- PPL stream evaluation.scripts/eval/longbench_ppl.sh- LongBench PPL evaluation.scripts/eval/gate.sh- Gate analysis.scripts/eval/niah.sh- NIAH evaluation.scripts/eval/pile_domain.sh- Pile domain evaluation.
To run individual evaluation scripts, provide WANDB_RUN_NAME, MODEL_CFG, CHECKPOINT_PATH, MODEL_TYPE as command arguments to the evaluation script. For example, for Pile domain evaluation:
WANDB_RUN_NAME=main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9
MODEL_CFG=cfgs/model_config/main/l12-fa-pkm6-fwpkm2_10.json
CHECKPOINT_PATH=./experiments/lm/main/l12-fa-pkm6-fwpkm2_10/fa2swa_p0.9_2026-02-20-07-05-31/checkpoint/best
MODEL_TYPE=qwen3_next_mem
bash scripts/eval/pile_domain.sh \
$WANDB_RUN_NAME \
$MODEL_CFG \
$CHECKPOINT_PATH \
$MODEL_TYPEAlternatively, you can run all evaluations for multiple checkpoints in one go with slurm. using the eval_wrapper.sh script by providing a checkpoint list tsv file, where each line corresponds to a checkpoint and contains the following tab-separated fields: <WANDB_RUN_NAME> <MODEL_CFG_PATH> <CHECKPOINT_PATH> <MODEL_TYPE>. For example:
CHECKPOINT_LIST_FILE=ckpt_lists/example.tsv
bash scripts/eval/eval_wrapper.sh $CHECKPOINT_LIST_FILEIf you find our work useful in your research, please consider citing:
@article{zhao2026fwpkm,
title={Fast-weight Product Key Memory},
author={Tianyu Zhao and Llion Jones},
year={2026},
eprint={2601.00671},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.00671},
}