305 commits
0b635e6
build docker images for 2.9.x (#3273)
winglian Nov 20, 2025
006f226
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
NanoCode012 Nov 24, 2025
8990ca3
fix: removed unused "scikit-learn==1.4.2" (#3277)
ved1beta Nov 24, 2025
b234532
Feat: add peft_ensure_weight_tying (#3278)
NanoCode012 Nov 28, 2025
7fb6a94
chore: update pre-commit hooks (#3287)
github-actions[bot] Dec 1, 2025
c6ddcdd
feat: add exaone4 chat template and update enums (#3279)
nayohan Dec 1, 2025
4a0f98e
feat: upgrade liger to 0.6.4 (#3289)
NanoCode012 Dec 2, 2025
86d8cca
Feat: add trinity by ArceeAI (#3292)
NanoCode012 Dec 2, 2025
2b66ee1
Feat: add ministral3 (#3297)
NanoCode012 Dec 4, 2025
5992e60
fix: improve ministral3 docs to be clearer (#3300)
NanoCode012 Dec 4, 2025
75b20fb
Save processor in quantizer CLI (#3290)
salmanmohammadi Dec 6, 2025
b3f4aa1
fix bin size (#3307)
ved1beta Dec 8, 2025
4ac78aa
fix: update qwen3 jinja tokenization off a few tokens (#3295)
NanoCode012 Dec 9, 2025
2a664dc
support for xformers wheels for torch 2.9 (#3308)
winglian Dec 11, 2025
a1d07f4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate (#3313)
NanoCode012 Dec 17, 2025
83d4d97
Add QAT NVFP4 configs for blogpost (#3280) [skip ci]
salmanmohammadi Dec 17, 2025
2cf254b
Add `peft_autocast_adapter_dtype` config option (#3311) [skip ci]
xzuyn Dec 17, 2025
3e51a68
fix: Fix evaluation loss in KD trainer (#3271)
roycho96 Dec 17, 2025
2197b0b
feat: cheap ppl metric (#3317)
xzuyn Dec 18, 2025
3750d7d
add liger support kernel for dpo (#3302)
ved1beta Dec 18, 2025
bbd3486
Distributed Muon Optimizer (#3264)
salmanmohammadi Dec 19, 2025
07c41a6
fix preview docs failing due to running out of disk (#3326) [skip ci]
winglian Dec 19, 2025
43cef27
Fix typo in densemixer RuntimeError (#3327) [skip ci]
bethrezen Dec 22, 2025
faaff6c
allow users to set ndigits for rounding of metrics when logging (#3325)
ved1beta Dec 22, 2025
efeb5a4
fix check for fp8 capability (#3324)
winglian Dec 22, 2025
92ee425
feature: raise on long sequence drop (#3321)
kallewoof Dec 22, 2025
f2155ea
feat: add trackio as experiment tracking integration (#3253)
abidlabs Dec 23, 2025
97f1b17
Feat: add kimi linear support (#3257)
NanoCode012 Dec 25, 2025
372f664
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc…
NanoCode012 Dec 25, 2025
418933f
feat: add internvl3_5 (#3141) [skip-ci]
NanoCode012 Dec 25, 2025
4f5e8a3
Feat: add MiMo and Plano (#3332) [skip-ci]
NanoCode012 Dec 25, 2025
a6080df
compute loss only if training and update token metric naming (#3293) …
ved1beta Dec 25, 2025
66a3de3
build examples readmes with quarto (#3046)
winglian Dec 25, 2025
11c0b5b
batch upgrade dependencies (#3299)
winglian Dec 30, 2025
f45a97a
docs for checkpoint saving (#3335) [skip ci]
ved1beta Dec 30, 2025
e73dab6
support pydantic 2.12 (#3328)
winglian Dec 30, 2025
2b199f9
chore: update pre-commit hooks (#3340) [skip ci]
github-actions[bot] Jan 1, 2026
afe18ac
deprecate torch 2.7.1 (#3339)
winglian Jan 1, 2026
b26ba3a
don't build images w cuda 130 since we don't have flash attention whe…
winglian Jan 3, 2026
4e61b8a
use updated version of prebuilt wheels for flash attention for cu130 …
winglian Jan 5, 2026
ee59e4d
add cu130 + torch 2.9.1 to test matrices (#3343)
winglian Jan 5, 2026
8aab807
feat: Add SwanLab integration for experiment tracking (#3334)
PraMamba Jan 6, 2026
7bf6f70
fix total/trainable tokens log (#3344)
ved1beta Jan 6, 2026
e7f0d4b
Increased test coverage for lora/qlora (#3147)
ved1beta Jan 6, 2026
4ae6f76
bump bnb to v0.49.1 (#3351)
salmanmohammadi Jan 12, 2026
3e0bbd3
feat: add ARM64/AArch64 build support to Dockerfile-base (#3346)
1sand0s Jan 12, 2026
258ce8d
feat : scaled softmax support (#3338)
ved1beta Jan 13, 2026
359b7ad
fix: gemma3_text model loading vision config (#3354)
NanoCode012 Jan 13, 2026
dc77b5b
fix arm64 builds (#3355)
winglian Jan 14, 2026
1410e44
update PR template (#3349) [skip ci]
salmanmohammadi Jan 14, 2026
6331e4a
fix amd64 and set 2.9.1 as latest cloud image (#3356)
winglian Jan 14, 2026
d282f32
don't install deepspeed in arm64 images (#3357)
winglian Jan 14, 2026
790df75
don't install xformers for arm64 (#3359)
winglian Jan 16, 2026
8f25124
upgrade transformers to 4.57.5 (#3358)
winglian Jan 16, 2026
c413480
upgrade transformers to 4.57.6 and peft to 0.17.1 and datasets to 4.5…
winglian Jan 16, 2026
6e42def
set version to v0.13.1 (#3363)
winglian Jan 20, 2026
8ab9d9e
Version dev (#3365)
winglian Jan 21, 2026
8cd75cf
use cuda 12.9.1 and add python 3.12 to base images (#3367)
winglian Jan 21, 2026
8623dd8
strip only starting 'v' char; e.g don't strip from '.dev' (#3368) [sk…
winglian Jan 21, 2026
d0d26d5
feat: Add GDPO Support (#3353)
ved1beta Jan 21, 2026
04328ae
cu129 targets for ci builds (#3369)
winglian Jan 21, 2026
a531e9d
upgrade vllm to v0.14.0 (#3345)
winglian Jan 22, 2026
fc4e379
transformers v5 upgrade (#3272)
winglian Jan 27, 2026
dd9ebae
EAFT (#3366) [skip ci]
salmanmohammadi Jan 28, 2026
3dd86d3
feat: add new cce support for glm series and exaone4 (#3373) [skip ci]
NanoCode012 Jan 28, 2026
6132a30
handle warnings from v5 upgrade (#3376)
winglian Jan 28, 2026
3738978
Add support for batched_mm, grouped_mm and scattermoe for MoE models …
winglian Jan 29, 2026
be00978
tag for v0.14.0 release (#3379)
winglian Jan 30, 2026
236dad3
set 0.15.0.dev0 version (#3380)
winglian Jan 31, 2026
0343a72
add glm support + patch (#3329) [skip ci]
ved1beta Feb 10, 2026
530a0c0
Changes from dataset_processes to dataset_num_proc (#3352) [skip ci]
tgoab Feb 10, 2026
86a5803
train_per_sec_per_gpu metric (#3364) [skip ci]
ved1beta Feb 10, 2026
97a4f28
fix: saving state dict and eval for Context Parallel (#3382) [skip ci]
ved1beta Feb 10, 2026
fcc4cfd
feat: add sageattention (#2823) [skip ci]
NanoCode012 Feb 10, 2026
b6d3653
feat: add step3p5 for cce (#3384) [skip ci]
NanoCode012 Feb 10, 2026
ed7105d
fix: GRPO config not accept max_prompt_length (#3390) [skip ci]
NanoCode012 Feb 10, 2026
37e9da7
add hub_revision support for specifying branch when pushing checkpoin…
madScientist10 Feb 10, 2026
a2da852
fix: improve lora kernels failure message and handle trust_remote_cod…
NanoCode012 Feb 10, 2026
c67cbcb
fix: ignore add_special_tokens and use test mode for generation for m…
NanoCode012 Feb 10, 2026
a4ee56c
fix: set rollout in GRPO training_kwargs (#3392)
ved1beta Feb 10, 2026
4e22cf0
fix: remove telemetry warning (#3397) [skip ci]
NanoCode012 Feb 10, 2026
06ac407
feat: improve telemetry log (#3398)
NanoCode012 Feb 10, 2026
5eb2655
fix generic patch for cce (#3405)
winglian Feb 12, 2026
d6a2532
feat(doc): clarify how to use scattermoe (#3408) [skip ci]
NanoCode012 Feb 15, 2026
4f1b5ad
fix: clarify how to use lm_eval plugin (#3404) [skip ci]
NanoCode012 Feb 15, 2026
145ffc9
upgrade transformers to 5.2.0 and torchao to 0.16.0 (#3407)
winglian Feb 19, 2026
7fbedbd
fix(doc): add limitation for unfrozen_parameters (#3416)
NanoCode012 Feb 19, 2026
29722de
use bunnycdn for CI assets (#3422) [skip ci]
winglian Feb 20, 2026
0ea252d
update to trackio 0.16.1 (#3425) [skip ci]
winglian Feb 20, 2026
43d60c7
bump cut-cross-entropy to 58d6572 (#3424)
NanoCode012 Feb 20, 2026
3f30572
Fix typo in dataset_processes field (#3426)
lorenzbaraldi Feb 23, 2026
5ed4557
feat: support dot-notation CLI args for nested config options (#3419)
ManasVardhan Feb 23, 2026
86ca1e2
fix: update MistralProcessor to be v5 compat (#3423)
NanoCode012 Feb 23, 2026
08441fe
fix: set allowed values for `adapter` config (#3415)
NanoCode012 Feb 23, 2026
68f1b70
ScatterMoE LoRA support (#3410)
winglian Feb 24, 2026
b40803d
build base images for torch 2.10.0 (#3429)
winglian Feb 25, 2026
1791d87
build axolotl images with torch 2.10.0 (#3430)
winglian Feb 25, 2026
a131e4d
sample gen support sft (#3240) [skip ci]
ved1beta Feb 25, 2026
8f54b4e
fix: pass revision parameter to tokenizer and processor loaders (#338…
madScientist10 Feb 25, 2026
2b6f4a6
Fix: excess_length_strategy truncation method (#3401)
rlronan Feb 25, 2026
18f26c1
add uv axolotl builds (#3431)
winglian Feb 25, 2026
7f23b30
bug-fix: use self.optimizer if optimizer not passed to SchedulerMixin…
kallewoof Mar 2, 2026
f447bce
fix: do not push telemetry on non-master rank (#3438)
NanoCode012 Mar 2, 2026
aa88c2e
fix uv cache subcommand (#3447)
winglian Mar 2, 2026
444020b
mark slow tests that are timing out in CI (#3428) [skip ci]
winglian Mar 2, 2026
474208b
fix: Save de-duplicated dataset during pre-processing (#3427)
ManasVardhan Mar 2, 2026
4272817
don't install torch ao on arm64 (#3448)
winglian Mar 2, 2026
77828d3
uv cloud image should use uv w pip (#3449)
winglian Mar 2, 2026
e672d37
fix: qwen3-next to use fla causal-conv1d to support packing (#3437)
NanoCode012 Mar 3, 2026
945c8ae
Fix: quantize and target moe layers in transformers v5 for adapters a…
NanoCode012 Mar 3, 2026
653f90b
Add torch 2.10.0 to unit tests and use python 3.14 (#3450)
winglian Mar 3, 2026
b6b8db8
fix python version typo for building 3.11 (#3454)
winglian Mar 4, 2026
753906c
feat: add doc for expert quantization, glm45 air example configs, and…
NanoCode012 Mar 5, 2026
8e2a102
Fix FSDP2 sharding and validate AO version for LR groups (#3403)
bekk02 Mar 5, 2026
28cc085
include number of params and rounded est of params so we can easily g…
winglian Mar 5, 2026
4b8bc52
fix: correct total_num_steps and batch_size calculation with context …
Yatimai Mar 5, 2026
1eaf4d7
add: support mxfp4 axo (#3375)
ved1beta Mar 5, 2026
6a8baf8
feat: add sonicmoe (#3411)
NanoCode012 Mar 5, 2026
234931d
extend pytest-sdist timeout to 30 min for slow/flaky tests (#3456) [s…
winglian Mar 5, 2026
6c44afa
chore: update pre-commit hooks (#3381) [skip ci]
github-actions[bot] Mar 6, 2026
56162f7
monkeypatch fix for fsdp with cpu ram efficient loading (#3464) [skip…
winglian Mar 6, 2026
cada93c
upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)
winglian Mar 6, 2026
da17c7c
fix: use dp_world_size instead of world_size for batch_size with tens…
Yatimai Mar 6, 2026
a260d33
add info about linting that was removed at some point (#3458) [skip ci]
winglian Mar 6, 2026
6c8c73e
fix(validation): add validation for lora target linear with quantize …
NanoCode012 Mar 6, 2026
c119382
add: qwen 3.5 (#3442)
ved1beta Mar 6, 2026
fc2d63e
use new tf32 APIs for torch 2.9+ (#3467) [skip ci]
winglian Mar 6, 2026
0a23ae0
fix: position_ids casted to int64 for qwen35 patch (#3468) [skip ci]
NanoCode012 Mar 6, 2026
d65e1b9
fix: add guard for _initialize_missing_keys patch (#3469) [skip ci]
NanoCode012 Mar 6, 2026
876941f
install flash-linear-attention (#3466)
winglian Mar 6, 2026
8f19169
tag for v0.15.0 release (#3470)
winglian Mar 6, 2026
46b9f40
bump dev version to 0.16.0.dev0 (#3472) [skip ci]
winglian Mar 6, 2026
80f7088
update setuptools so trl can be installed from main for nightlies (#3…
winglian Mar 6, 2026
a36aaa7
add gpu tests for scattermoe (#3474) [skip ci]
winglian Mar 7, 2026
43b1c80
load weights synchronously so they can be converted and not OOM: (#3477)
winglian Mar 7, 2026
cf4d550
fix: reduce permissions for preview docs CI (#3480) [skip ci]
NanoCode012 Mar 9, 2026
23ad40b
fix: disable async load when loading quantized bnb
NanoCode012 Mar 11, 2026
fccc712
builds for py312-cu128-torch2.9.1 (#3489)
winglian Mar 12, 2026
819b157
swap around what we're building for docker (#3490)
winglian Mar 12, 2026
79908b3
use ubuntu user instead of root for uv docker images (#3491)
winglian Mar 13, 2026
083c5a0
check ubuntu user and set uv python dir (#3492)
winglian Mar 13, 2026
e1ff756
become the ubuntu user when root logs in (#3494)
winglian Mar 13, 2026
ff77fa2
preserve env for root -> ubuntu user (#3495)
winglian Mar 13, 2026
d8a0574
Reverts commits 79908b3c6, 083c5a042, e1ff75624, ff77fa248. (#3496)
winglian Mar 13, 2026
a806704
moe quant patch for merge miss match (#3483)
ved1beta Mar 16, 2026
d8a646c
chore: logging cleanup (#3482) [skip ci]
NanoCode012 Mar 16, 2026
f56efdb
fix: high eval loss w/ sample packing (#3478) [skip ci]
ved1beta Mar 16, 2026
defee62
fix: fix CONTRIBUTING.md placeholders, bare except clauses, and add c…
Hadar01 Mar 16, 2026
4a5876d
fix: explicit set workflow permission and move secrets to necessary (…
NanoCode012 Mar 16, 2026
7da5f94
feat: add FA4 (#3481)
NanoCode012 Mar 16, 2026
a098df5
feat: add Mistral Small 4 (#3502)
NanoCode012 Mar 17, 2026
d230cbb
chore(doc): update readme (#3503) [skip ci]
NanoCode012 Mar 17, 2026
830e9f7
automatically enable tf32 if supported (#3473) [skip ci]
winglian Mar 17, 2026
8f3fb51
consolidate behaviour of routing in scattermoe kernels (#3475)
winglian Mar 17, 2026
999b3fe
fix: replace shell=True subprocess with argument list in modal CLI (#…
Hadar01 Mar 17, 2026
5ef3f28
Support for Async GRPO (#3486)
winglian Mar 17, 2026
f291ac0
fix for flaky tests in lora ops kernels w autotune (#3511) [skip ci]
winglian Mar 19, 2026
163bd4d
use custom triton kernels for entropy from logits and selective softm…
winglian Mar 19, 2026
bb483ad
make the CI fail GitHub Actions on test failures (#3517)
winglian Mar 19, 2026
1fc86d5
Scattermoe LoRA optimizations (#3513)
winglian Mar 20, 2026
7920fe7
fix num_labels= 1 test fail (#3493) [skip ci]
ved1beta Mar 20, 2026
113d275
qwen docs + new config (#3499) [skip ci]
ved1beta Mar 20, 2026
b3823cc
fix: gemma3 configs (#3500) [skip ci]
ved1beta Mar 20, 2026
c13cb7c
feat: add nemotron config (#3506)
ved1beta Mar 20, 2026
038ffe3
fix: solved double sequence partition from SequenceParallelContextMan…
lorenzbaraldi Mar 20, 2026
c57acef
Qwen3.5-MoE example config with lora_target_modules regex (#3515) [sk…
Nero10578 Mar 20, 2026
7ddfb2d
cleanup: remove dead SDPA patches (#3488) [skip ci]
OnePunchMonk Mar 20, 2026
5a5cf30
fix: add dequant bf16 repo (#3507) [skip ci]
NanoCode012 Mar 20, 2026
1bcfc08
feat: add support and end-to-end tests for multiple custom optimizers…
OnePunchMonk Mar 20, 2026
b0294b3
handle qwen3.5 moe loading (#3523) [skip ci]
winglian Mar 20, 2026
2c05847
reduce autotune search space (#3525) [skip ci]
winglian Mar 21, 2026
0ee98a0
fix token state json and mistral tokenizer issue (#3522) [skip ci]
winglian Mar 22, 2026
c9df6ef
support offloading layers to CPU (#3512) [skip ci]
winglian Mar 22, 2026
fc3b3d1
synthetic datasets for benchmarking and testing (#3518) [skip ci]
winglian Mar 22, 2026
5b2e3f0
fix: handle connection errors when checking user whoami (#3529)
winglian Mar 22, 2026
a67392c
liger support for qwen 3.5 and fused rmsnorm+gated (#3531) [skip ci]
winglian Mar 22, 2026
b3289fd
feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528)…
winglian Mar 22, 2026
0e583ef
increase rtol, codecov informational only, don't silently fail errors…
winglian Mar 22, 2026
86be9f3
post merge lora fixes for CI (#3536) [skip ci]
winglian Mar 23, 2026
e412370
roundup_power2_divisions not needed with newer pytorch versions (#3540)
winglian Mar 24, 2026
e9883c9
fix: robust handling of race condition on patching check (#3543) [ski…
winglian Mar 24, 2026
c50c4ac
EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Lang…
winglian Mar 24, 2026
1f1ebb8
feat: move to uv first
NanoCode012 Mar 25, 2026
2fb7279
Revert "feat: move to uv first" (#3544)
NanoCode012 Mar 25, 2026
c2bd75a
Nemo gym integration (#3516) [skip ci]
winglian Mar 25, 2026
678ebb1
Fix Ray train crashing after succeeding (#3542) [skip ci]
mhambre Mar 25, 2026
ff0f67c
feat: add custom routing support for ernie4_5_moe, and hunyuan_v1_moe…
OnePunchMonk Mar 25, 2026
b55706b
feat:merge-lora iterate through bins without loading (#3095)
ved1beta Mar 25, 2026
74b959e
dispatch scored rollouts to plugins, extend path for external plugins…
winglian Mar 25, 2026
5191e4e
More minor RL fixes (#3551)
winglian Mar 25, 2026
99bde01
deprecate torch 2.8.0 support (#3550)
winglian Mar 25, 2026
00dee05
support flattening/packing for GRPO (#3552)
winglian Mar 28, 2026
bb622b8
super nemo support (#3508)
ved1beta Mar 30, 2026
a81feab
DPO transformers v0.29 fixes (#3560) [skip ci]
BrownianNotion Mar 31, 2026
a4c9441
bug-fix: only apply patches when CUDA is available (#3561)
kallewoof Mar 31, 2026
5e5603c
upgrade transformers to 5.4.0 (#3562)
winglian Mar 31, 2026
9e64c76
qwen3.5 configs (#3554) [skip ci]
ved1beta Apr 1, 2026
f6c122b
allow bf16 flag but warn (#3563) [skip ci]
kallewoof Apr 1, 2026
438ea7b
chore: update pre-commit hooks (#3567) [skip ci]
github-actions[bot] Apr 1, 2026
96ae8bd
Add troubleshooting note for GLM4 GGUF MTP mismatch (#3559) [skip ci]
mariozupan Apr 1, 2026
1b1fc91
Add precompute_ref_log_probs to config schema (#3555) [skip ci]
joaquinhuigomez Apr 1, 2026
6c92b5c
lazy load trainer classes to prevent unnecessary imports (#3568)
winglian Apr 1, 2026
c92b71b
MX QAT patch (#3553)
ved1beta Apr 1, 2026
55a7950
fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303),…
Edward-Zion-Saji Apr 1, 2026
50e9573
Update lm-eval for transformers v5 support (#3571) [skip ci]
BrownianNotion Apr 2, 2026
16e3223
feat(docs): comprehensive improvement (#3564)
NanoCode012 Apr 2, 2026
842fa03
feat: add sonicmoe fused lora support (#3519)
NanoCode012 Apr 2, 2026
573726c
upgrade torchao to 0.17.0 (#3569)
winglian Apr 2, 2026
08fc7de
gemma4 support (#3574)
winglian Apr 2, 2026
900eec7
Fix DO_NOT_TRACK not being correctly handled (#3580)
maximegmd Apr 4, 2026
6f15da4
make it easier for agents to discover docs (#3579) [skip ci]
winglian Apr 6, 2026
dc638e7
fix(config): add cce and liger to nemotron-h example (#3573) [skip ci]
NanoCode012 Apr 6, 2026
149178d
chore: cleanup post release v0.16 (#3577)
NanoCode012 Apr 6, 2026
7c56809
use vllm 0.19.0 for torch 2.10.0 (#3582)
winglian Apr 7, 2026
7daf7d9
fix: regex for unfrozen language tower (#3586) [skip ci]
NanoCode012 Apr 8, 2026
4ef608d
fix ddp/fsdp w gemma4 (#3584)
winglian Apr 10, 2026
4dfa0a5
Add uninstall command to cut_cross_entropy import message (#3583) [sk…
floaty3 Apr 10, 2026
bfb4da1
fix: document jinja2 file path support (#3588) [skip ci]
NanoCode012 Apr 10, 2026
e7a6a5b
fix: move warning after we've set any overrides (#3589) [skip ci]
NanoCode012 Apr 10, 2026
315cdee
handle trainable/masked spans in content and reasoning content (#3592)
winglian Apr 10, 2026
29fa4de
Gemma4 fixes and profiler (#3591)
winglian Apr 10, 2026
e77a185
upgrade transformers to use v5.5.3 (#3593)
winglian Apr 10, 2026
122b50b
pre-cache the eot token ids rather than on each iteration (#3594) [sk…
winglian Apr 12, 2026
e2f6982
[fix][fsdp2] clone sharded param so original full size shard can be g…
winglian Apr 12, 2026
e079cf1
qwen3_5.jinja: handle list content on system messages (#3595) [skip ci]
joaquinhuigomez Apr 12, 2026
b8358aa
[gemma4] use mixed Flash Attention and SDPA and add fused RMSNorm+RoP…
winglian Apr 12, 2026
66c3e5a
better handling of dora merge on Conv layers in Qwen 3.5 (#3599)
winglian Apr 12, 2026
a44edda
Skip redundant evaluation when resuming from checkpoint (#3575) [skip…
joaquinhuigomez Apr 13, 2026
3985ec2
feat: add FineGrainedFP8Config support for model quantization (#3587)…
madScientist10 Apr 13, 2026
63a58cf
feat: support excess_length_strategy for RL trainers (#3578) [skip ci]
yurekami Apr 13, 2026
6990478
fix: rename model to adapter_model for fsdp sharded final model (#3585)
NanoCode012 Apr 13, 2026
323da79
bump transformers to 5.5.4 and trl to latest 1.1.0 (#3603)
winglian Apr 15, 2026
9de5b76
feat: move to uv first (#3545)
NanoCode012 Apr 21, 2026
e562e14
fix: [gemma4] fix VRAM leak in hybrid FA2+SDPA (hybrid attention) pa…
thad0ctor Apr 21, 2026
05113bc
train on remote compute using Tinker compatible APIs (#3614)
winglian Apr 22, 2026
7420fd4
fix async prefetch with nemogym (#3606)
winglian Apr 22, 2026
90090fa
DPO support loss types (#3566)
BrownianNotion Apr 23, 2026
bcbe049
Feat: add support for datasets with `str` saved `messages` field (#3607)
brightwind26 Apr 23, 2026
1bf65c5
feat: add processor_kwargs YAML field forwarded to from_pretrained (#…
thad0ctor Apr 23, 2026
901f235
dpo collation/padding (#3601) [skip ci]
winglian Apr 23, 2026
17fc747
fix: docker build failing (#3622)
NanoCode012 Apr 24, 2026
798c8fb
chore: update docker docs (#3623)
NanoCode012 Apr 24, 2026
ac77da9
use smaller pretrained models for ci (#3620) [skip ci]
winglian Apr 27, 2026
ebbd7fa
feat: Add Mistral Medium 3.5 (#3633)
NanoCode012 Apr 29, 2026
e662972
Feat: Add bitnet integration (#3634)
younesbelkada Apr 30, 2026
6136ae6
Fix: add bitnet config (#3636)
younesbelkada Apr 30, 2026
e4032fc
Refactor separate attention flags with attn_implementation and capabi…
winglian May 5, 2026
c15f6cf
fix: FSDP FULL_STATE_DICT oom from memory leak (#3635)
ved1beta May 5, 2026
5352d41
feat: systemic multimodal assistant-only loss masking + cfg.role_boun…
thad0ctor May 5, 2026
e2f01de
Fix Axolotl ReLoRA optimizer reset scope (#3646)
winglian May 9, 2026
15 changes: 12 additions & 3 deletions .github/CONTRIBUTING.md
@@ -31,7 +31,11 @@ PRs are **greatly welcome**!

Please run below to setup env
```bash
pip3 install -r requirements-dev.txt -r requirements-tests.txt
# Install axolotl + dev and test dependencies
export UV_TORCH_BACKEND=cu128 # or cu130
uv venv --no-project --relocatable
source .venv/bin/activate
uv pip install --no-build-isolation -e '.[deepspeed]' --group dev --group test
pre-commit install

# test
@@ -68,7 +72,12 @@ You can skip certain CI checks by including specific keywords in your commit mes

### Code Style

axolotl uses [{codestyle}]({URLofCodestyle}) as its code style guide. Please ensure that your code follows these guidelines.
axolotl uses [Ruff](https://docs.astral.sh/ruff/) as its code style guide. Please ensure that your code follows these guidelines.

Use the pre-commit linter to ensure that your code is formatted consistently.
```bash
pre-commit run --all-files
```

### Commit Messages

@@ -78,6 +87,6 @@ Write clear and concise commit messages that briefly describe the changes made i

- [GitHub Help](https://help.github.com/)
- [GitHub Pull Request Documentation](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests)
- [{codestyle}]({URLofCodestyle})
- [Ruff](https://docs.astral.sh/ruff/)

Thank you once again for your interest in contributing to axolotl. We look forward to collaborating with you and creating an even better project together!
6 changes: 3 additions & 3 deletions .github/FUNDING.yml
@@ -1,13 +1,13 @@
# These are supported funding model platforms

github: [winglian, OpenAccess-AI-Collective] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: axolotl_ai # Replace with a single Ko-fi username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: ['https://quickchart.io/qr?text=bitcoin%3Abc1qxlgwlqwfea5s2cxm42xqsfmwjct0rj8w8ea5np&size=480&centerImageUrl=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2F4%2F46%2FBitcoin.svg%2F64px-Bitcoin.svg.png'] # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
5 changes: 5 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -15,6 +15,11 @@
<!--- Include details of your testing environment, tests ran to see how -->
<!--- your change affects other areas of the code, etc. -->

## AI Usage Disclaimer

<!--- Was AI (e.g., ChatGPT, Claude, Copilot) used to generate or assist with this PR? -->
<!--- Please indicate: No / Yes (specify which tool and to what extent) -->

## Screenshots (if appropriate)

## Types of changes
138 changes: 99 additions & 39 deletions .github/workflows/base.yml
@@ -15,58 +15,77 @@ on:
- '.github/workflows/base.yml'
workflow_dispatch:

permissions:
contents: read

jobs:
build-base:
if: ${{ github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
timeout-minutes: 480
# this job needs to be run on self-hosted GPU runners...
runs-on: ubuntu-latest-m
env:
HAS_DOCKERHUB_CREDS: ${{ secrets.DOCKERHUB_USERNAME != '' && secrets.DOCKERHUB_TOKEN != '' }}
strategy:
fail-fast: false
matrix:
include:
- cuda: "124"
cuda_version: 12.4.1
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.6.0
pytorch: 2.9.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
- cuda: "126"
cuda_version: 12.6.3
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.6.0
pytorch: 2.10.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
- cuda: "126"
cuda_version: 12.6.3
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.7.0
python_version: "3.12"
pytorch: 2.10.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
- cuda: "126"
cuda_version: 12.6.3
platforms: "linux/amd64,linux/arm64"
# - cuda: "129"
# cuda_version: 12.9.1
# cudnn_version: ""
# python_version: "3.12"
# pytorch: 2.9.1
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-base"
# platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.11"
pytorch: 2.7.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
pytorch: 2.9.1
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-base"
- cuda: "128"
cuda_version: 12.8.1
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.11"
pytorch: 2.7.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
python_version: "3.12"
pytorch: 2.9.1
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-base"
- cuda: "128"
cuda_version: 12.8.1
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.11"
pytorch: 2.8.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
python_version: "3.12"
pytorch: 2.10.0
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
# - cuda: "128"
# cuda_version: 12.8.1
# cudnn_version: ""
@@ -90,20 +109,21 @@
uses: docker/metadata-action@v5
with:
images: |
winglian/axolotl-base
axolotlai/axolotl-base
- name: Login to Docker Hub
uses: docker/login-action@v2
uses: docker/login-action@v3
if: ${{ github.event_name != 'pull_request' && env.HAS_DOCKERHUB_CREDS == 'true' }}
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build
uses: docker/build-push-action@v4
uses: docker/build-push-action@v5
with:
context: .
file: ./docker/${{ matrix.dockerfile }}
platforms: ${{ matrix.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.metadata.outputs.tags }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
labels: ${{ steps.metadata.outputs.labels }}
@@ -118,38 +138,76 @@
if: ${{ github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
timeout-minutes: 480
runs-on: ubuntu-latest-m
env:
HAS_DOCKERHUB_CREDS: ${{ secrets.DOCKERHUB_USERNAME != '' && secrets.DOCKERHUB_TOKEN != '' }}
strategy:
fail-fast: false
matrix:
include:
- cuda: "126"
cuda_version: 12.6.3
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.6.0
pytorch: 2.9.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
- cuda: "126"
cuda_version: 12.6.3
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.7.1
python_version: "3.12"
pytorch: 2.9.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.7.1
pytorch: 2.10.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.8.0
python_version: "3.12"
pytorch: 2.10.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
# - cuda: "129"
# cuda_version: 12.9.1
# cudnn_version: ""
# python_version: "3.12"
# pytorch: 2.9.1
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-uv-base"
# platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.11"
pytorch: 2.9.1
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.9.1
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.10.0
torch_cuda_arch_list: "9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -160,17 +218,19 @@
images: |
axolotlai/axolotl-base-uv
- name: Login to Docker Hub
uses: docker/login-action@v2
uses: docker/login-action@v3
if: ${{ github.event_name != 'pull_request' && env.HAS_DOCKERHUB_CREDS == 'true' }}
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build
uses: docker/build-push-action@v4
uses: docker/build-push-action@v5
with:
context: .
file: ./docker/${{ matrix.dockerfile }}
platforms: ${{ matrix.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.metadata.outputs.tags }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
labels: ${{ steps.metadata.outputs.labels }}
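The tag expression in both build jobs above splices the metadata-action output together with the matrix values. A minimal sketch of how one matrix entry expands (the tag prefix `axolotlai/axolotl-base:main` and the variable values are assumed examples, not taken from the workflow):

```shell
# Mirror of the workflow's tag template:
#   ${{ steps.metadata.outputs.tags }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}...
tags="axolotlai/axolotl-base:main"   # assumed metadata-action output
python_version="3.12"
cuda="130"
pytorch="2.10.0"
axolotl_extras=""                    # separator is added only when extras are set

# ${axolotl_extras:+-} emits '-' only if axolotl_extras is non-empty,
# matching the workflow's `!= '' && '-' || ''` conditional
image_tag="${tags}-base-py${python_version}-cu${cuda}-${pytorch}${axolotl_extras:+-}${axolotl_extras}"
echo "$image_tag"   # axolotlai/axolotl-base:main-base-py3.12-cu130-2.10.0
```

With `axolotl_extras="vllm"` the same template would append `-vllm`, which is how the extras-suffixed image variants get their names.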
3 changes: 3 additions & 0 deletions .github/workflows/docs.yml
@@ -12,6 +12,9 @@ jobs:
build-deploy:
runs-on: ubuntu-latest
steps:
- name: cleanup node
run: |
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
- name: Check out repository
uses: actions/checkout@v4
- name: Set up Quarto
5 changes: 4 additions & 1 deletion .github/workflows/lint.yml
@@ -6,13 +6,16 @@ on:
types: [opened, synchronize, reopened, ready_for_review]
paths:
- '**.py'
- 'requirements.txt'
- 'pyproject.toml'
- '.github/workflows/*.yml'
- "*.[q]md"
- "examples/**/*.y[a]?ml"
- ".pre-commit-config.yaml"
workflow_dispatch:

permissions:
contents: read

jobs:
pre-commit:
name: pre-commit