v0.6.0 tracker
#2232 · joecummings opened on Jan 6, 2025
Testing tracker
#1890 · felipemello1 opened on Oct 23, 2024


How to separately load a local llama8B and a self-trained LoRA adapter during distillation?

#2520 · whale2133 opened on Mar 21, 2025

Use foreach compilable scale/grad / clip_grad

#2517 · IvanKobzarev opened on Mar 19, 2025

Add torch.compile to optimizer.step()

#2516 · IvanKobzarev opened on Mar 19, 2025

Unnecessarily scaling gradients when gradient_accumulation_steps is 1

#2515 · shunting314 opened on Mar 19, 2025

[Model] Support for Mistral Small 3.1

#2508 · maximegmd opened on Mar 17, 2025

Can we decouple the data preprocessing/tokenization step from the fine-tuning phase?

#2497 · Electronic-Waste opened on Mar 13, 2025

cross-attention in llama 3.2 vision has is_causal=False

#2493 · mmehdig opened on Mar 13, 2025

Consolidate tok_encode logic between _LLMEvalWrapper and _VLMEvalWrapper

#2488 · pbontrager opened on Mar 12, 2025

Gemma3

#2484 · krammnic opened on Mar 12, 2025

recursive_reshard

#2483 · caiqi opened on Mar 12, 2025

Add add_end_token to the Qwen Models

community help wanted, good first issue
#2481 · pbontrager opened on Mar 11, 2025

Add add_end_token to Phi tokenizers

community help wanted, good first issue
#2480 · pbontrager opened on Mar 11, 2025