Owner: @RdoubleA
Estimated branch cut date: February 4, 2025
Estimated release date: February 11, 2025
Please comment below if there is a feature or fix you would like to see in the next release that is not listed below.
If you are tagged in any feature below, please comment on this thread if you are not able to merge it in by the branch cut date, or if you are blocked on anything.
New features
- Implement activation offloading and opt_in_bwd in knowledge_distillation recipes #2088 - @AnuravModak
- Add Phi4 #2197 - @krammnic
- Full DPO Distributed #2275 - @SalmanMohammadi @sam-pi
- Add masking strategies to message transforms #2284 - @RdoubleA @supreethmanyam
- Adds validation loss to LoRA fine tune single device #2238 - @felipemello1 @MaxFrax
- Adding reverse and symmetric KLD losses #2094 - @insop (see the sketch after this list)
- [RFC] Proposal for `tune cat` Command #2281 - @joecummings @Ankur-singh
- Adding support for LR schedule for full distributed finetune #2263
- Logging resolved config #2274
- Added Distributed (Tensor Parallel) Inference Recipe #2245
- Sample packing for ConcatDataset #2278
- Add Ascend NPU as a backend for single device recipes #2234
- PPO Performance Improvements #2066
- Multi-tile support in vision rope #2247
- Update checkpointing directory -> using vLLM and from_pretrained #2074
- Llama3.2 3B eval #2186
- Add evaluation file for code_llama2 model #2209
- Add eval config for QWEN2_5 model using 0.5B variant #2230
- T5 Encoder #2069
- Migrate distributed state dict API #2138
- Flux Autoencoder #2098
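
For context on the reverse and symmetric KLD losses (#2094): the two directions differ only in which distribution weights the log-ratio. Below is a minimal sketch in plain PyTorch, not torchtune's actual kd_losses API; the function names and the alpha weighting are illustrative assumptions.

```python
# Sketch of forward, reverse, and symmetric KL divergence losses for
# knowledge distillation, written against plain PyTorch. Shapes are
# assumed to be [..., vocab_size]; this does not mirror torchtune's
# kd_losses implementations.
import torch
import torch.nn.functional as F


def forward_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    # KL(teacher || student): teacher probabilities weight the log-ratio.
    teacher_prob = F.softmax(teacher_logits, dim=-1)
    teacher_logprob = F.log_softmax(teacher_logits, dim=-1)
    student_logprob = F.log_softmax(student_logits, dim=-1)
    return (teacher_prob * (teacher_logprob - student_logprob)).sum(-1).mean()


def reverse_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    # KL(student || teacher): student probabilities weight the log-ratio,
    # which tends to be mode-seeking rather than mean-seeking.
    student_prob = F.softmax(student_logits, dim=-1)
    student_logprob = F.log_softmax(student_logits, dim=-1)
    teacher_logprob = F.log_softmax(teacher_logits, dim=-1)
    return (student_prob * (student_logprob - teacher_logprob)).sum(-1).mean()


def symmetric_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # Weighted mix of both directions; alpha=0.5 gives an even split.
    return alpha * forward_kl(student_logits, teacher_logits) + (1 - alpha) * reverse_kl(
        student_logits, teacher_logits
    )
```

Forward KL pushes the student to cover everything the teacher assigns mass to, while reverse KL concentrates the student on the teacher's dominant modes; the symmetric variant trades between the two.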
Better engineering
- Refactored modules/tokenizers to be a subdir of modules/transforms #2231 - @Ankur-singh
- Update model builders #2282 - @ebsmothers @Ankur-singh
- Update QuantizationRecipe to use checkpointer.save_checkpoint #2257 - @Ankur-singh
- Adds clip_grad_norm to all recipe configs that support it #2220
- Llama 3.1 has correct `max_seq_len` for all versions #2203
- Log grad norm aggregated over all ranks, not just rank zero #2248 (see the sketch after this list)
- Remove example inputs from aoti_compile_and_package #2244
- Small formatting fix #2256
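
On #2248: when gradients are sharded across ranks, each rank's locally computed norm covers only its own shard, so the logged value has to be reduced over the process group rather than read off rank zero. A rough sketch of that aggregation, assuming sharded gradients and an initialized process group; the helper below is hypothetical, not torchtune's implementation.

```python
# Aggregate the gradient L2 norm across ranks: sum the squared local norms,
# all-reduce, then take the square root. Assumes each rank holds a disjoint
# shard of the gradients (e.g. under FSDP).
import torch
import torch.distributed as dist


def global_grad_norm(parameters, device) -> float:
    # Squared gradient norm held locally on this rank.
    local_sq = torch.zeros(1, device=device)
    for p in parameters:
        if p.grad is not None:
            local_sq += p.grad.detach().float().pow(2).sum()
    # All-reduce the squared norms so every rank sees the same global value.
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    return local_sq.sqrt().item()
```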
Documentation
- Documentation for diffusion model components - @calvinpelletier
- Documentation for TP utilities - @acisseJZhong
- Update the e2e flow tutorial to fix errors of generate #2251 - @RdoubleA @isseyuan
- Add AlpacaToMessages to message transforms doc page #2265
- Adds message_transform link from SFTDataset docstring to docs #2219
- Change alpaca_dataset train_on_input doc to match default value #2227
- Set default value for 'subset' parameter in the_cauldron_dataset #2228
- Update E2E Tutorial w/ vLLM and HF Hub #2192
- [Small fix] Update CUDA version in README #2242
- Fix issue #2243, update the document to show correct usage #2252
Bug Fixes
- [EZ] Pass seed to data sampler. #2266 - @RdoubleA @EugenHotaj
- Finetune meta-llama/Llama-Guard-3-1B #2237 - @RdoubleA
- LoRA and DoRA finetuning produces identical results #2250 - @ebsmothers
- Very slow convergence with bf16 #2254 - @felipemello1
- Llama3.2 vision does not run with distributed state dict #2277 - @acisseJZhong
- Construct EarlyFusion's encoder_token_ids on correct device #2276
- Fix a bug in set float32 precision #2271
- Fix tests due to upgrade to cuda126 #2260
- Fixing docstring linter #2163
- Add a "division by zero" check in chunked loss handling in kd_losses.py #2239
- [metric_logging][wandb] Fix wandb metric logger config save path #2196
- [EZ] Fix config bug where interpolation happens too early #2236
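
On #2239: chunked loss computation can produce a chunk in which every token is masked out, and normalizing that chunk's loss by its valid-token count then divides by zero. A toy illustration of the kind of guard involved; the names below are hypothetical and do not mirror the actual kd_losses.py code.

```python
# Toy guard for chunked loss normalization: if a chunk has no valid
# (non-ignored) tokens, return zero instead of dividing by zero.
import torch


def normalize_chunk_loss(loss_sum: torch.Tensor, num_valid_tokens: int) -> torch.Tensor:
    if num_valid_tokens == 0:
        return torch.zeros_like(loss_sum)
    return loss_sum / num_valid_tokens
```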
Deprecations/Removals