## Description
This issue tracks which HOWTOs we would like to add.
## Process
- If you'd like to work on a HOWTO, please create a PR for it and mention this issue in the PR description. Once the PR is merged, we will check the corresponding box below.
- If you think we should add a HOWTO, please reply to this issue and we will add it to the list below.
## HOWTOs
- [ ] Data-parallel training. @gmittal mentioned they would like to add this, based on #1982 (Add multi-GPU support for seq2seq example); see the `pmap` sketch after this list.
- [ ] Best practices for dynamic-length inputs (padding/masking sketch below).
- [ ] Loading MNIST from torchvision and HuggingFace datasets (see #1853 for more details; data-loading sketch below).
- [ ] Correctly dealing with the last batch during eval (see #1850 for more details; sketch below).
- [ ] Gradient checkpointing (sketch below).
- [ ] Using `nn.apply` and `nn.bind` (see #1087; sketch below).
- [ ] Mixed precision training (suggested by @lkhphuc; sketch below).
- [ ] Dropout guide (similar to the BatchNorm guide; sketch below).
- [ ] How to load data from different sources (torch, `tf.data`, HuggingFace), and explain that Flax really only cares about (JAX) numpy arrays; covered by the data-loading sketch below.
- [ ] How to do gradient accumulation (sketch below).
- [ ] Freezing parameters (sketch below).
- [ ] Training with multiple optimizers (covered by the same `multi_transform` sketch).
- [ ] Flax RNG design (sketch below).
- [ ] Using scan-over-layers to trade off peak memory against speed (sketch below).
- [ ] How to use `Module.bind()` (see the `nn.apply`/`nn.bind` sketch below).
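## Sketches

The sketches below are minimal, hedged starting points for some of the items above; the model definitions, names, shapes, and hyperparameters in them are illustrative assumptions, not settled choices for the eventual HOWTOs.

Data-parallel training with `jax.pmap`: replicate params across devices and average gradients with `jax.lax.pmean`. The toy model and optimizer are assumptions.

```python
from functools import partial

import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax import jax_utils

model = nn.Dense(features=1)
tx = optax.sgd(1e-3)

def loss_fn(params, batch):
    preds = model.apply({'params': params}, batch['x'])
    return jnp.mean((preds - batch['y']) ** 2)

@partial(jax.pmap, axis_name='batch')
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Average gradients across devices so every replica applies the same update.
    grads = jax.lax.pmean(grads, axis_name='batch')
    updates, opt_state = tx.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state

# Initialize once, then replicate params and optimizer state across devices.
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 4)))['params']
opt_state = tx.init(params)
params = jax_utils.replicate(params)
opt_state = jax_utils.replicate(opt_state)
# Batches need a leading device axis:
# batch['x'].shape == (jax.local_device_count(), per_device_batch, 4)
```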
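Dynamic-length inputs: pad to a fixed or bucketed length and carry a mask so `jax.jit` sees stable shapes. `pad_batch` and `masked_mean` are hypothetical helper names.

```python
import numpy as np
import jax.numpy as jnp

def pad_batch(seqs, max_len):
    """Pad a list of variable-length 1D sequences and return (batch, mask)."""
    batch = np.zeros((len(seqs), max_len), dtype=np.float32)
    mask = np.zeros((len(seqs), max_len), dtype=np.float32)
    for i, s in enumerate(seqs):
        batch[i, : len(s)] = s
        mask[i, : len(s)] = 1.0
    return jnp.asarray(batch), jnp.asarray(mask)

def masked_mean(values, mask):
    # Reduce while ignoring the padded positions.
    return (values * mask).sum() / mask.sum()
```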
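The last (possibly partial) eval batch: one approach, assuming padding is acceptable, is to pad up to the full batch size and weight the padded examples out of the metrics. The helper names here are made up.

```python
import numpy as np
import jax.numpy as jnp

def pad_to_batch(x, labels, batch_size):
    """Pad a partial batch and return per-example weights (0 for padding)."""
    n = x.shape[0]
    pad = batch_size - n
    weights = np.concatenate([np.ones(n), np.zeros(pad)]).astype(np.float32)
    x = np.concatenate([x, np.zeros((pad,) + x.shape[1:], x.dtype)])
    labels = np.concatenate([labels, np.zeros(pad, labels.dtype)])
    return jnp.asarray(x), jnp.asarray(labels), jnp.asarray(weights)

def weighted_accuracy(preds, labels, weights):
    # Padded examples have weight 0 and do not affect the metric.
    correct = (preds == labels).astype(jnp.float32)
    return (correct * weights).sum() / weights.sum()
```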
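Loading MNIST: the key point is that Flax only consumes (jax.)numpy arrays, so any loader works as long as it ends in numpy. A torchvision sketch (the root path is arbitrary; HuggingFace's `datasets.load_dataset('mnist')` ends the same way, converted to numpy):

```python
import numpy as np
from torchvision import datasets

ds = datasets.MNIST(root='/tmp/mnist', train=True, download=True)
# Convert straight to numpy; Flax/JAX never see torch tensors.
images = ds.data.numpy().astype(np.float32) / 255.0  # (60000, 28, 28)
labels = ds.targets.numpy()                          # (60000,)
```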
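Gradient checkpointing: Flax wraps `jax.checkpoint` as `nn.remat`. A toy MLP where each hidden layer's activations are recomputed in the backward pass instead of being stored:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        # Rematerialize these activations during backprop to save memory.
        RematDense = nn.remat(nn.Dense)
        for _ in range(8):
            x = nn.relu(RematDense(1024)(x))
        return nn.Dense(1)(x)

x = jnp.ones((16, 1024))
params = MLP().init(jax.random.PRNGKey(0), x)
# Lower peak memory at the cost of extra compute in the backward pass.
grads = jax.grad(lambda p: jnp.sum(MLP().apply(p, x)))(params)
```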
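`nn.apply` and `Module.bind`: `nn.apply` runs an ad-hoc function against existing variables, while `bind` returns a bound module for interactive use. The toy autoencoder is illustrative.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class AutoEncoder(nn.Module):
    def setup(self):
        self.encoder = nn.Dense(2)
        self.decoder = nn.Dense(4)

    def __call__(self, x):
        return self.decoder(self.encoder(x))

x = jnp.ones((1, 4))
variables = AutoEncoder().init(jax.random.PRNGKey(0), x)

# nn.apply: run an ad-hoc function (here only the encoder) against variables.
def encode(module, x):
    return module.encoder(x)

z = nn.apply(encode, AutoEncoder())(variables, x)

# Module.bind: a bound, stateful view, handy for interactive experimentation.
bound = AutoEncoder().bind(variables)
z2 = bound.encoder(x)
```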
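Mixed precision: a conservative starting point (an assumption here, not a settled recipe) is bfloat16 compute with float32 parameters via the `dtype` argument; a float16 variant would additionally need loss scaling.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Model(nn.Module):
    @nn.compact
    def __call__(self, x):
        # `dtype` sets the computation dtype; `param_dtype` (default float32)
        # keeps the stored parameters in full precision.
        x = nn.relu(nn.Dense(128, dtype=jnp.bfloat16)(x))
        return nn.Dense(1, dtype=jnp.bfloat16)(x)

x = jnp.ones((8, 16), jnp.bfloat16)
variables = Model().init(jax.random.PRNGKey(0), x)
y = Model().apply(variables, x)  # bfloat16 activations, float32 params
```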
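Dropout: the mechanics to document are the `deterministic` flag and the dedicated `'dropout'` RNG stream at apply time. The toy model is illustrative.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Model(nn.Module):
    @nn.compact
    def __call__(self, x, train: bool):
        x = nn.Dense(16)(x)
        # Stochastic only when deterministic=False; then it needs an RNG
        # from the 'dropout' stream.
        x = nn.Dropout(rate=0.5, deterministic=not train)(x)
        return nn.Dense(1)(x)

x = jnp.ones((2, 4))
variables = Model().init(jax.random.PRNGKey(0), x, train=False)
y_train = Model().apply(variables, x, train=True,
                        rngs={'dropout': jax.random.PRNGKey(1)})
y_eval = Model().apply(variables, x, train=False)  # no RNG needed
```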
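Gradient accumulation: sum gradients over micro-batches with `jax.lax.scan`, then apply a single optimizer update. `optax.MultiSteps` packages the same idea, if preferred. Model and shapes are assumptions.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

model = nn.Dense(1)
tx = optax.sgd(1e-3)

def loss_fn(params, x, y):
    return jnp.mean((model.apply({'params': params}, x) - y) ** 2)

@jax.jit
def accum_step(params, opt_state, xs, ys):
    # xs: (n_micro, micro_batch, features); ys: (n_micro, micro_batch, 1).
    def add_grad(acc, xy):
        g = jax.grad(loss_fn)(params, *xy)
        return jax.tree_util.tree_map(jnp.add, acc, g), None

    zeros = jax.tree_util.tree_map(jnp.zeros_like, params)
    grads, _ = jax.lax.scan(add_grad, zeros, (xs, ys))
    # Average over micro-batches, then take one optimizer step.
    grads = jax.tree_util.tree_map(lambda g: g / xs.shape[0], grads)
    updates, opt_state = tx.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state
```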
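Freezing parameters and training with multiple optimizers both reduce to partitioning the param tree with `optax.multi_transform`. The 'backbone'/'head' split below is an assumption for illustration; swapping a partition's optimizer for `optax.set_to_zero()` freezes that subtree.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax import traverse_util
from flax.core import unfreeze

class Model(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(8, name='backbone')(x)
        return nn.Dense(1, name='head')(x)

params = unfreeze(
    Model().init(jax.random.PRNGKey(0), jnp.ones((1, 4)))['params'])

# Label each leaf with its top-level module name ('backbone' or 'head').
flat = traverse_util.flatten_dict(params)
labels = traverse_util.unflatten_dict({path: path[0] for path in flat})

# One optimizer per partition; use optax.set_to_zero() for a partition
# to freeze it instead of training it.
tx = optax.multi_transform(
    {'backbone': optax.sgd(1e-4), 'head': optax.adam(1e-3)}, labels)
opt_state = tx.init(params)
```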
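Flax RNG design: the central mechanism is named RNG streams consumed via `self.make_rng`. The 'noise' stream name below is arbitrary.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Noisy(nn.Module):
    @nn.compact
    def __call__(self, x):
        # Pull a fresh key from the user-named 'noise' stream.
        key = self.make_rng('noise')
        return x + jax.random.normal(key, x.shape)

x = jnp.ones((2,))
model = Noisy()
variables = model.init(
    {'params': jax.random.PRNGKey(0), 'noise': jax.random.PRNGKey(1)}, x)
y = model.apply(variables, x, rngs={'noise': jax.random.PRNGKey(2)})
```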
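Scan-over-layers: `nn.scan` stacks identical blocks under a single traced step (this mirrors the residual-MLP example in the Flax docs); combining it with `nn.remat` (or `nn.remat_scan`) trades recompute time for lower peak memory.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Block(nn.Module):
    @nn.compact
    def __call__(self, x, _):
        return nn.relu(nn.Dense(128)(x)), None

class Stack(nn.Module):
    num_layers: int = 8

    @nn.compact
    def __call__(self, x):
        ScanBlock = nn.scan(
            Block,
            variable_axes={'params': 0},  # one params slice per layer
            split_rngs={'params': True},  # distinct init RNG per layer
            length=self.num_layers)
        x, _ = ScanBlock()(x, None)
        return x

x = jnp.ones((4, 128))
variables = Stack().init(jax.random.PRNGKey(0), x)
y = Stack().apply(variables, x)
```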