v0.2.0
What's Changed
- Bump to AWS Neuron SDK 2.22 by @JingyaHuang in #828
- chore: bump AMI base version for Neuron SDK 2.22 by @dacorvo in #831
Inference
- Cache granite and phi4 models by @dacorvo in #809
- Refactor hub neuronx cache by @dacorvo in #829
- Add Whisper for the task "automatic-speech-recognition" without KV cache by @JingyaHuang in #789
- Add support for ModernBERT by @JingyaHuang in #818
- Set task to none for multi models cache entry by @dacorvo in #832
- ci: add cv2 to work around transformers spurious import by @dacorvo in #834
- Refactor decoder modeling by @dacorvo in #835
- Refactor decoder export by @dacorvo in #837
- Add decoder custom modeling for inference based on NxD by @dacorvo in #840
- Activate continuous batching for Llama on NxD by @dacorvo in #848
- TGI integration by @dacorvo in #855
- Avoid loading weights when exporting an NxD model using the CLI by @dacorvo in #860
- test(speculation): do not load weights during export by @dacorvo in #861
Training
- Training: remove GPT-Neo models support by @tengomucho in #807
- chore(test): add test comparing Linear and RowParallelLinear outputs by @tengomucho in #814
- More training tests updates by @tengomucho in #808
- test(training): add flash attention test by @tengomucho in #824
- Granite modeling for training by @tengomucho in #830
- Cache Hub API Changes by @tengomucho in #836
- Custom modeling for training by @michaelbenayoun in #801
- 🪨 Granite Training by @tengomucho in #845
- Training: Granite flash attention warning by @michaelbenayoun in #849
- Add Qwen3 modeling for training by @tengomucho in #850
Documentation
- Latest available TGI DLC URI by @pagezyhf in #812
- Add guidelines on EC2 creation with the DLAMI by @pagezyhf in #795
- Add per service section in tutorials and a first example for tutorial > inference > SageMaker by @pagezyhf in #796
- Mixtral SageMaker inference tutorial by @pagezyhf in #820
- Spelling nit in pipelines.mdx by @jimburtoft in #823
- Initial PR for the documentation refactoring by @JingyaHuang in #791
- Training DLC doc by @pagezyhf in #844
- Adding environment options explanation by @jimburtoft in #798
- Update the list of supported LLM models by @dacorvo in #859
- Update Llama benchmarks by @dacorvo in #858
- feat: Add continuous pre-training example for SageMaker HyperPod by @Captainia in #842
- Fix typos by @omahs in #846
Bug fixes
- Fix broken cache for traced models & fix runtime error of diffusion models when batch_size > 1 by @JingyaHuang in #811
- Fix doc CI by @JingyaHuang in #838
New Contributors
- @omahs made their first contribution in #846
- @Captainia made their first contribution in #842
Full Changelog: v0.1.0...v0.2.0