What's Changed
Inference
- Reduce LLM tests by @dacorvo in #1033
- (Re-)disable hard-coded check in vLLM ModelConfig (fix for llama4) by @dacorvo in #1035
- fix: Flux timeout issue + nxd implementation refactoring by @JingyaHuang in #1022
- vLLM Docker takes params by @tengomucho in #1039
Other
- AWS Neuron SDK 2.6.1 by @dacorvo in #1037
- Device memory usage by @dacorvo in #1036
- Clean up CI workflows and bump development version by @dacorvo in #1034
Full Changelog: v0.4.2...v0.4.3