v0.4.3: fix for Llama4, device memory usage details, vLLM container accepts params

Latest

tengomucho released this 10 Dec 16:23

9b9eb9a

What's Changed

Inference

Reduce llm tests by @dacorvo in #1033
(Re-)disable hard-coded check in vLLM ModelConfig (fix for llama4) by @dacorvo in #1035
fix: Flux timeout issue + nxd implementation refactoring by @JingyaHuang in #1022
vLLM docker takes params by @tengomucho in #1039

Other

AWS Neuron SDK 2.6.1 by @dacorvo in #1037
Device memory usage by @dacorvo in #1036
Cleanup CI workflows and bump development version by @dacorvo in #1034

Full Changelog: v0.4.2...v0.4.3

Contributors

dacorvo, tengomucho, and JingyaHuang