What's Changed
- doc: add a guide explaining how to deploy vLLM on Inference Endpoints by @tengomucho in #1057
- Add Qwen embedding guide and notebook by @pinak-p in #1045
- Serve embedding models using vLLM by @dacorvo in #1072
Other changes
- Update container URIs by @dacorvo in #1056
- Implement predownload and instance type detection for trn2 example by @jimburtoft in #1041
- Fix vLLM IE images by @tengomucho in #1058
- Fix broken link to Neuron setup in README.md by @mlopezr in #1059
- Update LLM deployment documentation by @dacorvo in #1060
- doc: remove last mentions of TGI by @dacorvo in #1061
- Optimize cache lookup by @dacorvo in #1062
- Update vLLM container for SageMaker to v0.4.4 in documentation by @tengomucho in #1063
- Optimize lookup by @dacorvo in #1066
- Fix cache registry for embedding models by @dacorvo in #1067
- Update DLC documentation with the new vLLM tag by @pagezyhf in #1069
- chore: add agentic instructions by @dacorvo in #1070
- Run CI jobs sequentially and with caching by @tengomucho in #1075
- Fix CI sanity checks in PRs by @tengomucho in #1077
- fix(exporter): remove deprecation warning by @dacorvo in #1076
- feat(docker): vllm container uses uv to install package by @tengomucho in #1071
Full Changelog: v0.4.4...v0.4.5