What's Changed
- doc: add a guide explaining how to deploy vLLM on Inference Endpoints by @tengomucho in #1057
- Add Qwen embedding guide and notebook by @pinak-p in #1045
- Serve embedding models using vLLM by @dacorvo in #1072
Other changes
- Update container URIs by @dacorvo in #1056
- Implement predownload and instance type detection for trn2 example by @jimburtoft in #1041
- Fix vLLM IE images by @tengomucho in #1058
- Fix broken link to Neuron setup in README.md by @mlopezr in #1059
- Update LLM deployment documentation by @dacorvo in #1060
- doc: remove last mentions of TGI by @dacorvo in #1061
- Optimize cache lookup by @dacorvo in #1062
- Update vLLM container for SageMaker to v0.4.4 in documentation by @tengomucho in #1063
- Optimize lookup by @dacorvo in #1066
- Fix cache registry for embedding models by @dacorvo in #1067
- Update DLC documentation with the new vLLM tag by @pagezyhf in #1069
- chore: add agentic instructions by @dacorvo in #1070
- Run CI jobs sequentially and with caching by @tengomucho in #1075
- Fix CI sanity checks in PRs by @tengomucho in #1077
- fix(exporter): remove deprecation warning by @dacorvo in #1076
- feat(docker): vllm container uses uv to install package by @tengomucho in #1071
Full Changelog: v0.4.4...v0.4.5