
Commit aacd140

feat: embedding provider now defaults to vLLM (#148)
# What does this PR do?

This commit removes `inline::sentence-transformers` as the default embedding provider, since it was placing undesired load on the server CPU process. Embeddings now default to a new vLLM provider dedicated to embedding, or to a different provider specified via `EMBEDDING_PROVIDER`. `inline::sentence-transformers` must now be enabled explicitly by setting `ENABLE_SENTENCE_TRANSFORMERS`.

## Summary by CodeRabbit

* **New Features**
  * Added support for a remote vLLM embedding provider with configurable endpoint, token, and TLS options; the embedding provider can be selected via environment variables.
* **Documentation**
  * Embedding defaults updated: sentence-transformers is now disabled by default and can be enabled via an environment flag.
  * Guidance added for configuring the vLLM embedding URL alongside the existing vLLM settings.
* **Tests**
  * Test startup adjusted to set environment variables that exercise provider selection.

Approved-by: VaishnaviHire
Approved-by: cdoern
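The provider gating described above relies on POSIX-style parameter expansion in `run.yaml` (`${env.VAR:+value}` to enable a provider only when a variable is set, `${env.VAR:=default}` to fall back to a default). A minimal sketch of those two rules in plain POSIX shell, with illustrative variable values (the URL is a placeholder, not a real endpoint):

```shell
#!/usr/bin/env sh
# Sketch of the expansion rules that run.yaml's ${env.VAR:+x} and
# ${env.VAR:=default} substitutions follow, shown in plain POSIX shell.

unset VLLM_EMBEDDING_URL EMBEDDING_PROVIDER

# ${VAR:+value} expands to "value" only when VAR is set and non-empty,
# so leaving VLLM_EMBEDDING_URL unset yields an empty provider_id.
echo "disabled: '${VLLM_EMBEDDING_URL:+vllm-embedding}'"   # disabled: ''

VLLM_EMBEDDING_URL="http://localhost:8001/v1"              # placeholder URL
echo "enabled: '${VLLM_EMBEDDING_URL:+vllm-embedding}'"    # enabled: 'vllm-embedding'

# ${VAR:=default} substitutes (and assigns) the default when VAR is
# unset or empty.
echo "provider: '${EMBEDDING_PROVIDER:=vllm-embedding}'"   # provider: 'vllm-embedding'
```

The same mechanism already gates the other optional providers in this distribution (for example `${env.AWS_ACCESS_KEY_ID:+bedrock}`).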
2 parents e401d59 + 6303442 commit aacd140

File tree: 3 files changed, +13 −4 lines


distribution/README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -19,12 +19,13 @@ You can see an overview of the APIs and Providers the image ships with in the ta
 | eval | remote::trustyai_ragas | Yes (version 0.5.1) || Set the `KUBEFLOW_LLAMA_STACK_URL` environment variable |
 | files | inline::localfs | No || N/A |
 | files | remote::s3 | No || Set the `ENABLE_S3` environment variable |
-| inference | inline::sentence-transformers | No | | N/A |
+| inference | inline::sentence-transformers | No | | Set the `ENABLE_SENTENCE_TRANSFORMERS` environment variable |
 | inference | remote::azure | No || Set the `AZURE_API_KEY` environment variable |
 | inference | remote::bedrock | No || Set the `AWS_ACCESS_KEY_ID` environment variable |
 | inference | remote::openai | No || Set the `OPENAI_API_KEY` environment variable |
 | inference | remote::vertexai | No || Set the `VERTEX_AI_PROJECT` environment variable |
 | inference | remote::vllm | No || Set the `VLLM_URL` environment variable |
+| inference | remote::vllm | No || Set the `VLLM_EMBEDDING_URL` environment variable |
 | inference | remote::watsonx | No || Set the `WATSONX_API_KEY` environment variable |
 | safety | remote::trustyai_fms | Yes (version 0.3.1) || N/A |
 | scoring | inline::basic | No || N/A |
```

distribution/run.yaml

Lines changed: 9 additions & 3 deletions
```diff
@@ -20,6 +20,13 @@ providers:
         max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
         api_token: ${env.VLLM_API_TOKEN:=fake}
         tls_verify: ${env.VLLM_TLS_VERIFY:=true}
+  - provider_id: ${env.VLLM_EMBEDDING_URL:+vllm-embedding}
+    provider_type: remote::vllm
+    config:
+      url: ${env.VLLM_EMBEDDING_URL:=}
+      max_tokens: ${env.VLLM_EMBEDDING_MAX_TOKENS:=4096}
+      api_token: ${env.VLLM_EMBEDDING_API_TOKEN:=fake}
+      tls_verify: ${env.VLLM_EMBEDDING_TLS_VERIFY:=true}
   - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
     provider_type: remote::bedrock
     config:
@@ -33,7 +40,7 @@ providers:
       connect_timeout: ${env.AWS_CONNECT_TIMEOUT:=60}
       read_timeout: ${env.AWS_READ_TIMEOUT:=60}
       session_ttl: ${env.AWS_SESSION_TTL:=3600}
-  - provider_id: sentence-transformers
+  - provider_id: ${env.ENABLE_SENTENCE_TRANSFORMERS:+sentence-transformers}
     provider_type: inline::sentence-transformers
     config: {}
   - provider_id: ${env.WATSONX_API_KEY:+watsonx}
@@ -256,11 +263,10 @@ registered_resources:
     model_id: ${env.INFERENCE_MODEL}
     provider_id: vllm-inference
     model_type: llm
-
   - metadata:
       embedding_dimension: 768
     model_id: granite-embedding-125m
-    provider_id: sentence-transformers
+    provider_id: ${env.EMBEDDING_PROVIDER:=vllm-embedding}
    provider_model_id: ibm-granite/granite-embedding-125m-english
    model_type: embedding
  shields: []
```
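The registered embedding model's `provider_id` resolves from `${env.EMBEDDING_PROVIDER:=vllm-embedding}`: unset, it points at the new vLLM embedding provider; set, it redirects to whatever provider is named. A small sketch of that resolution (the helper function name is illustrative, not from the repo):

```shell
#!/usr/bin/env sh
# Illustrative helper mirroring ${env.EMBEDDING_PROVIDER:=vllm-embedding}
# in run.yaml: empty/unset input falls back to the vLLM embedding provider.
effective_embedding_provider() {
  # $1: value of EMBEDDING_PROVIDER (may be empty)
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "vllm-embedding"
  fi
}

effective_embedding_provider ""                       # -> vllm-embedding
effective_embedding_provider "sentence-transformers"  # -> sentence-transformers
```

Note that selecting `sentence-transformers` here only works if the provider itself is also enabled via `ENABLE_SENTENCE_TRANSFORMERS`, as the smoke test below does.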

tests/smoke.sh

Lines changed: 2 additions & 0 deletions
```diff
@@ -12,6 +12,8 @@ function start_and_wait_for_llama_stack_container {
     --env INFERENCE_MODEL="$INFERENCE_MODEL" \
     --env EMBEDDING_MODEL="$EMBEDDING_MODEL" \
     --env VLLM_URL="$VLLM_URL" \
+    --env ENABLE_SENTENCE_TRANSFORMERS=True \
+    --env EMBEDDING_PROVIDER=sentence-transformers \
     --env TRUSTYAI_LMEVAL_USE_K8S=False \
     --name llama-stack \
     "$IMAGE_NAME:$GITHUB_SHA"
```
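For a local run outside CI, the same pair of variables the smoke test sets can be passed when starting the container. A sketch that only assembles and prints the command (the image reference and URLs are placeholders; the smoke test uses `$IMAGE_NAME:$GITHUB_SHA`):

```shell
#!/usr/bin/env sh
# Sketch: assemble the env flags from the smoke test for a local run.
# IMAGE and the endpoint values are hypothetical placeholders.
IMAGE="quay.io/example/llama-stack:latest"

set -- \
  --env VLLM_URL="http://localhost:8000/v1" \
  --env VLLM_EMBEDDING_URL="http://localhost:8001/v1" \
  --env ENABLE_SENTENCE_TRANSFORMERS=True \
  --env EMBEDDING_PROVIDER=sentence-transformers

# Print rather than execute, so the command can be inspected first.
echo podman run "$@" "$IMAGE"
```

Dropping `ENABLE_SENTENCE_TRANSFORMERS` and `EMBEDDING_PROVIDER` (and keeping `VLLM_EMBEDDING_URL`) would instead exercise the new default vLLM embedding path.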
