# feat: re-enable EmbeddingGemma-300m support #816
## Conversation
This PR will remain in draft for now; I'm waiting to confirm whether an HF_TOKEN is already configured.
@liavweiss the HF_TOKEN is configured.
@rootfs I added debug lines into test-ci-compose (in the download-models section) and the output confirms the issue.

**Root cause:** GitHub Actions does not make repository secrets available to `pull_request` workflows triggered from forks.

**Current solution:** the workflow gracefully skips the Gemma download when the token is unavailable (currently implemented only in the Docker Compose integration workflow), allowing PRs to pass while still running the full tests on push events (after merge), where secrets are available.

What do you think about these approaches?
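The graceful skip described above could look roughly like this in the download step. This is a minimal sketch, not the repo's actual script; `gemma_download_msg` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch of the graceful-skip logic: only attempt the gated model download
# when an HF_TOKEN is available, otherwise report the skip and continue.
gemma_download_msg() {
  if [ -n "$1" ]; then
    echo "downloading google/embeddinggemma-300m"
  else
    echo "HF_TOKEN not set: skipping gated google/embeddinggemma-300m"
  fi
}

gemma_download_msg "${HF_TOKEN:-}"
```

Because the script always exits 0, a `pull_request` run without secrets still passes the download step.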
@liavweiss thanks for the analysis. Let's skip it then. If there is a need to use Gemma, we'll revisit this issue.
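The PR description below notes that the workflows pass `HF_TOKEN: ${{ secrets.HF_TOKEN }}` to the model download steps. A hedged sketch of what such a step might look like (the step name and `run` command here are hypothetical, not copied from the repo's workflows):

```yaml
# Hypothetical workflow step: pass the secret through and let the download
# script decide whether to skip the gated model. On pull_request runs from
# forks, secrets.HF_TOKEN is empty and the script skips gracefully.
- name: Download models
  env:
    HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: make download-models-minimal
```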
Signed-off-by: Liav Weiss <[email protected]>
## Re-enable Gemma Embedding Model Support

### Task

Fix #790

### Summary

This PR re-enables support for the gated `google/embeddinggemma-300m` model across the codebase, following the resolution of HuggingFace token access (`HF_TOKEN` is now configured by maintainers in CI).

### Background

The EmbeddingGemma-300m model (`google/embeddinggemma-300m`) is a gated model on HuggingFace that requires authentication via `HF_TOKEN`. Due to CI/CD authentication limitations, Gemma support was previously disabled in work related to issue #573 so that tests could pass without the gated model. Now that the maintainer has configured `HF_TOKEN` in the CI environment, we can restore full Gemma embedding model support.

### Changes Made
#### 1. Model Download Configuration (`tools/make/models.mk`)

- ✅ Added `embeddinggemma-300m` to the `download-models-minimal` target
- ✅ Added `embeddinggemma-300m` to the `download-models-lora` target
- ✅ Updated comments to reflect that `HF_TOKEN` is now available in CI
- ✅ Added an informative echo message for the Gemma download
#### 2. Go Test Constants (`candle-binding/semantic-router_test.go`)

- ✅ Set `GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"`
- ✅ Removed `t.Skip()` from the `InitGemmaOnly` test
- ✅ Updated test assertions to handle both Qwen3 (1024-dim) and Gemma (768-dim) embeddings
- ✅ Added `isModelInitializationError` checks for graceful test skipping when the model is unavailable

#### 3. E2E Profile Configurations
- ✅ Updated `e2e/profiles/ai-gateway/values.yaml`: added `gemma_model_path`
- ✅ Updated `e2e/profiles/dynamic-config/values.yaml`: set `gemma_model_path` to `"models/embeddinggemma-300m"` and changed `EMBEDDING_MODEL_OVERRIDE` from `"qwen3"` to `"auto"` for intelligent model selection
- ✅ Verified `e2e/profiles/routing-strategies/values.yaml`: already configured correctly

#### 4. Helm Chart Configuration (`deploy/helm/semantic-router/values.yaml`)
deploy/helm/semantic-router/values.yaml)✅ Added
embeddinggemma-300mtoinitContainer.modelslist✅ Configured
initContainer.envto useHF_TOKENfrom Kubernetes secret (hf-token-secret)✅ Set
optional: trueto allow deployment even if secret doesn't exist (for local testing)5. Rust Test Fixtures (
candle-binding/src/test_fixtures.rs)✅ Verified
gemma_embedding_model()fixture is correctly implemented✅ Verified
gemma3_model_only()fixture is correctly implemented✅ All Gemma-related Rust tests are enabled and using fixtures correctly
#### 6. GitHub Actions Workflows

The following GitHub Actions workflows were updated to include `HF_TOKEN` for downloading gated models:

- ✅ `integration-test-k8s.yml`: Kubernetes E2E integration tests
- ✅ `test-and-build.yml`: main test and build workflow
- ✅ `integration-test-docker.yml`: Docker Compose integration tests
- ✅ `performance-test.yml`: performance benchmarking tests
- ✅ `performance-nightly.yml`: nightly performance baseline tests

All workflows now pass `HF_TOKEN: ${{ secrets.HF_TOKEN }}` as an environment variable to the model download steps, enabling successful download of the gated `embeddinggemma-300m` model.

#### 7. Quickstart Script (`scripts/quickstart.sh`)
scripts/quickstart.sh)✅ Updated fallback messages to clearly indicate
HF_TOKENrequirement✅ Added pre-download check to warn users if
HF_TOKENis not set✅ Improved error messages with actionable instructions for setting
HF_TOKEN✅ Added note about feature limitations when Gemma is skipped
#### 8. E2E Framework HF_TOKEN Secret Creation (`e2e/pkg/framework/runner.go`)

- ✅ Added a `createHFTokenSecret()` function: creates a Kubernetes secret named `hf-token-secret` in the `vllm-semantic-router-system` namespace from the `HF_TOKEN` environment variable. This secret is required for the init container to download gated models like `google/embeddinggemma-300m`.
- ✅ Added a secret-creation call in the `Run()` method: after creating the Kubernetes client, the framework checks for the `HF_TOKEN` environment variable and automatically creates the secret if present. This ensures that all E2E profiles that require gated model downloads have access to the authentication token; creation is skipped gracefully when `HF_TOKEN` is not set.
- ✅ Fixes the `401 Unauthorized` and `GatedRepoError` issues that were preventing Gemma model downloads in E2E tests

#### 9. Helm Deployment Template Fix (`deploy/helm/semantic-router/templates/deployment.yaml`)

- ✅ Fixed an indentation error in the `initContainer.env` section (`nindent 10` changed to `nindent 8`) that caused Helm installation failures when `initContainer.env` was actually defined

### Testing
#### Local Verification ✅

- ✅ Go linter: passed (0 issues)
- ✅ `go mod tidy`: passed
- ✅ Configuration files verified: all changes in place
- ✅ Model download: correctly attempts to download Gemma (401 expected without `HF_TOKEN`)
- ✅ Quickstart script: updated messages verified
#### Expected CI Behavior

- ✅ I ran `test-and-build.yml` on my fork: passed
- ✅ I ran `integration-test-k8s.yml` on my fork: passed
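As a footnote to change 2 above: the dual-dimension assertion (Qwen3 at 1024 dimensions, Gemma at 768) can be sketched as follows. `expectedDim` is a hypothetical helper for illustration, not the repo's actual test code:

```go
package main

import (
	"fmt"
	"strings"
)

// expectedDim mirrors the assertion logic described in the PR: paths pointing
// at embeddinggemma-300m yield 768-dim embeddings, otherwise Qwen3's 1024.
func expectedDim(modelPath string) int {
	if strings.Contains(modelPath, "embeddinggemma") {
		return 768
	}
	return 1024
}

func main() {
	fmt.Println(expectedDim("../models/embeddinggemma-300m")) // 768
	fmt.Println(expectedDim("../models/Qwen3-Embedding-0.6B")) // 1024
}
```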