Conversation

@liavweiss (Contributor)

Re-enable Gemma Embedding Model Support

Task

Fix #790

Summary

This PR re-enables support for the google/embeddinggemma-300m gated model across the codebase, now that HuggingFace token access has been resolved (HF_TOKEN is configured by the maintainers in CI).

Background

The EmbeddingGemma-300m model (google/embeddinggemma-300m) is a gated model on HuggingFace that requires authentication via HF_TOKEN. Due to CI/CD authentication limitations, Gemma support was previously disabled in work related to Issue #573 to allow tests to pass without the gated model.

Now that the maintainer has configured HF_TOKEN in the CI environment, we can restore full Gemma embedding model support.

Changes Made

1. Model Download Configuration (tools/make/models.mk)

  • ✅ Added embeddinggemma-300m to download-models-minimal target

  • ✅ Added embeddinggemma-300m to download-models-lora target

  • ✅ Updated comments to reflect that HF_TOKEN is now available in CI

  • ✅ Added informative echo message for Gemma download
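
The download step with graceful skipping can be sketched roughly as follows. This is a minimal shell sketch, not the actual models.mk recipe; the download_gemma function name and the huggingface-cli invocation are illustrative assumptions:

```shell
# Hypothetical sketch of the gated-model download step.
# download_gemma and the paths below are illustrative, not the real recipe.
download_gemma() {
  MODEL="google/embeddinggemma-300m"
  DEST="models/embeddinggemma-300m"
  if [ -z "${HF_TOKEN:-}" ]; then
    # Graceful skip: the model is gated, so an unauthenticated download would 401.
    echo "HF_TOKEN not set; skipping gated model ${MODEL}"
    return 0
  fi
  echo "Downloading gated model ${MODEL} (HF_TOKEN is set)"
  huggingface-cli download "${MODEL}" --local-dir "${DEST}"
}
```

With this shape, CI runs without the secret simply log the skip message instead of failing on a 401.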

2. Go Test Constants (candle-binding/semantic-router_test.go)

  • ✅ Set GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"

  • ✅ Removed t.Skip() from InitGemmaOnly test

  • ✅ Updated test assertions to handle both Qwen3 (1024-dim) and Gemma (768-dim) embeddings

  • ✅ Added isModelInitializationError checks so tests skip gracefully when the model is unavailable

3. E2E Profile Configurations

  • ✅ Updated e2e/profiles/ai-gateway/values.yaml: Added gemma_model_path

  • ✅ Updated e2e/profiles/dynamic-config/values.yaml:

    • Set gemma_model_path to "models/embeddinggemma-300m"

    • Changed EMBEDDING_MODEL_OVERRIDE from "qwen3" to "auto" for intelligent model selection

  • ✅ Verified e2e/profiles/routing-strategies/values.yaml: Already configured correctly

4. Helm Chart Configuration (deploy/helm/semantic-router/values.yaml)

  • ✅ Added embeddinggemma-300m to initContainer.models list

  • ✅ Configured initContainer.env to use HF_TOKEN from Kubernetes secret (hf-token-secret)

  • ✅ Set optional: true to allow deployment even if secret doesn't exist (for local testing)

5. Rust Test Fixtures (candle-binding/src/test_fixtures.rs)

  • ✅ Verified gemma_embedding_model() fixture is correctly implemented

  • ✅ Verified gemma3_model_only() fixture is correctly implemented

  • ✅ All Gemma-related Rust tests are enabled and using fixtures correctly

6. GitHub Actions Workflow

The following GitHub Actions workflows were updated to include HF_TOKEN for downloading gated models:

  1. integration-test-k8s.yml - Kubernetes E2E integration tests

  2. test-and-build.yml - Main test and build workflow

  3. integration-test-docker.yml - Docker Compose integration tests

  4. performance-test.yml - Performance benchmarking tests

  5. performance-nightly.yml - Nightly performance baseline tests

All workflows now pass HF_TOKEN: ${{ secrets.HF_TOKEN }} as an environment variable to the model download steps, enabling successful download of the gated embeddinggemma-300m model.

7. Quickstart Script (scripts/quickstart.sh)

  • ✅ Updated fallback messages to clearly indicate HF_TOKEN requirement

  • ✅ Added pre-download check to warn users if HF_TOKEN is not set

  • ✅ Improved error messages with actionable instructions for setting HF_TOKEN

  • ✅ Added note about feature limitations when Gemma is skipped
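
The pre-download check described above can be sketched as below. This is an illustrative shell sketch in the spirit of quickstart.sh; warn_if_no_hf_token is a hypothetical name, not the script's actual function:

```shell
# Hypothetical pre-download check, similar in spirit to the quickstart.sh warning.
# warn_if_no_hf_token is an illustrative name, not the script's actual function.
warn_if_no_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "WARNING: HF_TOKEN is not set."
    echo "Gated models such as google/embeddinggemma-300m will be skipped,"
    echo "and Gemma-based embedding features will be unavailable."
    echo "To enable them: export HF_TOKEN=<your HuggingFace access token>"
    return 1
  fi
  return 0
}
```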

8. E2E Framework HF_TOKEN Secret Creation (e2e/pkg/framework/runner.go)

  • Added createHFTokenSecret() function: Creates a Kubernetes secret named hf-token-secret in the vllm-semantic-router-system namespace from the HF_TOKEN environment variable. This secret is required for the init container to download gated models like google/embeddinggemma-300m.

    • Ensures the namespace exists (creating it if necessary) before creating the secret
    • Handles cases where the secret already exists (updates it) or the namespace doesn't exist yet (logs a warning, as Helm will create it)
    • The secret is namespace-scoped and must be in the same namespace as the semantic-router deployment
  • Added secret creation call in Run() method: After creating the Kubernetes client, the framework checks for the HF_TOKEN environment variable and automatically creates the secret if present. This ensures that all E2E profiles that require gated model downloads have access to the authentication token.

    • Logs appropriate messages indicating whether the secret was created successfully, or if HF_TOKEN is not set
    • This fix resolves the 401 Unauthorized and GatedRepoError issues that were preventing Gemma model downloads in E2E tests
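
Outside the Go framework, an equivalent secret can be produced by hand. The sketch below prints the manifest the framework effectively creates (the names come from the description above; the make_hf_secret_manifest helper is hypothetical):

```shell
# Hypothetical helper that prints the Secret manifest the E2E framework creates.
# Pipe the output to `kubectl apply -f -` against a real cluster.
make_hf_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
  namespace: vllm-semantic-router-system
type: Opaque
stringData:
  HF_TOKEN: ${HF_TOKEN}
EOF
}
```

Because the secret is namespace-scoped, it must land in the same namespace as the semantic-router deployment, as noted above.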

9. Helm Deployment Template Fix (deploy/helm/semantic-router/templates/deployment.yaml)

  • ✅ Fixed a YAML indentation bug in the initContainer.env section (changed nindent 10 to nindent 8) that caused Helm installation failures whenever initContainer.env was actually defined

Testing

Local Verification ✅

  • ✅ Go linter: Passed (0 issues)

  • ✅ Go mod tidy: Passed

  • ✅ Configuration files verified: All changes in place

  • ✅ Model download: Correctly attempts to download Gemma (401 expected without HF_TOKEN)

  • ✅ Quickstart script: Updated messages verified

Expected CI Behavior

  • ✅ test-and-build.yml run on my fork: Passed

  • ✅ integration-test-k8s.yml run on my fork: Passed

@netlify

netlify bot commented Dec 11, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: ee7c0c5
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/6940709e3eecad00082c64ba
😎 Deploy Preview: https://deploy-preview-816--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Dec 11, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-docker.yml
  • .github/workflows/integration-test-k8s.yml
  • .github/workflows/performance-nightly.yml
  • .github/workflows/performance-test.yml
  • .github/workflows/test-and-build.yml
  • scripts/quickstart.sh

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/semantic-router_test.go

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/model_manager/models.lora.yaml
  • config/model_manager/models.minimal.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/helm/semantic-router/templates/deployment.yaml
  • deploy/helm/semantic-router/values.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/pkg/framework/runner.go
  • e2e/profiles/ai-gateway/values.yaml
  • e2e/profiles/dynamic-config/values.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/model_manager/__init__.py
  • src/model_manager/downloader.py
  • src/model_manager/errors.py

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

liavweiss force-pushed the feature/gemma-model-enable branch 3 times, most recently from 45e4516 to 52451b1 on December 11, 2025 21:05
liavweiss marked this pull request as draft on December 11, 2025 21:18
@liavweiss
Contributor Author

This PR will remain in draft for now; I'm waiting to confirm whether an HF_TOKEN is already configured.

@rootfs
Collaborator

rootfs commented Dec 12, 2025

@liavweiss the HF_TOKEN is configured

liavweiss force-pushed the feature/gemma-model-enable branch 2 times, most recently from 8aa1c5e to cec298c on December 13, 2025 18:46
liavweiss marked this pull request as ready for review on December 13, 2025 18:47
liavweiss marked this pull request as draft on December 13, 2025 18:54
@liavweiss
Contributor Author

liavweiss commented Dec 13, 2025

> @liavweiss the HF_TOKEN is configured

@rootfs I added debug lines into test-ci-compose (in the download models section) and the output confirms the issue:

  • Repository context: vllm-project/semantic-router (workflow runs in upstream ✅)
  • Event: pull_request
  • PR head repo: liavweiss/semantic-router (PR is from a fork)
  • HF_TOKEN: Not available ❌

Root cause: GitHub Actions does not make secrets available in pull_request events when the PR is from a fork, even though the workflow runs in the upstream repository context. This is a security feature to prevent malicious fork code from accessing secrets.

Current Solution

The workflow gracefully skips the Gemma download when the token is unavailable (currently implemented only in the Integration Docker Compose workflow), allowing PRs to pass while still running the full tests on push events (after merge), where secrets are available.

Options

  1. Keep current approach (graceful skip) - Secure, but contributors can't test Gemma on PRs (unless they set a private HF_TOKEN in their fork)
  2. Use pull_request_target - Allows secrets, but has security implications
  3. Hybrid: Keep graceful skip + document manual testing via workflow_dispatch with fork secrets

What do you think about these approaches?
My suggestion is to keep the current approach (graceful skip). It's secure and allows contributors to test manually via workflow_dispatch with their fork secrets (but we need to document it).

@rootfs
Collaborator

rootfs commented Dec 15, 2025

@liavweiss thanks for the analysis. Let's skip it then. If there is a need to use gemma, we'll revisit this issue.

liavweiss force-pushed the feature/gemma-model-enable branch from cec298c to fee395f on December 15, 2025 20:07
liavweiss marked this pull request as ready for review on December 15, 2025 20:08
liavweiss force-pushed the feature/gemma-model-enable branch from fee395f to 2f67fcb on December 15, 2025 20:12
liavweiss force-pushed the feature/gemma-model-enable branch from 2f67fcb to 8e1c5ca on December 15, 2025 20:13
liavweiss marked this pull request as draft on December 15, 2025 20:46

Development

Successfully merging this pull request may close these issues.

[Feature] Re-enable EmbeddingGemma-300m Support with HF_TOKEN Configuration

5 participants