Conversation

@liavweiss (Contributor)

Re-enable Gemma Embedding Model Support

Task

Fix #790

Summary

This PR re-enables support for the google/embeddinggemma-300m gated model across the codebase, now that HuggingFace token access has been resolved (HF_TOKEN is configured by the maintainers in CI).

Background

The EmbeddingGemma-300m model (google/embeddinggemma-300m) is a gated model on HuggingFace that requires authentication via HF_TOKEN. Due to CI/CD authentication limitations, Gemma support was previously disabled in work related to Issue #573 to allow tests to pass without the gated model.

Now that the maintainer has configured HF_TOKEN in the CI environment, we can restore full Gemma embedding model support.

Changes Made

1. Model Download Configuration (tools/make/models.mk)

  • ✅ Added embeddinggemma-300m to download-models-minimal target

  • ✅ Added embeddinggemma-300m to download-models-lora target

  • ✅ Updated comments to reflect that HF_TOKEN is now available in CI

  • ✅ Added informative echo message for Gemma download
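
The download step with graceful skipping can be sketched roughly as follows. This is a minimal shell sketch, not the actual models.mk recipe; the download_gemma function name and the huggingface-cli invocation are illustrative assumptions:

```shell
# Hypothetical sketch of the gated-model download step.
# download_gemma and the paths below are illustrative, not the real recipe.
download_gemma() {
  MODEL="google/embeddinggemma-300m"
  DEST="models/embeddinggemma-300m"
  if [ -z "${HF_TOKEN:-}" ]; then
    # Graceful skip: the model is gated, so an unauthenticated download would 401.
    echo "HF_TOKEN not set; skipping gated model ${MODEL}"
    return 0
  fi
  echo "Downloading gated model ${MODEL} (HF_TOKEN is set)"
  huggingface-cli download "${MODEL}" --local-dir "${DEST}"
}
```

With this shape, CI runs without the secret simply log the skip message instead of failing on a 401.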

2. Go Test Constants (candle-binding/semantic-router_test.go)

  • ✅ Set GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"

  • ✅ Removed t.Skip() from InitGemmaOnly test

  • ✅ Updated test assertions to handle both Qwen3 (1024-dim) and Gemma (768-dim) embeddings

  • ✅ Added isModelInitializationError checks so tests skip gracefully when the model is unavailable

3. E2E Profile Configurations

  • ✅ Updated e2e/profiles/ai-gateway/values.yaml: Added gemma_model_path

  • ✅ Updated e2e/profiles/dynamic-config/values.yaml:

    • Set gemma_model_path to "models/embeddinggemma-300m"

    • Changed EMBEDDING_MODEL_OVERRIDE from "qwen3" to "auto" for intelligent model selection

  • ✅ Verified e2e/profiles/routing-strategies/values.yaml: Already configured correctly

4. Helm Chart Configuration (deploy/helm/semantic-router/values.yaml)

  • ✅ Added embeddinggemma-300m to initContainer.models list

  • ✅ Configured initContainer.env to use HF_TOKEN from Kubernetes secret (hf-token-secret)

  • ✅ Set optional: true to allow deployment even if secret doesn't exist (for local testing)

5. Rust Test Fixtures (candle-binding/src/test_fixtures.rs)

  • ✅ Verified gemma_embedding_model() fixture is correctly implemented

  • ✅ Verified gemma3_model_only() fixture is correctly implemented

  • ✅ All Gemma-related Rust tests are enabled and using fixtures correctly

6. GitHub Actions Workflow

The following GitHub Actions workflows were updated to include HF_TOKEN for downloading gated models:

  1. integration-test-k8s.yml - Kubernetes E2E integration tests

  2. test-and-build.yml - Main test and build workflow

  3. integration-test-docker.yml - Docker Compose integration tests

  4. performance-test.yml - Performance benchmarking tests

  5. performance-nightly.yml - Nightly performance baseline tests

All workflows now pass HF_TOKEN: ${{ secrets.HF_TOKEN }} as an environment variable to the model download steps, enabling successful download of the gated embeddinggemma-300m model.

7. Quickstart Script (scripts/quickstart.sh)

  • ✅ Updated fallback messages to clearly indicate HF_TOKEN requirement

  • ✅ Added pre-download check to warn users if HF_TOKEN is not set

  • ✅ Improved error messages with actionable instructions for setting HF_TOKEN

  • ✅ Added note about feature limitations when Gemma is skipped
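
The pre-download check described above can be sketched as below. This is an illustrative shell sketch in the spirit of quickstart.sh; warn_if_no_hf_token is a hypothetical name, not the script's actual function:

```shell
# Hypothetical pre-download check, similar in spirit to the quickstart.sh warning.
# warn_if_no_hf_token is an illustrative name, not the script's actual function.
warn_if_no_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "WARNING: HF_TOKEN is not set."
    echo "Gated models such as google/embeddinggemma-300m will be skipped,"
    echo "and Gemma-based embedding features will be unavailable."
    echo "To enable them: export HF_TOKEN=<your HuggingFace access token>"
    return 1
  fi
  return 0
}
```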

8. E2E Framework HF_TOKEN Secret Creation (e2e/pkg/framework/runner.go)

  • Added createHFTokenSecret() function: Creates a Kubernetes secret named hf-token-secret in the vllm-semantic-router-system namespace from the HF_TOKEN environment variable. This secret is required for the init container to download gated models like google/embeddinggemma-300m.

    • Ensures the namespace exists (creating it if necessary) before creating the secret
    • Handles cases where the secret already exists (updates it) or the namespace doesn't exist yet (logs a warning, as Helm will create it)
    • The secret is namespace-scoped and must be in the same namespace as the semantic-router deployment
  • Added secret creation call in Run() method: After creating the Kubernetes client, the framework checks for the HF_TOKEN environment variable and automatically creates the secret if present. This ensures that all E2E profiles that require gated model downloads have access to the authentication token.

    • Logs appropriate messages indicating whether the secret was created successfully, or if HF_TOKEN is not set
    • This fix resolves the 401 Unauthorized and GatedRepoError issues that were preventing Gemma model downloads in E2E tests
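
Outside the Go framework, an equivalent secret can be produced by hand. The sketch below prints the manifest the framework effectively creates (the names come from the description above; the make_hf_secret_manifest helper is hypothetical):

```shell
# Hypothetical helper that prints the Secret manifest the E2E framework creates.
# Pipe the output to `kubectl apply -f -` against a real cluster.
make_hf_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
  namespace: vllm-semantic-router-system
type: Opaque
stringData:
  HF_TOKEN: ${HF_TOKEN}
EOF
}
```

Because the secret is namespace-scoped, it must land in the same namespace as the semantic-router deployment, as noted above.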

9. Helm Deployment Template Fix (deploy/helm/semantic-router/templates/deployment.yaml)

  • ✅ Fixed a YAML indentation bug in the initContainer.env section (changed nindent 10 to nindent 8) that caused Helm installation failures whenever initContainer.env was actually defined

Testing

Local Verification ✅

  • ✅ Go linter: Passed (0 issues)

  • ✅ Go mod tidy: Passed

  • ✅ Configuration files verified: All changes in place

  • ✅ Model download: Correctly attempts to download Gemma (401 expected without HF_TOKEN)

  • ✅ Quickstart script: Updated messages verified

Expected CI Behavior

  • ✅ test-and-build.yml run on my fork: Passed

  • ✅ integration-test-k8s.yml run on my fork: Passed

@netlify

netlify bot commented Dec 11, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: ee7c0c5
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/6940709e3eecad00082c64ba
😎 Deploy Preview: https://deploy-preview-816--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Dec 11, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-docker.yml
  • .github/workflows/integration-test-k8s.yml
  • .github/workflows/performance-nightly.yml
  • .github/workflows/performance-test.yml
  • .github/workflows/test-and-build.yml
  • scripts/quickstart.sh

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/semantic-router_test.go

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/model_manager/models.lora.yaml
  • config/model_manager/models.minimal.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/helm/semantic-router/templates/deployment.yaml
  • deploy/helm/semantic-router/values.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/pkg/framework/runner.go
  • e2e/profiles/ai-gateway/values.yaml
  • e2e/profiles/dynamic-config/values.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/model_manager/__init__.py
  • src/model_manager/downloader.py
  • src/model_manager/errors.py

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

liavweiss force-pushed the feature/gemma-model-enable branch 3 times, most recently from 45e4516 to 52451b1 on December 11, 2025 21:05
liavweiss marked this pull request as draft on December 11, 2025 21:18
@liavweiss
Contributor Author

This PR will remain in draft for now; I'm waiting to confirm whether an HF_TOKEN is already configured.

@rootfs
Collaborator

rootfs commented Dec 12, 2025

@liavweiss the HF_TOKEN is configured

liavweiss force-pushed the feature/gemma-model-enable branch 2 times, most recently from 8aa1c5e to cec298c on December 13, 2025 18:46
liavweiss marked this pull request as ready for review on December 13, 2025 18:47
liavweiss marked this pull request as draft on December 13, 2025 18:54
@liavweiss
Contributor Author

liavweiss commented Dec 13, 2025

> @liavweiss the HF_TOKEN is configured

@rootfs I added debug lines into test-ci-compose (in the download models section) and the output confirms the issue:

  • Repository context: vllm-project/semantic-router (workflow runs in upstream ✅)
  • Event: pull_request
  • PR head repo: liavweiss/semantic-router (PR is from a fork)
  • HF_TOKEN: Not available ❌

Root cause: GitHub Actions does not make secrets available in pull_request events when the PR is from a fork, even though the workflow runs in the upstream repository context. This is a security feature to prevent malicious fork code from accessing secrets.

Current Solution

The workflow gracefully skips the Gemma download when the token is unavailable (currently implemented only in the Integration Docker Compose workflow), allowing PRs to pass while still running the full tests on push events (after merge), where secrets are available.

Options

  1. Keep current approach (graceful skip) - Secure, but contributors can't test Gemma on PRs (unless they set a private HF_TOKEN in their fork)
  2. Use pull_request_target - Allows secrets, but has security implications
  3. Hybrid: Keep graceful skip + document manual testing via workflow_dispatch with fork secrets

What do you think about these approaches?
My suggestion is to keep the current approach (graceful skip). It's secure and allows contributors to test manually via workflow_dispatch with their fork secrets (but we need to document it).

@rootfs
Collaborator

rootfs commented Dec 15, 2025

@liavweiss thanks for the analysis. Let's skip it then. If there is a need to use gemma, we'll revisit this issue.

liavweiss force-pushed the feature/gemma-model-enable branch from cec298c to fee395f on December 15, 2025 20:07
liavweiss marked this pull request as ready for review on December 15, 2025 20:08
liavweiss force-pushed the feature/gemma-model-enable branch from fee395f to 2f67fcb on December 15, 2025 20:12
liavweiss force-pushed the feature/gemma-model-enable branch from 2f67fcb to 8e1c5ca on December 15, 2025 20:13
liavweiss marked this pull request as draft on December 15, 2025 20:46

Development

Successfully merging this pull request may close these issues.

[Feature] Re-enable EmbeddingGemma-300m Support with HF_TOKEN Configuration

5 participants